ChatGPT (OpenAI)
Traffic
OpenAI has three distinct bots:
GPTBot: Used for crawling content that may be used in training OpenAI's generative AI foundation models.
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot
Current CIDRs: https://openai.com/gptbot.json
The user agent may be a suitable backup, but is easily spoofed.
OAI-SearchBot: Used for search functionality in ChatGPT's search features. It is not used to crawl content for training AI models.
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot
Current CIDRs: https://openai.com/searchbot.json
The user agent may be a suitable backup, but is easily spoofed.
ChatGPT-User: Used when users ask ChatGPT or a Custom GPT to visit a web page. It is not used for automatic crawling or AI training.
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
Current CIDRs: https://openai.com/chatgpt-user.json
The user agent may be a suitable backup, but is easily spoofed.
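As a rough illustration of the user-agent fallback, the sketch below extracts the bot token and version from a user-agent string. The function name match_openai_bot is hypothetical, and the match is only as trustworthy as the header itself, which any client can forge.

```python
import re

# Product tokens taken from the three user-agent strings listed above.
_TOKEN_RE = re.compile(r"\b(GPTBot|OAI-SearchBot|ChatGPT-User)/(\d+(?:\.\d+)*)")


def match_openai_bot(user_agent):
    """Return (token, version) if the UA names an OpenAI bot, else None.

    Hypothetical helper: a UA match is a hint only, not proof of origin.
    """
    m = _TOKEN_RE.search(user_agent or "")
    return (m.group(1), m.group(2)) if m else None
```

For example, passing the GPTBot string above yields the token "GPTBot" with version "1.1", while an ordinary browser UA yields None.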
Separately, ChatGPT sends direct user traffic when a user clicks a link in a ChatGPT response. These visits carry the following query parameter:
?utm_source=chatgpt.com
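A minimal sketch of detecting such click-through visits from the landing URL, using only the Python standard library. The function name is_chatgpt_click is an assumption made for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def is_chatgpt_click(url):
    """Return True if the landing URL carries utm_source=chatgpt.com.

    Hypothetical helper: the parameter is set by the referring link, so
    treat it as a soft signal that anyone can reproduce by hand.
    """
    query = parse_qs(urlsplit(url).query)
    return "chatgpt.com" in query.get("utm_source", [])
```

Parsing the query string rather than substring-matching the URL avoids false positives when "chatgpt.com" appears in the path or in another parameter.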
Tracking
Server-side
Check the request signature to verify the sender
Client-side
Prefer CIDRs (keep the list updated)
OpenAI publishes the list of ChatGPT-User CIDRs here:
https://openai.com/chatgpt-user.json
The IP ranges for the other two bots are published at:
https://openai.com/searchbot.json
https://openai.com/gptbot.json
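The sketch below shows one way to match a client IP against a downloaded feed. It assumes the feed body is a JSON object with a "prefixes" array whose entries carry an "ipv4Prefix" or "ipv6Prefix" key; confirm that shape against the live feed before relying on it. The function names are hypothetical, and the sample uses reserved documentation ranges, not real OpenAI addresses.

```python
import ipaddress
import json


def parse_prefixes(feed_text):
    """Parse a feed body into a list of ip_network objects.

    Assumed shape: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}
    """
    data = json.loads(feed_text)
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def ip_in_networks(ip, networks):
    """Return True if ip falls inside any of the given networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed address: never a match
    return any(addr in net for net in networks)


# Illustrative feed using documentation-only ranges (not real OpenAI IPs).
sample_feed = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
sample_networks = parse_prefixes(sample_feed)
```

Because the published ranges change, the parsed list should be refreshed on a schedule rather than hard-coded.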
What the “ChatGPT-User” CIDRs represent
They are egress IPs used by ChatGPT when it makes outbound HTTP(S) requests to the public internet on a user’s behalf (e.g., the ChatGPT agent or a GPT fetching a page/API). (OpenAI Help Center)
Triggers include:
Agent/automation visits to websites while completing tasks. (OpenAI Help Center)
GPT Actions calling third-party APIs from ChatGPT. (OpenAI Help Center)
Link retrieval/visits generated in ChatGPT (when ChatGPT provides links or fetches online sources for a reply). (OpenAI Help Center)
What they are not
They are not the training crawler (“GPTBot”) IPs. GPTBot is a separate crawler used for web data collection and is documented independently. (OpenAI)
How to positively identify ChatGPT traffic (recommended over IP matching alone)
Validate the HTTP Message Signatures ChatGPT adds to outbound requests (Signature, Signature-Input, and Signature-Agent: "https://chatgpt.com"). This cryptographically proves a request came from ChatGPT. (OpenAI Help Center)
Major CDNs also expose verified detections (e.g., Vercel Verified Bots and Cloudflare’s bot directory entry for ChatGPT agent), which you can allowlist. (Vercel)
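Before running full signature verification, a cheap header pre-check can discard requests that cannot possibly pass. The sketch below is only that screening step: it does not verify the signature and proves nothing on its own. The function name is hypothetical, and the expiry check assumes the Signature-Input value may carry the "expires" parameter defined for HTTP Message Signatures (RFC 9421).

```python
import re
import time

# Signature-Agent value quoted in the text above (compared without quotes).
EXPECTED_AGENT = "https://chatgpt.com"


def precheck_signature_headers(headers, now=None):
    """Screen headers before real verification. Returns (ok, reason).

    Hypothetical helper. A True result only means "worth verifying";
    the cryptographic check must still be done by a proper verifier.
    """
    h = {k.lower(): v.strip() for k, v in headers.items()}
    for name in ("signature", "signature-input", "signature-agent"):
        if not h.get(name):
            return False, "missing " + name
    if h["signature-agent"].strip('"') != EXPECTED_AGENT:
        return False, "unexpected signature-agent"
    # Reject signatures whose optional expires timestamp is in the past.
    m = re.search(r";\s*expires=(\d+)", h["signature-input"])
    now = time.time() if now is None else now
    if m and int(m.group(1)) < now:
        return False, "expired"
    return True, "needs cryptographic verification"
```

Requests that pass this screen should then go to a verifier that checks the signature against the sender's published key material.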
Heads-up on the “ChatGPT-User” label
Many site owners also see the “ChatGPT-User” user-agent token associated with these requests; it’s widely documented by third-party bot directories. Use signatures for assurance, since UAs can be spoofed. (Dark Visitors)
Reference to the CIDR list itself
OpenAI publishes the “ChatGPT-User” IP ranges as a JSON feed (/chatgpt-user.json). Use that feed for the current list, but rely on signature verification when possible. (community.openai.com)
Future
Consider using Hyperflow LLMS to receive and verify inbound request headers and pass the result through to PostHog, rather than relying on the injected script approach.
PostHog transforms (Hog) run on event payloads, not raw HTTP requests, and they don’t have access to inbound request headers (e.g. Signature, Signature-Input). They also can’t make outbound fetches to validate signatures. Therefore you can’t verify HTTP Message Signatures inside a PostHog transform.
Do signature verification at your edge/app (e.g., Cloudflare Worker, server, CDN function). When a request is verified as ChatGPT, attach a flag to the PostHog event you emit (or set a cookie/param that your client-side capture reads). Then have the Hog classifier check that flag before any other signal.
Minimal Hog change (add this to the top of your classifier):
// 0) Prefer cryptographic proof set upstream
if (event.properties['chatgpt_signature_verified'] == true) {
    event.properties['ai_traffic'] := true
    event.properties['ai_traffic_platform'] := 'chatgpt'
    event.properties['ai_traffic_type'] := 'chatgpt_user'
    event.properties['ai_traffic_identifier'] := 'signature'
    return event
}
Pipeline summary:
Edge/server: verify the ChatGPT HTTP message signature on the incoming request; if valid, include chatgpt_signature_verified=true when you send/capture the PostHog event (or expose it to the client so the JS capture can attach it).
Hog: fall back to your other signals:
CIDR match → ai_traffic_type='chatgpt_user', ai_traffic_identifier='ip'
GPTBot UA/IP → ai_traffic_type='gptbot', identifier ua or ip
Referrer/UA (“chatgpt.com”, “chat.openai.com”, “ChatGPT”) → ai_traffic_type='chatgpt_click', identifier referrer or ua
This yields cryptographic certainty when available, with IP/UA/referrer as explicit fallbacks.
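The precedence order in the summary can be mirrored in plain Python, which may be handy for unit-testing the rules outside PostHog. This is a sketch, not the Hog classifier itself: the input keys ip_in_chatgpt_user_cidrs, ip_in_gptbot_cidrs, user_agent, and referrer are hypothetical names for signals computed upstream, and setting ai_traffic_platform to 'chatgpt' on the fallback branches is an assumption carried over from the signature branch.

```python
def classify(signals):
    """Return the ai_traffic properties for one request, or {} if no rule matches.

    `signals` is a dict of upstream results; all keys except
    chatgpt_signature_verified are hypothetical names.
    """
    def tag(traffic_type, identifier):
        return {
            "ai_traffic": True,
            "ai_traffic_platform": "chatgpt",  # assumed for every branch
            "ai_traffic_type": traffic_type,
            "ai_traffic_identifier": identifier,
        }

    ua = signals.get("user_agent") or ""
    referrer = signals.get("referrer") or ""

    # 0) Cryptographic proof set upstream always wins.
    if signals.get("chatgpt_signature_verified") is True:
        return tag("chatgpt_user", "signature")
    # 1) IP-based fallbacks.
    if signals.get("ip_in_chatgpt_user_cidrs"):
        return tag("chatgpt_user", "ip")
    if signals.get("ip_in_gptbot_cidrs"):
        return tag("gptbot", "ip")
    # 2) UA and referrer fallbacks (spoofable).
    if "GPTBot" in ua:
        return tag("gptbot", "ua")
    if "chatgpt.com" in referrer or "chat.openai.com" in referrer:
        return tag("chatgpt_click", "referrer")
    if "ChatGPT" in ua:
        return tag("chatgpt_click", "ua")
    return {}
```

Keeping the identifier on every tagged event makes it possible to filter reports later by how strong the evidence was.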