ChatGPT (OpenAI)

Traffic

OpenAI operates three distinct bots:

Bot: GPTBot
Purpose: Crawls content that may be used to train OpenAI's generative AI foundation models.
User agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot
Tracking: Current CIDRs are published at https://openai.com/gptbot.json. The user agent may be a suitable backup, but it is easily spoofed.

Bot: OAI-SearchBot
Purpose: Powers search functionality in ChatGPT's search features. It is not used to crawl content for training AI models.
User agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot
Tracking: Current CIDRs are published at https://openai.com/searchbot.json. The user agent may be a suitable backup, but it is easily spoofed.

Bot: ChatGPT-User
Purpose: Used when users ask ChatGPT or a Custom GPT to visit a web page. It is not used for automatic crawling or AI training.
User agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
Tracking: Current CIDRs are published at https://openai.com/chatgpt-user.json. The user agent may be a suitable backup, but it is easily spoofed.

There is also direct user traffic, which arrives when a user clicks a link in ChatGPT. These visits carry the following query parameter:

?utm_source=chatgpt.com
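A minimal sketch of how click traffic could be flagged from the landing URL. The function name is_chatgpt_click and the example.com URLs are hypothetical. The parameter can be stripped or forged by anyone, so treat a match as a hint rather than proof.

```python
from urllib.parse import parse_qs, urlsplit


def is_chatgpt_click(url: str) -> bool:
    """Return True when the landing URL carries utm_source=chatgpt.com."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each parameter name to a list of its values.
    return "chatgpt.com" in query.get("utm_source", [])


print(is_chatgpt_click("https://example.com/pricing?utm_source=chatgpt.com"))  # True
print(is_chatgpt_click("https://example.com/pricing?utm_source=newsletter"))   # False
```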

Tracking

Server-side

  1. Verify the HTTP Message Signature

Client-side

  1. Prefer CIDR matching (keep the list updated)

OpenAI publishes the IP ranges (CIDRs) used by each bot as JSON feeds:

ChatGPT-User: https://openai.com/chatgpt-user.json

OAI-SearchBot: https://openai.com/searchbot.json

GPTBot: https://openai.com/gptbot.json
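A minimal sketch of matching a client IP against one of these feeds, using Python's standard ipaddress module. The feed shape used here (a "prefixes" array whose entries carry "ipv4Prefix" or "ipv6Prefix") is an assumption and should be checked against the live JSON. The ranges in SAMPLE_FEED are reserved documentation addresses, not OpenAI's real ranges, and the helper names are hypothetical.

```python
import ipaddress

# Assumed feed shape; placeholder ranges from the reserved documentation blocks.
SAMPLE_FEED = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv4Prefix": "198.51.100.16/28"},
    ]
}


def load_networks(feed: dict) -> list:
    """Turn the feed's prefix entries into ip_network objects."""
    networks = []
    for entry in feed.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def ip_in_networks(ip: str, networks: list) -> bool:
    """Return True when ip parses and falls inside any listed network."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed IP strings never match
    return any(address in network for network in networks)


NETWORKS = load_networks(SAMPLE_FEED)
print(ip_in_networks("192.0.2.55", NETWORKS))     # True
print(ip_in_networks("198.51.100.40", NETWORKS))  # False
```

In practice the feed would be fetched on a schedule and cached, since the published ranges change over time.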

What the “ChatGPT-User” CIDRs represent

  • They are egress IPs used by ChatGPT when it makes outbound HTTP(S) requests to the public internet on a user’s behalf (e.g., the ChatGPT agent or a GPT fetching a page/API). (OpenAI Help Center)

  • Triggers include:

    • Agent/automation visits to websites while completing tasks. (OpenAI Help Center)

    • GPT Actions calling third-party APIs from ChatGPT. (OpenAI Help Center)

    • Link retrieval/visits generated in ChatGPT (when ChatGPT provides links or fetches online sources for a reply). (OpenAI Help Center)

What they are not

  • They are not the training crawler (“GPTBot”) IPs. GPTBot is a separate crawler used for web data collection and is documented independently. (OpenAI)

How to positively identify ChatGPT traffic (recommended over IP matching alone)

  • Validate the HTTP Message Signatures ChatGPT adds to outbound requests (Signature, Signature-Input, and Signature-Agent: "https://chatgpt.com"). This cryptographically proves a request came from ChatGPT. (OpenAI Help Center)

  • Major CDNs also expose verified detections (e.g., Vercel Verified Bots and Cloudflare’s bot directory entry for ChatGPT agent), which you can allowlist. (Vercel)
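Before running full verification, a cheap structural pre-check can discard requests that cannot possibly be signed ChatGPT traffic. The sketch below only inspects the three headers named above; it does not verify the signature, which still requires checking it cryptographically against OpenAI's published key material. The function name, the sample header values, and the handling of an optional expires parameter (defined by the HTTP Message Signatures standard, but not confirmed here as something ChatGPT sends) are all assumptions.

```python
import re
import time
from typing import Optional

EXPECTED_AGENT = "https://chatgpt.com"
REQUIRED_HEADERS = ("signature", "signature-input", "signature-agent")

# Illustrative values only; real requests carry different contents.
EXAMPLE_HEADERS = {
    "Signature": "sig1=:ZXhhbXBsZQ==:",
    "Signature-Input": 'sig1=("@authority" "@method" "@path" "signature-agent")'
                       ';created=1700000000;expires=1700000300;keyid="example-key"',
    "Signature-Agent": '"https://chatgpt.com"',
}


def signature_precheck(headers: dict, now: Optional[float] = None) -> bool:
    """Structural check only: it does NOT prove the request came from ChatGPT."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if not all(name in lowered for name in REQUIRED_HEADERS):
        return False
    # The header value is a quoted string, so strip the surrounding quotes.
    if lowered["signature-agent"].strip().strip('"') != EXPECTED_AGENT:
        return False
    # If an expires parameter is present, reject signatures already expired.
    match = re.search(r";expires=(\d+)", lowered["signature-input"])
    current = time.time() if now is None else now
    if match and int(match.group(1)) < current:
        return False
    return True


print(signature_precheck(EXAMPLE_HEADERS, now=1700000100))  # True
print(signature_precheck(EXAMPLE_HEADERS, now=1700000400))  # False (expired)
```

Only requests that pass this check need to go on to the more expensive cryptographic verification.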

Heads-up on the “ChatGPT-User” label

  • Many site owners also see the “ChatGPT-User” user-agent token associated with these requests; it’s widely documented by third-party bot directories. Use signatures for assurance, since UAs can be spoofed. (Dark Visitors)

Reference to the CIDR list itself

  • OpenAI publishes the “ChatGPT-User” IP ranges as a JSON feed (/chatgpt-user.json). Use that feed for the current list, but rely on signature verification when possible. (community.openai.com)

Future

PostHog transforms (Hog) run on event payloads, not raw HTTP requests, and they don’t have access to inbound request headers (e.g. Signature, Signature-Input). They also can’t make outbound fetches to validate signatures. Therefore you can’t verify HTTP Message Signatures inside a PostHog transform.

Do signature verification at your edge/app (e.g., Cloudflare Worker, server, CDN function). When a request is verified as ChatGPT, attach a flag to the PostHog event you emit (or set a cookie/param that your client-side capture reads). Then have the Hog classifier check that flag before any other signal.
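As a rough illustration of the server-side half, the sketch below builds the event payload with the flag attached. The helper name build_pageview_event is hypothetical, and the payload is only constructed here, not sent; how it reaches PostHog (SDK or capture endpoint) depends on your setup.

```python
def build_pageview_event(distinct_id: str, url: str, signature_verified: bool) -> dict:
    """Build a pageview payload, adding the flag only for verified requests."""
    properties = {"$current_url": url}
    if signature_verified:
        # Same property name the Hog classifier reads.
        properties["chatgpt_signature_verified"] = True
    return {
        "event": "$pageview",
        "distinct_id": distinct_id,
        "properties": properties,
    }


verified = build_pageview_event("visitor-1", "https://example.com/docs", True)
unverified = build_pageview_event("visitor-2", "https://example.com/docs", False)
print(verified["properties"])
print(unverified["properties"])
```

Leaving the property out entirely for unverified requests, rather than setting it to false, keeps ordinary events unchanged.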

Minimal Hog change (add this to the top of your classifier):

// 0) Prefer cryptographic proof set upstream
if (event.properties['chatgpt_signature_verified'] = true) {
    event.properties['ai_traffic'] := true
    event.properties['ai_traffic_platform'] := 'chatgpt'
    event.properties['ai_traffic_type'] := 'chatgpt_user'
    event.properties['ai_traffic_identifier'] := 'signature'
    return event
}

Pipeline summary:

  1. Edge/server: verify ChatGPT HTTP message signature on the incoming request; if valid, include chatgpt_signature_verified=true when you send/capture the PostHog event (or expose it to the client so the JS capture can attach it).

  2. Hog: check chatgpt_signature_verified first; if it is not set, fall back to your other signals:

    • CIDR match → ai_traffic_type='chatgpt_user', ai_traffic_identifier='ip'

    • GPTBot UA/IP → ai_traffic_type='gptbot', ai_traffic_identifier='ua' or 'ip'

    • Referrer/UA (“chatgpt.com”, “chat.openai.com”, “ChatGPT”) → ai_traffic_type='chatgpt_click', ai_traffic_identifier='referrer' or 'ua'

This yields cryptographic certainty when available, with IP/UA/referrer as explicit fallbacks.
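The precedence order can be sketched as a single function. It is written in Python rather than Hog purely so the ordering can be exercised with assertions; it is not a drop-in transform. The function names, the two boolean CIDR inputs, the use of the $raw_user_agent and $referrer properties, and setting ai_traffic_platform='chatgpt' on the fallback branches are assumptions.

```python
def label(traffic_type: str, identifier: str) -> dict:
    """Build the ai_traffic properties for one classification result."""
    return {
        "ai_traffic": True,
        "ai_traffic_platform": "chatgpt",
        "ai_traffic_type": traffic_type,
        "ai_traffic_identifier": identifier,
    }


def classify(props: dict, in_chatgpt_user_cidrs: bool = False,
             in_gptbot_cidrs: bool = False) -> dict:
    """Return ai_traffic properties, or {} when no signal matches."""
    user_agent = props.get("$raw_user_agent") or ""
    referrer = props.get("$referrer") or ""

    # 0) Cryptographic proof set upstream wins over everything else.
    if props.get("chatgpt_signature_verified") is True:
        return label("chatgpt_user", "signature")
    # 1) ChatGPT-User egress IP ranges.
    if in_chatgpt_user_cidrs:
        return label("chatgpt_user", "ip")
    # 2) Training crawler, by IP range first and user agent second.
    if in_gptbot_cidrs:
        return label("gptbot", "ip")
    if "GPTBot" in user_agent:
        return label("gptbot", "ua")
    # 3) Click-throughs, by referrer first and user agent second.
    if "chatgpt.com" in referrer or "chat.openai.com" in referrer:
        return label("chatgpt_click", "referrer")
    if "ChatGPT" in user_agent:
        return label("chatgpt_click", "ua")
    return {}


print(classify({"chatgpt_signature_verified": True})["ai_traffic_identifier"])  # signature
print(classify({"$referrer": "https://chatgpt.com/"})["ai_traffic_type"])       # chatgpt_click
```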
