# ChatGPT (OpenAI)

## Traffic

OpenAI operates three distinct bots:

<table><thead><tr><th width="240.33331298828125">Bot</th><th>User agent</th><th>Tracking</th></tr></thead><tbody><tr><td><strong>GPTBot</strong><br>Crawls content that may be used to train OpenAI's generative AI foundation models.</td><td>Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot</td><td>Current CIDRs:<br><a href="https://openai.com/gptbot.json">https://openai.com/gptbot.json</a><br><br>The user agent is a possible fallback, but is easily spoofed.</td></tr><tr><td><strong>OAI-SearchBot</strong><br>Surfaces websites in ChatGPT's search features. It is not used to crawl content for training AI models.</td><td>Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot</td><td>Current CIDRs:<br><a href="https://openai.com/searchbot.json">https://openai.com/searchbot.json</a><br><br>The user agent is a possible fallback, but is easily spoofed.</td></tr><tr><td><strong>ChatGPT-User</strong><br>Used when users ask ChatGPT or a Custom GPT to visit a web page. It is not used for automatic crawling or AI training.</td><td>Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot</td><td>Current CIDRs:<br><a href="https://openai.com/chatgpt-user.json">https://openai.com/chatgpt-user.json</a><br><br>The user agent is a possible fallback, but is easily spoofed.</td></tr></tbody></table>
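As a fallback, you can match the product tokens from the table against the user agent string. The Python sketch below is illustrative only. The function name and the returned labels (`gptbot`, `oai_searchbot`, `chatgpt_user`) are hypothetical. User agents are trivially spoofed, so treat a match as a hint to confirm against the CIDR feeds, not as proof.

```python
import re
from typing import Optional

# Hypothetical labels for the three product tokens listed in the table.
_OPENAI_UA_TOKENS = {
    "GPTBot": "gptbot",
    "OAI-SearchBot": "oai_searchbot",
    "ChatGPT-User": "chatgpt_user",
}
# Match "<Token>/<version>", e.g. "GPTBot/1.1". Case-sensitive, as published.
_UA_RE = re.compile(r"\b(GPTBot|OAI-SearchBot|ChatGPT-User)/\d+(?:\.\d+)*")


def classify_openai_user_agent(user_agent: str) -> Optional[str]:
    """Return a bot label for an OpenAI user agent, or None.

    The user agent is easily spoofed: use the result as a hint only.
    """
    match = _UA_RE.search(user_agent or "")
    return _OPENAI_UA_TOKENS[match.group(1)] if match else None
```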

ChatGPT also sends regular visitor traffic when a user clicks a link in a ChatGPT response. These landing URLs typically carry the following query parameter:

```
?utm_source=chatgpt.com
```
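Click-through visits can be attributed from the landing URL. The sketch below assumes you have the full landing URL and, optionally, the referrer. `is_chatgpt_click` is a hypothetical helper. The referrer hosts are the ones listed in the pipeline summary at the end of this page. The client controls both signals, so this gives attribution, not proof.

```python
from urllib.parse import parse_qs, urlsplit

# Referrer hosts taken from the pipeline summary on this page.
CHATGPT_REFERRER_HOSTS = {"chatgpt.com", "chat.openai.com"}


def is_chatgpt_click(landing_url: str, referrer: str = "") -> bool:
    """Heuristic: did this visit come from a link clicked in ChatGPT?

    Checks utm_source first, then falls back to the referrer host.
    """
    params = parse_qs(urlsplit(landing_url).query)
    if "chatgpt.com" in params.get("utm_source", []):
        return True
    host = (urlsplit(referrer).hostname or "").lower()
    return host in CHATGPT_REFERRER_HOSTS
```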

## Tracking

Server-side

1. Verify the request's HTTP Message Signature.

Client-side

1. Prefer CIDR matching, and keep the published lists up to date.
2. Fall back to the user agent, which is easily spoofed.

OpenAI publishes the current IP ranges (CIDRs) for each bot as JSON feeds:

* ChatGPT-User: <https://openai.com/chatgpt-user.json>
* OAI-SearchBot: <https://openai.com/searchbot.json>
* GPTBot: <https://openai.com/gptbot.json>
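Matching a client IP against one of these feeds is a prefix lookup. The sketch below assumes each feed is a JSON object with a `prefixes` array, and that each entry carries an `ipv4Prefix` or `ipv6Prefix` string. Check the live file before relying on that schema. `load_prefixes` and `ip_in_ranges` are hypothetical names. The sample data uses documentation ranges, not OpenAI's.

```python
import ipaddress
import json
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_prefixes(feed_json: str) -> List[Network]:
    """Parse one CIDR feed (e.g. chatgpt-user.json) into network objects.

    ASSUMPTION: top-level "prefixes" list of {"ipv4Prefix": ...} or
    {"ipv6Prefix": ...} entries; adjust if the published schema differs.
    """
    data = json.loads(feed_json)
    networks: List[Network] = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def ip_in_ranges(ip: str, networks: Iterable[Network]) -> bool:
    """True if ip lies inside any network; malformed IPs return False.

    An IPv4 address is never "in" an IPv6 network (and vice versa),
    so mixed feeds are safe to pass in.
    """
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in networks)
```

In production, fetch each feed on a schedule (for example with `urllib.request.urlopen`) and cache the parsed networks, since the ranges change over time.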

**What the “ChatGPT-User” CIDRs represent**

* They are **egress IPs used by ChatGPT** when it makes **outbound HTTP(S) requests to the public internet on a user’s behalf** (e.g., the ChatGPT agent or a GPT fetching a page/API). ([OpenAI Help Center](https://help.openai.com/en/articles/11845367-chatgpt-agent-allowlisting))
* Triggers include:
  * **Agent/automation visits to websites** while completing tasks. ([OpenAI Help Center](https://help.openai.com/en/articles/11845367-chatgpt-agent-allowlisting))
  * **GPT Actions** calling **third-party APIs** from ChatGPT. ([OpenAI Help Center](https://help.openai.com/en/articles/9442513-gpt-actions-domain-settings-chatgpt-enterprise))
  * **Link retrieval/visits generated in ChatGPT** (when ChatGPT provides links or fetches online sources for a reply). ([OpenAI Help Center](https://help.openai.com/en/articles/10984597-chatgpt-generated-links))

**What they are not**

* They are **not the training crawler** (“GPTBot”) IPs. GPTBot is a separate crawler used for web data collection and is documented independently. ([OpenAI](https://platform.openai.com/docs/gptbot?utm_source=chatgpt.com))

**How to positively identify ChatGPT traffic (recommended over IP matching alone)**

* Validate the **HTTP Message Signatures** ChatGPT adds to outbound requests (`Signature`, `Signature-Input`, and `Signature-Agent: "https://chatgpt.com"`). This cryptographically proves a request came from ChatGPT. ([OpenAI Help Center](https://help.openai.com/en/articles/11845367-chatgpt-agent-allowlisting))
* Major CDNs also expose verified detections (e.g., **Vercel Verified Bots** and Cloudflare’s bot directory entry for ChatGPT agent), which you can allowlist. ([Vercel](https://vercel.com/docs/botid/verified-bots?utm_source=chatgpt.com))
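Signature verification follows RFC 9421 (HTTP Message Signatures). You rebuild the *signature base* from the components listed in `Signature-Input`, then check the `Signature` value against the signer's public key. The Python sketch below covers the parsing and signature-base steps under several assumptions:

* There is a single signature label per request.
* `@authority` is the only derived component.
* The caller supplies the Ed25519 check as `verify_fn` (a hypothetical callback).

The sketch leaves out two steps. One is fetching OpenAI's public key: the Web Bot Auth drafts describe a key directory at `/.well-known/http-message-signatures-directory` on the `Signature-Agent` origin. The other is nonce and replay handling. All function names are illustrative.

```python
import base64
import re
import time
from typing import Callable, Dict, List, Mapping, Optional, Tuple

# One label, e.g. sig1=("@authority" "signature-agent");created=...;keyid="..."
_SIG_INPUT_RE = re.compile(r'^\s*([A-Za-z0-9_.*-]+)=\(([^)]*)\)(.*)$')


def parse_signature_input(value: str) -> Tuple[str, List[str], str, Dict[str, str]]:
    """Split a single-signature Signature-Input into label, components, raw params, params."""
    m = _SIG_INPUT_RE.match(value)
    if not m:
        raise ValueError("unsupported Signature-Input header")
    label, inner, param_str = m.groups()
    components = re.findall(r'"([^"]+)"', inner)
    params: Dict[str, str] = {}
    for part in param_str.split(";")[1:]:
        key, _, val = part.partition("=")
        params[key.strip()] = val.strip().strip('"')
    raw_params = value.strip()[len(label) + 1:]  # value of "@signature-params"
    return label, components, raw_params, params


def build_signature_base(components: List[str], raw_params: str,
                         authority: str, headers: Mapping[str, str]) -> bytes:
    """RFC 9421 signature base: one line per covered component, then @signature-params."""
    lines = []
    for name in components:
        if name == "@authority":
            value = authority.lower()
        elif name.startswith("@"):
            raise ValueError(f"derived component not handled in this sketch: {name}")
        else:
            value = headers[name].strip()  # headers keyed by lowercase field name
        lines.append(f'"{name}": {value}')
    lines.append(f'"@signature-params": {raw_params}')
    return "\n".join(lines).encode()


def verify_chatgpt_request(headers: Mapping[str, str], authority: str,
                           verify_fn: Callable[[bytes, bytes, str], bool],
                           now: Optional[float] = None) -> bool:
    """Return True only if the request claims ChatGPT and verify_fn accepts the signature."""
    h = {k.lower(): v for k, v in headers.items()}
    if h.get("signature-agent", "").strip().strip('"') != "https://chatgpt.com":
        return False
    try:
        label, components, raw_params, params = parse_signature_input(h["signature-input"])
        sig = re.match(rf'^\s*{re.escape(label)}=:([A-Za-z0-9+/=]+):', h["signature"])
        if not sig:
            return False
        now = time.time() if now is None else now
        if "expires" in params and now > int(params["expires"]):
            return False  # signature has expired
        if "created" in params and int(params["created"]) > now + 60:
            return False  # created in the future, beyond a small clock-skew allowance
        base = build_signature_base(components, raw_params, authority, h)
        signature = base64.b64decode(sig.group(1))
    except (KeyError, ValueError):  # missing header, bad integer, or bad base64
        return False
    return verify_fn(base, signature, params.get("keyid", ""))
```

A real `verify_fn` could look up the raw public key by `keyid` and wrap the `cryptography` package. `Ed25519PublicKey.from_public_bytes(raw_key).verify(signature, data)` raises `InvalidSignature` on failure, so return `False` in that case.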

**Heads-up on the “ChatGPT-User” label**

* Many site owners also see the **“ChatGPT-User” user-agent token** associated with these requests; it’s widely documented by third-party bot directories. Use signatures for assurance, since UAs can be spoofed. ([Dark Visitors](https://darkvisitors.com/agents/chatgpt-user?utm_source=chatgpt.com))

**Reference to the CIDR list itself**

* OpenAI publishes the “ChatGPT-User” IP ranges as a JSON feed (`/chatgpt-user.json`). Use that feed for the current list, but rely on signature verification when possible. ([community.openai.com](https://community.openai.com/t/ip-range-for-bot-detection-allow-list/1287217?utm_source=chatgpt.com))

## Future

{% hint style="warning" %}
Consider using Hyperflow LLMS to receive and verify inbound request headers and pass the result through to PostHog, rather than relying on the injected script approach.
{% endhint %}

PostHog transforms (Hog) run on **event payloads**, not raw HTTP requests, and they **don’t have access to inbound request headers** (e.g. `Signature`, `Signature-Input`). They also can’t make outbound fetches to validate signatures. Therefore you **can’t verify HTTP Message Signatures inside a PostHog transform**.

Do signature verification at your edge or app layer (e.g., a Cloudflare Worker, your server, or a CDN function). When a request is verified as coming from ChatGPT, attach a flag to the PostHog event you emit, or set a cookie or URL parameter that your client-side capture reads. Hog can then check that flag before any other signal.

Minimal Hog change (add this to the top of your classifier):

```hog
// 0) Prefer cryptographic proof set upstream
if (event.properties['chatgpt_signature_verified'] = true) {
    event.properties['ai_traffic'] := true
    event.properties['ai_traffic_platform'] := 'chatgpt'
    event.properties['ai_traffic_type'] := 'chatgpt_user'
    event.properties['ai_traffic_identifier'] := 'signature'
    return event
}
```

Pipeline summary:

1. **Edge/server**: verify ChatGPT HTTP message signature on the incoming request; if valid, include `chatgpt_signature_verified=true` when you send/capture the PostHog event (or expose it to the client so the JS capture can attach it).
2. **Hog**: check `chatgpt_signature_verified` first (as in the snippet above), then fall back to your other signals:
   * CIDR match → `ai_traffic_type='chatgpt_user'`, `ai_traffic_identifier='ip'`
   * GPTBot UA/IP → `ai_traffic_type='gptbot'`, identifier `ua` or `ip`
   * Referrer/UA (“chatgpt.com”, “chat.openai.com”, “ChatGPT”) → `ai_traffic_type='chatgpt_click'`, identifier `referrer` or `ua`

This yields cryptographic certainty when available, with IP/UA/referrer as explicit fallbacks.
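Before you port changes to Hog, you can mirror the fallback order in Python to unit-test the rules. This is a sketch, not PostHog code. It assumes the event carries PostHog's `$ip`, `$raw_user_agent`, and `$referrer` properties, so verify those names in your project. It takes the parsed CIDR lists as arguments and returns the same `ai_traffic_*` properties the Hog snippet sets. `classify_ai_traffic` and the `{"ai_traffic": False}` default are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable, Optional, Union
from urllib.parse import urlsplit

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
CHATGPT_REFERRER_HOSTS = {"chatgpt.com", "chat.openai.com"}


def _parse_ip(value: object) -> Optional[Address]:
    try:
        return ipaddress.ip_address(str(value))
    except ValueError:
        return None


def _hit(type_: str, identifier: str) -> Dict[str, object]:
    return {"ai_traffic": True, "ai_traffic_platform": "chatgpt",
            "ai_traffic_type": type_, "ai_traffic_identifier": identifier}


def classify_ai_traffic(props: Dict[str, object],
                        chatgpt_user_nets: Iterable[Network],
                        gptbot_nets: Iterable[Network]) -> Dict[str, object]:
    """Return ai_traffic_* properties for an event, strongest signal first."""
    # 0) Cryptographic proof set upstream by the edge/server.
    if props.get("chatgpt_signature_verified") is True:
        return _hit("chatgpt_user", "signature")
    ip = _parse_ip(props.get("$ip"))
    ua = str(props.get("$raw_user_agent") or "")
    # 1) ChatGPT-User CIDR match.
    if ip is not None and any(ip in net for net in chatgpt_user_nets):
        return _hit("chatgpt_user", "ip")
    # 2) GPTBot by IP, then by (spoofable) user agent.
    if ip is not None and any(ip in net for net in gptbot_nets):
        return _hit("gptbot", "ip")
    if "GPTBot" in ua:
        return _hit("gptbot", "ua")
    # 3) Click-throughs: referrer host, then user agent.
    ref_host = (urlsplit(str(props.get("$referrer") or "")).hostname or "").lower()
    if ref_host in CHATGPT_REFERRER_HOSTS:
        return _hit("chatgpt_click", "referrer")
    if "ChatGPT" in ua:
        return _hit("chatgpt_click", "ua")
    return {"ai_traffic": False}
```

Order matters here. The click rule matches any user agent containing `ChatGPT`, and that includes `ChatGPT-User`. As a result, a `ChatGPT-User` request from outside the published ranges still lands in `chatgpt_click`. Tighten that check if the distinction matters for your reporting.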
