# ChatGPT (OpenAI)

## Traffic

OpenAI operates three distinct bots:

<table><thead><tr><th width="240.33331298828125">Bot</th><th>User agent</th><th>Tracking</th></tr></thead><tbody><tr><td><strong>GPTBot</strong><br>Used for crawling content that may be used in training OpenAI's generative AI foundation models.</td><td>Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot</td><td>Current CIDRs:<br><a href="https://openai.com/gptbot.json">https://openai.com/gptbot.json</a><br><br>The user agent can serve as a fallback, but it is easily spoofed.</td></tr><tr><td><strong>OAI-SearchBot</strong><br>Used for ChatGPT's search features. It is not used to crawl content for training AI models.</td><td>Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot</td><td>Current CIDRs:<br><a href="https://openai.com/searchbot.json">https://openai.com/searchbot.json</a><br><br>The user agent can serve as a fallback, but it is easily spoofed.</td></tr><tr><td><strong>ChatGPT-User</strong><br>Used when users ask ChatGPT or a Custom GPT to visit a web page. It is not used for automatic crawling or AI training.</td><td>Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot</td><td>Current CIDRs:<br><a href="https://openai.com/chatgpt-user.json">https://openai.com/chatgpt-user.json</a><br><br>The user agent can serve as a fallback, but it is easily spoofed.</td></tr></tbody></table>
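As a rough illustration of the user-agent fallback, the sketch below looks for the three bot tokens from the table in a `User-Agent` header. The function name and the returned labels are hypothetical, and because user agents are easily spoofed the result should be treated as a hint only.

```python
from typing import Optional

# Tokens taken from the user-agent strings in the table above.
# The labels on the right are hypothetical names chosen for this sketch.
OPENAI_BOT_TOKENS = {
    "GPTBot": "gptbot",
    "OAI-SearchBot": "oai_searchbot",
    "ChatGPT-User": "chatgpt_user",
}


def classify_openai_user_agent(user_agent: str) -> Optional[str]:
    """Return a bot label if an OpenAI token appears in the UA, else None."""
    lowered = user_agent.lower()
    for token, label in OPENAI_BOT_TOKENS.items():
        if token.lower() in lowered:
            return label
    return None
```

A match here says only that the client claims to be an OpenAI bot; pair it with an IP range or signature check before trusting it.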

ChatGPT also sends direct user traffic when a user clicks a link in a ChatGPT response. Those links carry this query parameter:

```
?utm_source=chatgpt.com
```
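A minimal sketch of detecting that parameter on a landing URL, using only the Python standard library. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def is_chatgpt_click(url: str) -> bool:
    """True when the landing URL carries utm_source=chatgpt.com."""
    query = urlsplit(url).query
    values = parse_qs(query).get("utm_source", [])
    return any(value.lower() == "chatgpt.com" for value in values)
```

For example, `is_chatgpt_click("https://example.com/pricing?utm_source=chatgpt.com")` is true, while a URL with no `utm_source` or a different value is not.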

## Tracking

Server-side

1. Verify the request signature.

Client-side

1. Prefer CIDR matching (keep the list updated).

OpenAI publishes the ChatGPT-User CIDRs here:

<https://openai.com/chatgpt-user.json>

The IP ranges for the other two bots are published at:

<https://openai.com/searchbot.json>

<https://openai.com/gptbot.json>
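The sketch below shows one way to match a client IP against such a feed with Python's `ipaddress` module. The JSON layout (`prefixes` entries with `ipv4Prefix` / `ipv6Prefix` keys) is an assumption, and the sample ranges are documentation-only placeholders rather than real OpenAI ranges. Check both against the live feed.

```python
import ipaddress
import json

# Placeholder payload using documentation-only ranges (RFC 5737 / RFC 3849).
# The "prefixes" / "ipv4Prefix" / "ipv6Prefix" layout is an assumption about
# the feed format, not something confirmed by this page.
SAMPLE_FEED = """
{
  "prefixes": [
    {"ipv4Prefix": "203.0.113.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(feed_json: str) -> list:
    """Parse a CIDR feed into ip_network objects."""
    networks = []
    for entry in json.loads(feed_json).get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def ip_in_networks(ip: str, networks: list) -> bool:
    """True when the address falls inside any of the listed ranges."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a parseable IP address
    return any(address in network for network in networks)
```

In practice the feed would be fetched on a schedule and cached, since the published ranges can change.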

**What the “ChatGPT-User” CIDRs represent**

* They are **egress IPs used by ChatGPT** when it makes **outbound HTTP(S) requests to the public internet on a user’s behalf** (e.g., the ChatGPT agent or a GPT fetching a page/API). ([OpenAI Help Center](https://help.openai.com/en/articles/11845367-chatgpt-agent-allowlisting))
* Triggers include:
  * **Agent/automation visits to websites** while completing tasks. ([OpenAI Help Center](https://help.openai.com/en/articles/11845367-chatgpt-agent-allowlisting))
  * **GPT Actions** calling **third-party APIs** from ChatGPT. ([OpenAI Help Center](https://help.openai.com/en/articles/9442513-gpt-actions-domain-settings-chatgpt-enterprise))
  * **Link retrieval/visits generated in ChatGPT** (when ChatGPT provides links or fetches online sources for a reply). ([OpenAI Help Center](https://help.openai.com/en/articles/10984597-chatgpt-generated-links))

**What they are not**

* They are **not the training crawler** (“GPTBot”) IPs. GPTBot is a separate crawler used for web data collection and is documented independently. ([OpenAI](https://platform.openai.com/docs/gptbot))

**How to positively identify ChatGPT traffic (recommended over IP matching alone)**

* Validate the **HTTP Message Signatures** ChatGPT adds to outbound requests (`Signature`, `Signature-Input`, and `Signature-Agent: "https://chatgpt.com"`). This cryptographically proves a request came from ChatGPT. ([OpenAI Help Center](https://help.openai.com/en/articles/11845367-chatgpt-agent-allowlisting))
* Major CDNs also expose verified detections (e.g., **Vercel Verified Bots** and Cloudflare’s bot directory entry for ChatGPT agent), which you can allowlist. ([Vercel](https://vercel.com/docs/botid/verified-bots))
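Before doing any cryptography, a server can cheaply check whether the three signature headers are present at all. The sketch below is only that pre-filter: it does not verify the signature. Real verification follows the HTTP Message Signatures specification (RFC 9421), needs the signer's public key, and is not shown here. The function name is hypothetical.

```python
def has_chatgpt_signature_headers(headers: dict) -> bool:
    """Pre-filter only: checks that the signature headers are present and that
    Signature-Agent names https://chatgpt.com. It does NOT verify anything."""
    # Header names are case-insensitive, so normalise them first.
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    if not normalized.get("signature") or not normalized.get("signature-input"):
        return False
    # The header value is a quoted string, e.g. "https://chatgpt.com".
    agent = normalized.get("signature-agent", "").strip('"')
    return agent == "https://chatgpt.com"
```

Requests that pass this check are candidates for full verification. Requests that fail it can skip straight to the IP and user-agent fallbacks.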

**Heads-up on the “ChatGPT-User” label**

* Many site owners also see the **“ChatGPT-User” user-agent token** associated with these requests; it’s widely documented by third-party bot directories. Use signatures for assurance, since UAs can be spoofed. ([Dark Visitors](https://darkvisitors.com/agents/chatgpt-user))

**Reference to the CIDR list itself**

* OpenAI publishes the “ChatGPT-User” IP ranges as a JSON feed (`/chatgpt-user.json`). Use that feed for the current list, but rely on signature verification when possible. ([community.openai.com](https://community.openai.com/t/ip-range-for-bot-detection-allow-list/1287217))

## Future

{% hint style="warning" %}
Consider using Hyperflow LLMS to receive and verify inbound request headers and pass that information through to PostHog, rather than using the injected script approach.
{% endhint %}

PostHog transforms (Hog) run on **event payloads**, not raw HTTP requests, and they **don’t have access to inbound request headers** (e.g. `Signature`, `Signature-Input`). They also can’t make outbound fetches to validate signatures. Therefore you **can’t verify HTTP Message Signatures inside a PostHog transform**.

Do signature verification at your edge/app (e.g., Cloudflare Worker, server, CDN function). When a request is verified as ChatGPT, attach a flag to the PostHog event you emit (or set a cookie/param that your client capture reads). Then have the Hog transform check that flag before any other signal.

Minimal Hog change (add this to the top of your classifier):

```hog
// 0) Prefer cryptographic proof set upstream
if (event.properties['chatgpt_signature_verified'] == true) {
    event.properties['ai_traffic'] := true
    event.properties['ai_traffic_platform'] := 'chatgpt'
    event.properties['ai_traffic_type'] := 'chatgpt_user'
    event.properties['ai_traffic_identifier'] := 'signature'
    return event
}
```

Pipeline summary:

1. **Edge/server**: verify ChatGPT HTTP message signature on the incoming request; if valid, include `chatgpt_signature_verified=true` when you send/capture the PostHog event (or expose it to the client so the JS capture can attach it).
2. **Hog**: check the `chatgpt_signature_verified` flag first, then fall back to your other signals:
   * CIDR match → `ai_traffic_type='chatgpt_user'`, `ai_traffic_identifier='ip'`
   * GPTBot UA/IP → `ai_traffic_type='gptbot'`, `ai_traffic_identifier='ua'` or `'ip'`
   * Referrer/UA (“chatgpt.com”, “chat.openai.com”, “ChatGPT”) → `ai_traffic_type='chatgpt_click'`, `ai_traffic_identifier='referrer'` or `'ua'`

This yields cryptographic certainty when available, with IP/UA/referrer as explicit fallbacks.
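The priority order in the pipeline summary can be sketched as a single function, for instance in the edge/server layer before the event is sent. This is an illustration under stated assumptions, not the Hog transform itself: the function name and arguments are hypothetical, the signature and CIDR checks are assumed to have run upstream, and setting `ai_traffic_platform` to `'chatgpt'` for every branch is an assumption.

```python
from typing import Optional


def _result(traffic_type: str, identifier: str) -> dict:
    # Property names follow the Hog snippet above; using 'chatgpt' as the
    # platform for every branch is an assumption of this sketch.
    return {
        "ai_traffic": True,
        "ai_traffic_platform": "chatgpt",
        "ai_traffic_type": traffic_type,
        "ai_traffic_identifier": identifier,
    }


def classify_ai_traffic(
    signature_verified: bool,
    ip_in_chatgpt_user_cidrs: bool,
    user_agent: str = "",
    referrer: str = "",
) -> Optional[dict]:
    """Apply the signals strongest first; return None for ordinary traffic."""
    ua = user_agent.lower()
    ref = referrer.lower()
    if signature_verified:  # cryptographic proof, checked upstream
        return _result("chatgpt_user", "signature")
    if ip_in_chatgpt_user_cidrs:  # CIDR match, checked upstream
        return _result("chatgpt_user", "ip")
    if "gptbot" in ua:
        return _result("gptbot", "ua")
    if "chatgpt.com" in ref or "chat.openai.com" in ref:
        return _result("chatgpt_click", "referrer")
    if "chatgpt" in ua:
        return _result("chatgpt_click", "ua")
    return None
```

Because the checks run strongest first, a request that is both signature-verified and inside the CIDR list is reported with `ai_traffic_identifier='signature'`.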

