Where should AI crawler rules live in robots.txt?

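Place each bot-specific User-agent block before the generic * block. Standards-compliant parsers pick the most specific matching agent regardless of order, but putting specific blocks first also protects you against parsers that read top-down. End every block with an explicit rule line, even an empty Disallow:, to avoid unintended blanket blocks.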


AI Crawler Robots.txt Guide: Every Bot You Need to Allow

OAI-SearchBot, PerplexityBot, Claude-SearchBot, Google-Extended — the complete guide to configuring robots.txt for AI search visibility.

8 min read·Updated 2025-06-22

Robots.txt is the first gate between your content and AI search engines. If the crawlers that feed ChatGPT Search, Perplexity, Google AI Overviews, and Claude cannot fetch your pages, no amount of structured data or factual density will earn you a citation.

AI search engines use dedicated crawlers separate from their training crawlers. Blocking training crawlers (GPTBot, ClaudeBot) is reasonable. Blocking search crawlers (OAI-SearchBot, Claude-SearchBot) is a self-inflicted wound for GEO (generative engine optimization). This guide lists every AI crawler you need to allow, what each one does, and the exact robots.txt block to deploy.

Why this matters: AI search referral traffic grew 527% year-over-year in 2025 (Previsible). Google AI Overviews now appears on 16% of queries, up from 6.49% earlier in the year. Sites that explicitly block AI search crawlers are invisible in this channel regardless of content quality.

The 8 AI crawlers you need to know

Each AI platform runs at least one crawler. Some run two or more: a training crawler that builds foundation models, a search crawler that powers real-time citations, and in some cases an on-demand fetcher for user prompts. The distinction matters: many sites block the training crawler and accidentally block the search crawler too.

User-agent	Purpose	Powers
OAI-SearchBot	Search retrieval	ChatGPT Search citations
GPTBot	Model training	OpenAI model improvements
PerplexityBot	Search retrieval + indexing	Perplexity numbered citations
Claude-SearchBot	Search retrieval	Claude citations in search mode
Claude-User	On-demand fetch for user prompts	Claude real-time answers
ClaudeBot	Model training	Anthropic model improvements
Google-Extended	Training opt-in/opt-out	Gemini and Vertex AI training
Applebot-Extended	Training opt-in/opt-out	Apple Intelligence, Apple GPT

Source: OpenAI platform documentation, Anthropic crawler docs, Google Search Central documentation, Perplexity documentation (2025).

"Google AI Overviews does not use a separate crawler. It reuses the regular Googlebot index. So if your site is in Google's index, you are already eligible for AI Overviews citations — provided your content matches the query."
— Google Search Central documentation, 2025

The minimum-viable robots.txt for GEO

The file below explicitly allows every AI search crawler and, by default, the training crawlers as well. To opt out of training, change the GPTBot and ClaudeBot rules to Disallow: /. Keep the bot-specific blocks at the top of your robots.txt, above the User-agent: * block.

# ── AI search crawlers: explicitly allow ──
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

# ── AI training crawlers: optional, allow or block ──
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# ── Default rules ──
User-agent: *
Disallow: /private/
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
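
Before deploying a file like this, it can help to check how a parser actually reads it. The following is a minimal sketch using Python's standard-library urllib.robotparser; the trimmed ROBOTS_TXT string, the helper name can_fetch, and the agent "SomeOtherBot" are assumptions made for illustration, and real crawlers may use parsers that differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the sample file: one AI search crawler plus the default group.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
Disallow: /admin/
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a hypothetical agent with no group of its own.
    for agent in ("OAI-SearchBot", "SomeOtherBot"):
        for path in ("/blog/post", "/private/report"):
            url = "https://example.com" + path
            print(agent, path, can_fetch(ROBOTS_TXT, agent, url))
```

One thing this check makes visible: a crawler that matches its own group ignores the * group entirely, so under these rules OAI-SearchBot is not bound by the /private/ and /admin/ lines. If those paths should stay off-limits to AI crawlers too, repeat the Disallow lines inside each bot-specific block.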

Why "block everything by default" backfires

A common SEO-school recommendation is to start robots.txt with User-agent: * / Disallow: / and then selectively allow bots. This works for traditional search but breaks GEO, because an AI search crawler falls back to the * rule, and is therefore blocked, whenever its specific block is missing or malformed.

A 2025 audit of 300,000 domains by SERanking found that fewer than 4% of sites had a correct OAI-SearchBot entry. The rest either omitted it (relying on default behavior) or wrote a rule that the parser ignored. Result: their content never reached ChatGPT Search's retrieval pool.
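
The fallback behavior can be reproduced locally. The sketch below feeds three hypothetical files to Python's standard-library urllib.robotparser: one with only the blanket block, one where the bot's entry is misspelled (the "User-agnet" typo is invented for illustration), and one with a correct entry. It shows how one parser behaves, not how any particular AI crawler is implemented.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if user_agent may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Blanket block with no bot-specific entry.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Misspelled directive: the parser skips the unknown line, so the
# Allow rule below it never attaches to OAI-SearchBot.
MALFORMED = """\
User-agnet: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /
"""

# Correct bot-specific entry ahead of the blanket block.
CORRECT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    for name, text in (("block-all", BLOCK_ALL), ("malformed", MALFORMED), ("correct", CORRECT)):
        print(name, allowed(text, "OAI-SearchBot"))
```

Only the third file lets OAI-SearchBot through; in the first two it inherits the blanket Disallow.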

Three rules for writing crawler blocks

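1. Place specific agents before the wildcard.
Parsers that follow RFC 9309 pick the group with the most specific matching user-agent regardless of order, but not every parser is that careful. A parser that reads top-down may apply a leading User-agent: * Disallow before it reaches the bot's own entry, so putting specific agents first is the safe convention.
2. End every block with an explicit rule line.
An empty Disallow: means "allow everything," as does Allow: /. RFC 9309 also treats a group with no rules as allowing everything, but some parsers handle a missing rule line unpredictably. Always write it explicitly.
3. Validate before deploying.
Run your robots.txt through a validator, such as Google Search Console's robots.txt report (the successor to the retired robots.txt Tester) or a third-party checker. A single misplaced character can silently block an entire crawler.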
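
Part of this checklist can be automated. The sketch below is a small, hypothetical linter (lint_robots_txt and KNOWN_DIRECTIVES are names invented here) that flags unrecognized lines, rules that appear before any User-agent line, and groups with no Allow or Disallow line. It groups consecutive User-agent lines the way RFC 9309 describes and is not a substitute for a full validator.

```python
# Directives this sketch accepts. Crawl-delay is non-standard but common,
# so it is tolerated rather than flagged (an assumption of this sketch).
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for common robots.txt mistakes."""
    warnings: list[str] = []
    groups: list[dict] = []  # each group: {"agents": [...], "rules": int}

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if not sep or key not in KNOWN_DIRECTIVES:
            warnings.append(f"line {number}: unrecognized line {raw.strip()!r}")
        elif key == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows a rule starts a new group.
            if not groups or groups[-1]["rules"] > 0:
                groups.append({"agents": [], "rules": 0})
            groups[-1]["agents"].append(value.strip())
        elif key in ("allow", "disallow"):
            if not groups:
                warnings.append(f"line {number}: rule appears before any User-agent line")
            else:
                groups[-1]["rules"] += 1

    for group in groups:
        if group["rules"] == 0:
            warnings.append(f"group {', '.join(group['agents'])}: no Allow or Disallow line")
    return warnings
```

Calling lint_robots_txt on the contents of your file returns an empty list when none of these three problems is present; anything else is worth a manual look before deploying.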

Verifying crawler access

After deploying, verify that each AI search crawler can actually fetch your pages. Three checks cover most issues:

▸ Server logs — grep your access logs for the user-agent strings above. If you see fetches, the bot reached you. If you don't after 30 days, your block is likely still active.
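▸ robots.txt validation — Google Search Console's robots.txt report shows whether Google could fetch and parse your live file, but it does not test arbitrary user-agents. Use a third-party tester or a local parser to check "OAI-SearchBot" and "PerplexityBot" explicitly.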
▸ Direct citation check — ask ChatGPT Search and Perplexity a question your site should answer. If neither cites you after 4 weeks of correct robots.txt, the problem is content, not access.
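
The server-log check can be scripted. The following is a minimal sketch that counts log lines mentioning each crawler token by case-insensitive substring match; the function name count_ai_hits and the SAMPLE_LINES entries are made up for illustration and are not real crawler log output. Google-Extended and Applebot-Extended are left out because, per Google's and Apple's documentation, they are robots.txt control tokens rather than user-agents that send requests.

```python
from collections import Counter

# Crawler tokens from the table above that can show up in access logs.
AI_AGENTS = (
    "OAI-SearchBot",
    "GPTBot",
    "PerplexityBot",
    "Claude-SearchBot",
    "Claude-User",
    "ClaudeBot",
)

# Fabricated example lines in a combined-log-like shape, for demonstration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:00:05:00 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:06:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def count_ai_hits(log_lines) -> Counter:
    """Count log lines that mention each AI crawler token."""
    hits: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for agent in AI_AGENTS:
            if agent.lower() in lowered:
                hits[agent] += 1
    return hits


if __name__ == "__main__":
    print(count_ai_hits(SAMPLE_LINES))
```

To run it against a real log, pass an open file object instead of SAMPLE_LINES. User-agent strings can be spoofed, so treat the counts as evidence that a crawler reached you rather than as proof.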

When to block training crawlers

Blocking GPTBot and ClaudeBot (training crawlers) does not affect search citations. Sites that opt out of training usually do so for three reasons: copyrighted content, paywalled content, or competitive IP. If your business model depends on any of these, blocking training is defensible. Just be sure to keep the search crawlers (OAI-SearchBot, Claude-SearchBot, Claude-User) allowed.
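
A training opt-out that keeps search access can be generated and checked in one step. The sketch below is illustrative only: build_robots_txt and is_allowed are hypothetical helpers, the default group simply allows everything, and the check uses Python's standard-library urllib.robotparser rather than any crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

SEARCH_CRAWLERS = ("OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Claude-User")
TRAINING_CRAWLERS = ("GPTBot", "ClaudeBot")


def build_robots_txt(allow=SEARCH_CRAWLERS, block=TRAINING_CRAWLERS) -> str:
    """Build robots.txt text that allows `allow` agents and blocks `block` agents."""
    groups = [f"User-agent: {agent}\nAllow: /\n" for agent in allow]
    groups += [f"User-agent: {agent}\nDisallow: /\n" for agent in block]
    # Default group for every other crawler (assumed permissive in this sketch).
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)  # blank line between groups


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if user_agent may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    text = build_robots_txt()
    print(text)
    for agent in SEARCH_CRAWLERS + TRAINING_CRAWLERS:
        print(agent, is_allowed(text, agent))
```

The generated file blocks GPTBot and ClaudeBot while the four search and on-demand crawlers remain allowed, which is the split described above.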

The Princeton GEO study measured a 30–40% visibility lift for pages with citations and statistics. None of that lift is reachable if the AI search crawler cannot fetch the page in the first place. Robots.txt is the prerequisite, not the strategy.

Frequently asked questions

Which AI crawler user-agents should I allow in robots.txt?

At minimum, allow OAI-SearchBot (ChatGPT Search), PerplexityBot, Claude-SearchBot and Claude-User, Google-Extended (Gemini), and Applebot-Extended (Apple Intelligence). GPTBot and ClaudeBot are training crawlers and are optional if you only want search citations.

Does Google AI Overviews use a separate crawler?

No. Google AI Overviews reuses the regular Googlebot index, so you do not need a new user-agent entry. Google-Extended is a separate control that only opts your content in or out of Gemini training.

What is the difference between GPTBot and OAI-SearchBot?

GPTBot crawls pages for OpenAI model training. OAI-SearchBot crawls pages specifically to power ChatGPT Search inline citations. If your goal is GEO visibility, OAI-SearchBot is the one that matters.

Can I block GPTBot but still get cited in ChatGPT Search?

Yes. Blocking GPTBot only stops your content being used for training. As long as OAI-SearchBot is allowed, ChatGPT Search can still retrieve and cite your pages.

References: OpenAI platform docs — GPTBot & OAI-SearchBot (2025). · Anthropic documentation — ClaudeBot, Claude-SearchBot, Claude-User (2025). · Google Search Central — Google-Extended (2025). · Perplexity documentation — PerplexityBot (2025). · Previsible 2025 AI Search Traffic Report. · SERanking robots.txt audit of 300,000 domains (2025). · Aggarwal et al., "GEO: Generative Engine Optimization," arXiv:2311.09735, KDD 2024.
