AI Crawler Robots.txt Guide: Every Bot You Need to Allow
OAI-SearchBot, PerplexityBot, Claude-SearchBot, Google-Extended — the complete guide to configuring robots.txt for AI search visibility.
Robots.txt is the first gate between your content and AI search engines. If the crawlers that feed ChatGPT Search, Perplexity, Google AI Overviews, and Claude cannot fetch your pages, no amount of structured data or factual density will earn you a citation.
AI search engines use dedicated crawlers separate from their training crawlers. Blocking training crawlers (GPTBot, ClaudeBot) is reasonable. Blocking search crawlers (OAI-SearchBot, Claude-SearchBot) is a self-inflicted GEO wound. This guide lists every AI crawler you need to allow, what each one does, and the exact robots.txt block to deploy.
Why this matters: AI search referral traffic grew 527% year-over-year in 2025 (Previsible). Google AI Overviews now appears on 16% of queries, up from 6.49% earlier in the year. Sites that explicitly block AI search crawlers are invisible in this channel regardless of content quality.
The 8 AI crawlers you need to know
Each AI platform runs at least one crawler. Some run two: a training crawler that builds foundation models, and a search crawler that powers real-time citations. The distinction matters — many sites block the training crawler and accidentally block the search crawler too.
| User-agent | Purpose | Powers |
|---|---|---|
| OAI-SearchBot | Search retrieval | ChatGPT Search citations |
| GPTBot | Model training | OpenAI model improvements |
| PerplexityBot | Search retrieval + indexing | Perplexity numbered citations |
| Claude-SearchBot | Search retrieval | Claude citations in search mode |
| Claude-User | On-demand fetch for user prompts | Claude real-time answers |
| ClaudeBot | Model training | Anthropic model improvements |
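| Google-Extended | Training opt-in/opt-out (control token; fetching is done by Googlebot) | Gemini and Vertex AI training |
| Applebot-Extended | Training opt-in/opt-out (control token; fetching is done by Applebot) | Apple Intelligence and Apple foundation models |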
Source: OpenAI platform documentation, Anthropic crawler docs, Google Search Central documentation, Perplexity documentation (2025).
"Google AI Overviews does not use a separate crawler. It reuses the regular Googlebot index. So if your site is in Google's index, you are already eligible for AI Overviews citations — provided your content matches the query."
The minimum-viable robots.txt for GEO
The block below explicitly allows every AI search crawler and lists the training crawlers in a separate, optional section: keep Allow: / there, or switch it to Disallow: / to opt out of training. Place these groups at the top of your robots.txt, above any User-agent: * block.
    # ── AI search crawlers: explicitly allow ──
    User-agent: OAI-SearchBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Claude-SearchBot
    Allow: /

    User-agent: Claude-User
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: Applebot-Extended
    Allow: /

    # ── AI training crawlers: optional, allow or block ──
    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    # ── Default rules ──
    User-agent: *
    Disallow: /private/
    Disallow: /admin/
    Allow: /

    Sitemap: https://example.com/sitemap.xml
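A quick way to sanity-check a block like this before deploying is to feed it to Python's standard-library `urllib.robotparser`. The sketch below uses a shortened copy of the rules and a hypothetical helper named `check_access`. Python's parser is only an approximation of how each crawler's own parser behaves (it applies the first matching rule rather than the longest match), so treat the result as a smoke test, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the block above; extend it with the remaining agents as needed.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
Disallow: /admin/
Allow: /
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return {agent: allowed} for one URL, according to Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # example.com is a placeholder domain, as in the robots.txt block above.
    results = check_access(
        ROBOTS_TXT, ["OAI-SearchBot", "PerplexityBot"], "https://example.com/guide"
    )
    for agent, allowed in results.items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

Passing the text as a string keeps the check offline; to test a live file instead, `RobotFileParser` also offers `set_url()` and `read()`.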
Why "block everything by default" backfires
A common SEO-school recommendation is to start robots.txt with User-agent: * / Disallow: / then selectively allow bots. This works for traditional search but breaks GEO because most AI search crawlers fall back to the * rule when their specific block is missing or malformed.
A 2025 audit of 300,000 domains by SERanking found that fewer than 4% of sites had a correct OAI-SearchBot entry. The rest either omitted it (relying on default behavior) or wrote a rule that the parser ignored. Result: their content never reached ChatGPT Search's retrieval pool.
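The fallback behavior is easy to reproduce. The sketch below, again using Python's `urllib.robotparser` as a stand-in for a crawler's parser, builds a "block everything, then allow selectively" file in which only OAI-SearchBot has its own group. The helper name `blocked_agents` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# "Block everything, then allow selectively": only OAI-SearchBot gets its own group.
BLOCK_BY_DEFAULT = """\
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

SEARCH_AGENTS = ["OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Claude-User"]


def blocked_agents(robots_txt: str, agents: list[str], url: str = "/") -> list[str]:
    """Return the agents that may NOT fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Agents without their own group fall back to the wildcard Disallow.
    print(blocked_agents(BLOCK_BY_DEFAULT, SEARCH_AGENTS))
```

Every search crawler that lacks a dedicated group inherits the wildcard Disallow, which is the failure mode described above.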
Three rules for writing crawler blocks
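1. Place specific agents before the wildcard. Parsers that follow RFC 9309 pick the most specific matching User-agent group regardless of position, but not every parser is that careful. If User-agent: * comes first with a Disallow, a naive top-down parser may apply that block before reaching the bot's own entry.
2. End every block with an explicit rule line. An empty Disallow: means "allow everything," and Allow: / has the same effect. A User-agent line with no rule beneath it is handled inconsistently across parsers, so always write the rule out.
3. Validate before deploying. Run your robots.txt through a third-party validator that accepts arbitrary user-agents, and check Google Search Console's robots.txt report (the replacement for the retired robots.txt Tester) once the file is live. A single misplaced character can silently block an entire crawler.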
Verifying crawler access
After deploying, verify that each AI search crawler can actually fetch your pages. Three checks cover 95% of issues:
- Server logs: grep your access logs for the user-agent strings above. If you see fetches, the bot reached you. If you see none after 30 days, a block is likely still active.
- robots.txt testing: run your live robots.txt through a tester that accepts arbitrary user-agent tokens. Google Search Console's robots.txt report covers only Google's own crawlers, so use a third-party tool and try "OAI-SearchBot" and "PerplexityBot" explicitly.
- Direct citation check: ask ChatGPT Search and Perplexity a question your site should answer. If neither cites you after 4 weeks of correct robots.txt, the problem is content, not access.
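The server-log check can be scripted. The following is a minimal sketch, assuming a text access log that records the user-agent on each line; the function name `count_crawler_hits` and the sample log lines are made up for illustration. User-agent strings can be spoofed, so a hit suggests a crawler visit rather than proving one.

```python
from collections import Counter

AI_SEARCH_AGENTS = ["OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Claude-User"]


def count_crawler_hits(log_lines, agents=AI_SEARCH_AGENTS):
    """Count log lines that mention each agent token (case-insensitive substring match)."""
    hits = Counter({agent: 0 for agent in agents})
    for line in log_lines:
        lowered = line.lower()
        for agent in agents:
            if agent.lower() in lowered:
                hits[agent] += 1
    return dict(hits)


# Made-up sample lines for illustration only; real user-agent strings are longer.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /guide HTTP/1.1" 200 512 "-" "OAI-SearchBot/1.0"',
    '203.0.113.9 - - [01/Jan/2025:00:05:00 +0000] "GET /guide HTTP/1.1" 200 512 "-" "PerplexityBot/1.0"',
    '198.51.100.7 - - [01/Jan/2025:00:06:00 +0000] "GET / HTTP/1.1" 200 256 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for agent, count in count_crawler_hits(SAMPLE_LOG).items():
        print(f"{agent}: {count} hit(s)")
```

For a real log, pass an open file object instead of `SAMPLE_LOG`; the function iterates line by line, so it does not need to load the whole file into memory.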
When to block training crawlers
Blocking GPTBot and ClaudeBot (training crawlers) does not affect search citations. Sites that opt out of training usually do so for three reasons: copyrighted content, paywalled content, or competitive IP. If your business model depends on any of these, blocking training is defensible. Just be sure to keep the search crawlers (OAI-SearchBot, Claude-SearchBot, Claude-User) allowed.
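One way to keep the two policies from drifting apart is to generate the AI crawler groups from two lists. The sketch below is an illustration only: the helper `build_ai_groups` is hypothetical, and the agent lists are taken from the table earlier in this guide.

```python
SEARCH_AGENTS = ["OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Claude-User"]
TRAINING_AGENTS = ["GPTBot", "ClaudeBot"]


def build_ai_groups(block_training: bool) -> str:
    """Render robots.txt groups: search crawlers always allowed, training optional."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in SEARCH_AGENTS]
    training_rule = "Disallow: /" if block_training else "Allow: /"
    groups += [f"User-agent: {agent}\n{training_rule}" for agent in TRAINING_AGENTS]
    # Blank lines separate groups; the trailing newline keeps the file well-formed.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_ai_groups(block_training=True))
```

Because the search crawlers are rendered unconditionally, flipping `block_training` can never disallow them by accident.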
The Princeton GEO study measured a 30–40% visibility lift for pages with citations and statistics. None of that lift is reachable if the AI search crawler cannot fetch the page in the first place. Robots.txt is the prerequisite, not the strategy.
Frequently asked questions
Which AI crawler user-agents should I allow in robots.txt?
At minimum allow OAI-SearchBot (ChatGPT Search), PerplexityBot, Claude-SearchBot and Claude-User, Google-Extended (Gemini), and Applebot-Extended (Apple Intelligence). GPTBot and ClaudeBot are training crawlers and optional if you only want search citation.
Does Google AI Overviews use a separate crawler?
No. Google AI Overviews reuses the regular Googlebot index, so you do not need a new user-agent entry. Google-Extended is only used to opt Gemini training data in or out.
What is the difference between GPTBot and OAI-SearchBot?
GPTBot crawls pages for OpenAI model training. OAI-SearchBot crawls pages specifically to power ChatGPT Search inline citations. If your goal is GEO visibility, OAI-SearchBot is the one that matters.
Can I block GPTBot but still get cited in ChatGPT Search?
Yes. Blocking GPTBot only stops your content being used for training. As long as OAI-SearchBot is allowed, ChatGPT Search can still retrieve and cite your pages.
References: OpenAI platform docs — GPTBot & OAI-SearchBot (2025). · Anthropic documentation — ClaudeBot, Claude-SearchBot, Claude-User (2025). · Google Search Central — Google-Extended (2025). · Perplexity documentation — PerplexityBot (2025). · Previsible 2025 AI Search Traffic Report. · SERanking robots.txt audit of 300,000 domains (2025). · Aggarwal et al., "GEO: Generative Engine Optimization," arXiv:2311.09735, KDD 2024.