robots.txt for AI Crawlers: GPTBot, Google-Extended, PerplexityBot, and Search Access
AI crawler rules are now part of technical SEO. The common mistake is treating every bot the same. Some user agents fetch pages for search or answer retrieval, some exist only as training controls, and ordinary Googlebot and Bingbot access still determines traditional search visibility.
The quick answer: use robots.txt to block private or non-public paths, but be careful before blocking AI-related crawlers if your goal is to be cited in AI answers. Separate training-control decisions from search-indexing decisions.
BaseToolbox's robots.txt generator helps create Allow and Disallow rules, but the policy choice is still yours.
Start With the Goal
Before editing robots.txt, decide what you want:
| Goal | Likely policy |
|---|---|
| Keep admin, staging, or internal paths out | Disallow those paths for all crawlers. |
| Stay visible in Google Search | Do not block Googlebot from public pages. |
| Appear in AI answers and citations | Be cautious about blocking AI answer/search crawlers. |
| Reduce training use | Review each provider's training or AI-specific controls. |
| Protect private content | Do not rely only on robots.txt; require authentication. |
robots.txt is a crawler instruction, not an access control system. Asking bots not to crawl a URL does not make it private: anyone who has the link can still request it, so private data must never be publicly reachable in the first place.
Common AI-Related User Agents
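As of June 30, 2026, common AI-related crawler names include OpenAI's GPTBot, ChatGPT-User, and OAI-SearchBot; PerplexityBot; and Google's Google-Extended control. They do not all do the same job: OpenAI documents GPTBot as its training-data crawler, OAI-SearchBot as its search crawler, and ChatGPT-User as the agent for user-initiated fetches, while Google-Extended is a robots.txt token that governs certain AI uses of content Google has already crawled rather than a separate crawler. Provider names and behavior can change, so check official documentation before publishing a final policy.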
Useful references:
- OpenAI crawlers and user agents
- PerplexityBot documentation
- Google common crawlers: Google-Extended
- Google AI features and your website
Example Policy Patterns
A conservative public-site pattern usually keeps public content crawlable and blocks private paths:
```
User-agent: *
Disallow: /admin/
Disallow: /account/
Disallow: /checkout/
Sitemap: https://example.com/sitemap.xml
```
If you decide to block a specific AI crawler, write the rule explicitly:
```
User-agent: GPTBot
Disallow: /
```
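As a rough check of how an agent-specific group interacts with the wildcard group, the sketch below feeds a combined file to Python's standard-library `urllib.robotparser`. The combined file, the `allowed` helper, and the `example.com` URLs are assumptions for illustration. The standard-library parser only approximates how real crawlers match rules, so treat the result as a sanity check rather than proof of crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical combined file: one private-path rule plus an explicit GPTBot block.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, "https://example.com" + path)

# Print one row per agent: public page access, then admin access.
for agent in ("GPTBot", "Googlebot", "PerplexityBot"):
    print(agent, allowed(agent, "/blog/guide"), allowed(agent, "/admin/"))
```

Under these assumptions GPTBot is refused everywhere, while agents without their own group fall back to the wildcard rules and can still reach public pages.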
Do not copy bot-blocking lists blindly. A rule that sounds privacy-friendly can also reduce the chance that AI systems discover or cite your public guides.
What robots.txt Cannot Do
robots.txt cannot:
- Hide a URL from users who already know it
- Protect private documents
- Remove a page that is already indexed
- Control every AI system on the internet
- Replace authentication, noindex, headers, or deletion
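Use authentication for private areas. Use noindex for pages that should not appear in search results, and leave those pages crawlable, because a crawler that is blocked by robots.txt never fetches the page and so never sees the noindex signal. Use robots.txt mainly for crawl guidance.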
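One way to confirm that a page actually carries a noindex signal is to inspect its response headers and HTML. The sketch below is a minimal illustration using only the Python standard library. The `has_noindex` helper and its inputs are hypothetical, and it deliberately ignores agent-scoped directives (for example a header value prefixed with a crawler name), so it is a starting point rather than a complete checker.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(headers, html):
    """Return True if an X-Robots-Tag header or a robots meta tag says noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives

# Hypothetical inputs: a header-based signal and a meta-tag signal.
print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
print(has_noindex({}, '<meta name="robots" content="noindex">'))
```

In practice the headers and HTML would come from a real fetch of the page.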
A Practical Review Checklist
Before publishing an AI crawler policy, review it with three people or roles: SEO, legal/privacy, and engineering. SEO should confirm that public pages remain crawlable. Legal or privacy should decide whether training controls are needed. Engineering should confirm private routes are protected by authentication, not only by robots.txt.
Also test the final file against every user agent you wrote a group for, and confirm each one gets the access you intended. A single broad Disallow: / under the wrong user agent can remove more access than expected.
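A table-driven check makes this kind of test repeatable. The sketch below is one possible approach, again using Python's `urllib.robotparser`. The `check_policy` helper, the draft files, and the expectation rows are all hypothetical, and the standard-library matcher is only an approximation of each provider's own rule matching.

```python
from urllib.robotparser import RobotFileParser

def check_policy(robots_txt, expectations, origin="https://example.com"):
    """Return the (agent, path, expected, actual) rows where access differs."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = []
    for agent, path, expected in expectations:
        actual = parser.can_fetch(agent, origin + path)
        if actual != expected:
            mismatches.append((agent, path, expected, actual))
    return mismatches

# Hypothetical draft with a mistake: the broad rule sits under "*"
# although it was meant for GPTBot only.
DRAFT = "User-agent: *\nDisallow: /\n"

# The same policy with the rule under the intended agent.
FIXED = "User-agent: GPTBot\nDisallow: /\n"

# Intended policy: Googlebot keeps access, GPTBot is blocked.
EXPECTATIONS = [
    ("Googlebot", "/blog/guide", True),
    ("GPTBot", "/blog/guide", False),
]

for row in check_policy(DRAFT, EXPECTATIONS):
    print("unexpected:", row)
```

Here the draft produces one mismatch, because Googlebot loses access to the public page, while the fixed file produces none. Keeping the expectation rows in version control gives each reviewer a concrete list to sign off on.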
For content teams, keep a short changelog beside major robots.txt edits. If traffic, indexing, or AI referral visibility changes later, you will know which crawler policy changed and when.
After a policy change, monitor more than one metric. Check crawl logs when available, Search Console indexing, sitemap discovery, and analytics referrals over several weeks. AI answer visibility can move slowly, and a one-day change in referral traffic is not enough to prove that a crawler rule helped or hurt.
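If crawl logs are available, a small script can show which AI-related agents are requesting pages before and after a change. The sketch below counts log lines by user-agent token. The `count_crawler_hits` helper and the sample log lines are made up for illustration, and real log formats vary by server.

```python
from collections import Counter

# User-agent tokens named in this article. Google-Extended is left out because
# it is a robots.txt token and is not sent as a request user agent.
AI_AGENT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot")

def count_crawler_hits(log_lines, tokens=AI_AGENT_TOKENS):
    """Count log lines mentioning each token (case-insensitive, first match wins)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in tokens:
            if token.lower() in lowered:
                counts[token] += 1
                break
    return counts

# Made-up, shortened log lines for illustration only.
SAMPLE_LOG = [
    '"GET /blog/guide HTTP/1.1" 200 "GPTBot"',
    '"GET /tools/ HTTP/1.1" 200 "PerplexityBot"',
    '"GET /robots.txt HTTP/1.1" 200 "GPTBot"',
    '"GET /blog/guide HTTP/1.1" 200 "Mozilla/5.0"',
]

hits = count_crawler_hits(SAMPLE_LOG)
for token in AI_AGENT_TOKENS:
    print(token, hits[token])
```

User-agent strings can be spoofed, so these counts are a trend indicator rather than verified crawler traffic.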
For a static utility site, the safest default is usually simple: keep public tools, blog posts, privacy pages, and help content crawlable; block only internal, duplicate, generated, or non-public paths. That keeps the site easy for search engines and answer systems to understand without pretending robots.txt is a security layer.
Quick Answer
For GEO (generative engine optimization) and AI visibility, do not block AI-related crawlers unless you intentionally want to limit that access. Keep helpful public content crawlable, block private paths, and review each provider's current bot documentation before deciding.
FAQ
Does allowing AI crawlers guarantee ChatGPT or Perplexity citations?
No. Allowing access only makes citation possible. Content quality, authority, freshness, structure, and search visibility still matter.
Should I block Google-Extended?
That is a business decision. Review Google's documentation and decide whether the control matches your content and AI training policy. Do not confuse it with blocking Googlebot from Search.
Is robots.txt enough for confidential content?
No. Confidential content should require authentication or be removed from public access. robots.txt is not a security boundary.