Crawler reference

AI crawler user-agent list for robots.txt planning

Use this list as a starting point for policy review, not as a permanent authority. Provider names and behavior can change.

Training and dataset crawlers

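  • GPTBot, ClaudeBot, CCBot, Bytespider, and similar tokens are commonly associated with model training or dataset collection, so review them when you set a training-data policy.
  • Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers: they signal how content fetched by the provider's regular crawler may be used. Check their exact semantics against each provider's official documentation.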

Search and user-requested fetches

  • OAI-SearchBot, PerplexityBot, and Claude-SearchBot are associated with AI search products, so blocking them may reduce your visibility in those products' results.
  • ChatGPT-User and Claude-User are closer to user-triggered retrieval than broad crawling.
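
The grouping in the two lists above can be kept as a small lookup table, so that a policy review starts from categories rather than individual names. The Python sketch below is illustrative only: the category labels (training, control_token, search, user_fetch) and the categorize helper are hypothetical names, and the token lists simply mirror this page, so recheck them against provider documentation before relying on them.

```python
# Token lists copied from this page; treat them as a snapshot, not an authority.
# The category labels are hypothetical names chosen for this sketch.
CRAWLER_CATEGORIES = {
    "training": ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"],
    "control_token": ["Google-Extended", "Applebot-Extended"],
    "search": ["OAI-SearchBot", "PerplexityBot", "Claude-SearchBot"],
    "user_fetch": ["ChatGPT-User", "Claude-User"],
}


def categorize(token):
    """Return the category for a robots.txt token, or "unknown" if unlisted."""
    wanted = token.strip().lower()
    for category, tokens in CRAWLER_CATEGORIES.items():
        # Compare case-insensitively, since tokens are often written in mixed case.
        if wanted in (t.lower() for t in tokens):
            return category
    return "unknown"


if __name__ == "__main__":
    for name in ["GPTBot", "claude-user", "Google-Extended", "SomeOtherBot"]:
        print(name, "->", categorize(name))
```

An "unknown" result is a prompt to look the token up, not a signal to block it.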

How to use the list

Start with your business policy, then choose crawler-specific rules. Avoid copying a long blocklist without checking whether it blocks search or user-requested access you actually want.
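
One way to check a draft before publishing it is to run it through a robots.txt parser and confirm that each token gets the outcome your policy intends. The sketch below uses Python's standard urllib.robotparser module. The draft rules, the check_policy helper, and the example.com URL are all assumptions made for illustration, and Python's parser only approximates how any given crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: block training crawlers, allow search and user-triggered fetches.
DRAFT_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""


def check_policy(robots_txt, tokens, url="https://example.com/article"):
    """Map each token to True (allowed) or False (blocked) for the given URL."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    tokens = ["GPTBot", "ClaudeBot", "CCBot",
              "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
    for token, allowed in check_policy(DRAFT_ROBOTS_TXT, tokens).items():
        print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

With this draft, the three training tokens should come back as blocked and the others as allowed; PerplexityBot is not named in the draft, so it falls through to the wildcard group. A check like this makes it easier to see when a copied blocklist blocks search or user-requested access by accident.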