Question 1

What is a robots.txt file?

Accepted Answer

A robots.txt file is a plain-text file placed at the root of a website (e.g. yoursite.com/robots.txt) that tells web crawlers which pages or sections they are and are not allowed to access. It uses the Robots Exclusion Protocol — a set of simple directives like "User-agent" (which crawler), "Disallow" (paths to block), "Allow" (paths to permit), and "Sitemap" (your XML sitemap URL). Robots.txt is one of the first files search engine crawlers check when visiting a site.
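To make the directives concrete, here is a minimal sketch that parses a small robots.txt with Python's standard-library urllib.robotparser. The rules, paths, sitemap URL, and the "ExampleBot" user agent are made up for illustration and are not output from the generator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an example site; paths and sitemap URL are invented.
SAMPLE = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE)

# "ExampleBot" is a placeholder crawler name; it falls under the "*" group.
print(rules.can_fetch("ExampleBot", "https://yoursite.com/private/notes.html"))      # False
print(rules.can_fetch("ExampleBot", "https://yoursite.com/private/press-kit.html"))  # True
print(rules.can_fetch("ExampleBot", "https://yoursite.com/blog/"))                   # True
print(rules.site_maps())  # list of Sitemap URLs (Python 3.8+)
```

Note that Python's parser applies the first rule that matches, which is why the Allow line is listed before the broader Disallow. Google instead picks the most specific (longest) matching rule, so treat this as an approximation of how any one crawler behaves.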

Question 2

Should I block AI crawlers in my robots.txt?

Accepted Answer

Whether to block AI training crawlers depends on your priorities. Blocking them prevents your content from being used to train AI language models (like GPT and Gemini), which some publishers and content creators prefer for copyright and commercial reasons. However, some AI systems (like Bing's AI and Google's AI Overviews) use crawlers that also power their search indexing — blocking them may reduce your visibility in those features. The decision is yours: the tool provides ready-made presets for blocking specific AI crawlers while leaving major search engine bots unaffected.

Question 3

Does robots.txt prevent pages from being indexed?

Accepted Answer

Robots.txt asks crawlers not to fetch your pages, but it does not guarantee those pages stay out of search results. Google can still index a URL it hasn't crawled if other sites link to it; the result just appears with minimal information and no snippet. To keep a page out of the index, use a "noindex" meta tag or X-Robots-Tag HTTP header, and leave the page crawlable: a crawler blocked by robots.txt never fetches the page, so it never sees the noindex directive.
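As a rough illustration of the two noindex signals, the sketch below checks a response's headers and HTML for a noindex directive. The function name has_noindex and the sample inputs are hypothetical, and the check is simplified: it ignores crawler-specific forms such as a "googlebot" meta tag or an X-Robots-Tag value prefixed with a user agent.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """Return True if the header or a robots meta tag carries noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    finder = _RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in header_directives or "noindex" in finder.directives


print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))            # True
print(has_noindex({}, '<meta name="robots" content="noindex">'))         # True
print(has_noindex({"Content-Type": "text/html"}, "<p>Hello</p>"))        # False
```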

Question 4

What AI crawlers are included in the blocking preset?

Accepted Answer

The "Block AI Crawlers" preset blocks the following known AI crawlers: GPTBot (OpenAI), ChatGPT-User (OpenAI), ClaudeBot (Anthropic), anthropic-ai (Anthropic), CCBot (Common Crawl, used by many AI companies), Google-Extended (Google AI training, separate from Googlebot), PerplexityBot, Bytespider (ByteDance/TikTok), Meta-ExternalAgent (Meta), and Amazonbot. Most of these collect training data; a few, such as ChatGPT-User and PerplexityBot, fetch pages to answer user queries. Googlebot and Bingbot are NOT included: these bots power search indexing and should typically be allowed.
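For reference, a preset like this boils down to one named group per bot. The sketch below builds such a file from the list above; the function name build_block_rules is hypothetical, and the exact text the generator emits may differ.

```python
# User-agent tokens taken from the preset list above.
AI_CRAWLERS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "anthropic-ai",
    "CCBot",
    "Google-Extended",
    "PerplexityBot",
    "Bytespider",
    "Meta-ExternalAgent",
    "Amazonbot",
]


def build_block_rules(user_agents):
    """Return robots.txt text with one 'Disallow: /' group per named bot."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # Blank lines separate the groups; end the file with a newline.
    return "\n\n".join(groups) + "\n"


robots_txt = build_block_rules(AI_CRAWLERS)
print(robots_txt)
```

Because no group uses "User-agent: *", crawlers that are not named, including Googlebot and Bingbot, are unaffected by these rules.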

Question 5

What happens if I block Googlebot?

Accepted Answer

Blocking Googlebot entirely with "Disallow: /" prevents Google from crawling your site. Over time your pages drop out of Google search results, or linger only as bare URLs without snippets, because the indexed copies go stale and cannot be refreshed. This is almost never intentional. Be very careful when configuring User-agent rules, and always explicitly target only the bots you want to block. The generator uses named User-agent directives for each bot rather than a catch-all wildcard, making it safer to use.
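The difference between a catch-all wildcard and a named group can be checked with a short sketch using Python's standard-library urllib.robotparser. The helper name allowed and the two rule sets are illustrative assumptions, not the generator's actual output.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url="https://yoursite.com/"):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A catch-all group applies to every crawler that has no group of its own.
CATCH_ALL = "User-agent: *\nDisallow: /\n"

# A named group applies only to the bot it names.
NAMED = "User-agent: GPTBot\nDisallow: /\n"

print(allowed(CATCH_ALL, "Googlebot"))  # False: the wildcard blocks search bots too
print(allowed(NAMED, "Googlebot"))      # True: Googlebot is not named
print(allowed(NAMED, "GPTBot"))         # False: only the named bot is blocked
```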

Question 6

How do I verify my robots.txt is working correctly?

Accepted Answer

After uploading your robots.txt to your site's root directory, verify it in Google Search Console. The robots.txt report (found under Settings > Crawling) shows which robots.txt files Google found, when it last fetched them, and any warnings or errors; it replaced the older standalone robots.txt tester. To check whether Googlebot can access a specific URL under your current rules, run that URL through the URL Inspection tool. You can also visit yoursite.com/robots.txt directly in a browser to confirm the file is live and readable. Allow 24–48 hours for crawlers to re-read an updated robots.txt.
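For a quick local check alongside Search Console, a sketch like the following downloads the live file and tests a few URLs against it. The function names and URLs are placeholders, and Python's parser only approximates how Googlebot interprets rules (it uses the first matching rule and has limited wildcard support), so treat the result as a sanity check rather than a verdict.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(site_root):
    """Download robots.txt from the site root, e.g. "https://yoursite.com"."""
    with urlopen(site_root.rstrip("/") + "/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def check_urls(robots_txt, user_agent, urls):
    """Map each URL to True (allowed) or False (blocked) for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Live usage (requires network access; replace yoursite.com with your domain):
#   text = fetch_robots_txt("https://yoursite.com")
#   print(check_urls(text, "Googlebot", ["https://yoursite.com/", "https://yoursite.com/admin/"]))
```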

User-agent	Owner	Purpose	Block?
Googlebot	Google	Google Search indexing	No
Bingbot	Microsoft	Bing Search indexing	No
Google-Extended	Google	AI training (Gemini)	Optional
GPTBot	OpenAI	AI training (GPT models)	Optional
ChatGPT-User	OpenAI	ChatGPT browsing plugin	Optional
ClaudeBot	Anthropic	AI training (Claude)	Optional
CCBot	Common Crawl	Open data (used in AI training)	Optional
PerplexityBot	Perplexity AI	Perplexity search/AI answers	Optional
Bytespider	ByteDance	TikTok/AI data collection	Recommended
Amazonbot	Amazon	Alexa/AI training	Optional
AhrefsBot	Ahrefs	SEO backlink index	Optional
SemrushBot	Semrush	SEO data collection	Optional
