How to Use
1. Choose a preset configuration to get started quickly: Allow All (permits all crawlers), Block All Bots (blocks everything), Block AI Bots (blocks GPTBot, ChatGPT-User, Google-Extended, CCBot, and anthropic-ai), or WordPress Default (standard rules for WordPress sites with wp-admin and wp-includes blocked).
2. Add one or more User-agent rule blocks to target specific crawlers. Use the wildcard '*' to apply rules to all bots, or specify individual agents like Googlebot, Bingbot, GPTBot, or Yandex for granular control over which crawlers can access which parts of your site.
3. Set Disallow paths to block access to specific directories or pages (e.g., /admin/, /private/, /staging/) and Allow paths to create exceptions within blocked directories (e.g., Allow: /admin/public-assets/). Precedence is based on specificity rather than order: the most specific (longest) matching path wins, which is how an Allow rule carves an exception out of a broader Disallow.
4. Configure optional directives including Crawl-delay (seconds between requests, supported by Bing and Yandex but not Google), Sitemap URL (tells crawlers where to find your XML sitemap), and Host (a legacy Yandex directive specifying the preferred domain).
5. Review the generated robots.txt output in the preview panel; a complete sample file appears after these steps. Verify that your intended pages are accessible to search engines and that sensitive areas are properly blocked. Pay special attention to paths that should remain crawlable for SEO, such as CSS and JavaScript files needed for rendering.
6. Copy the generated content to your clipboard or download it as a robots.txt file. Upload the file to the root directory of your web server so it is accessible at https://yourdomain.com/robots.txt. Then confirm the rules work as expected using the robots.txt report in Google Search Console.
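Putting steps 2 through 4 together, a complete generated file might look like the sketch below (the paths and sitemap URL are placeholders; substitute your own):

```
# Default rules for all compliant crawlers
User-agent: *
Disallow: /admin/
Disallow: /staging/
Allow: /admin/public-assets/

# Ask Bingbot to pace its requests (Google ignores Crawl-delay)
User-agent: Bingbot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```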
About robots.txt Generator
The robots.txt Generator creates standards-compliant robots.txt files following the Robots Exclusion Protocol, the widely adopted standard that allows website owners to communicate crawling preferences to search engine bots and other automated agents. The tool supports user-agent targeting, Disallow and Allow directives, Crawl-delay values, Sitemap references, and Host directives, covering the directives recognized by major search engines.
The robots.txt file is one of the most important technical SEO files on any website. It lives at the root of your domain (e.g., https://example.com/robots.txt) and is the first file search engine crawlers request when visiting your site. Google, Bing, and other compliant crawlers read this file to understand which pages and directories they are permitted to access. Properly configuring robots.txt helps you manage crawl budget — the number of pages a search engine will crawl on your site within a given time period — by directing crawlers away from low-value pages and toward your most important content.
Google's documentation specifies that robots.txt controls crawling, not indexing. This is a critical distinction: blocking a URL in robots.txt prevents crawlers from accessing the page, but it does not prevent the URL from appearing in search results if other pages link to it. Google may display the URL with a 'No information is available for this page' message. To prevent indexing entirely, use the noindex meta tag or X-Robots-Tag HTTP header instead of robots.txt. The tool's output follows the syntax Google expects, including proper formatting of User-agent, Disallow, Allow, and Sitemap directives.
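For instance, either of the following standard mechanisms keeps a page out of the index, provided the page itself remains crawlable so the directive can be read:

```
<!-- Option 1: meta tag in the page's <head> -->
<meta name="robots" content="noindex">
```

```
# Option 2: HTTP response header sent with the page
X-Robots-Tag: noindex
```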
The tool includes one-click presets designed for common use cases. The Block AI Bots preset adds rules for GPTBot (OpenAI's crawler used for training data), ChatGPT-User (OpenAI's browsing agent), Google-Extended (Google's AI training crawler, separate from Googlebot), CCBot (Common Crawl's bot used by many AI companies), and anthropic-ai (Anthropic's training crawler). These presets reflect the growing need for website owners to control how their content is used for AI model training while maintaining full accessibility for search engine indexing.
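In robots.txt form, the preset's rules follow this pattern (a sketch; the generator's exact ordering and comments may differ):

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
```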
For WordPress sites, the WordPress Default preset includes rules to block crawling of wp-admin, wp-includes, and other WordPress-specific directories that should not appear in search results. It keeps the wp-admin/admin-ajax.php endpoint accessible since some themes and plugins require it for front-end functionality. This preset serves as a starting point that site owners can customize based on their specific plugin and theme requirements.
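Based on the rules described above, the core of the preset looks roughly like this:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
```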
All generation runs entirely in your browser using client-side JavaScript. No data is transmitted to external servers, and there are no usage limits or account requirements. The generated file follows the exact syntax specification that Google, Bing, Yandex, and other major crawlers expect, ensuring your robots.txt is parsed correctly. After generating your file, we recommend checking it against the robots.txt report in Google Search Console to confirm that your rules produce the intended crawling behavior.
Frequently Asked Questions
What is robots.txt and why does my website need one?
robots.txt is a plain text file placed at the root of your website that communicates crawling rules to search engine bots and other automated agents following the Robots Exclusion Protocol. While not strictly required, a robots.txt file helps you manage crawl budget by directing search engines away from low-value or duplicate pages, block access to sensitive directories like admin panels, and reference your XML sitemap for faster discovery. Most websites benefit from having a properly configured robots.txt file.
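A minimal but useful robots.txt often needs only a few lines; for example (example.com is a placeholder):

```
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
```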
Where should I place my robots.txt file?
The robots.txt file must be placed at the root of your domain, accessible at https://yourdomain.com/robots.txt. It must be at the top-level directory — not in a subdirectory. For most web servers (Apache, Nginx), place the file in the public or document root folder. For platforms like WordPress, the file is typically managed through SEO plugins like Yoast. For Next.js and similar frameworks, place it in the public directory so it is served statically at the root path.
Does Disallow: / block all crawlers from my entire site?
When applied to User-agent: *, the directive Disallow: / instructs all compliant crawlers not to access any page on your site. However, this only affects bots that respect the Robots Exclusion Protocol. Malicious scrapers, spam bots, and some AI training crawlers may ignore robots.txt entirely. Additionally, blocking crawling does not prevent indexing — if other websites link to your pages, Google may still list those URLs in search results without accessing the page content. For complete access control, combine robots.txt with server-side authentication or IP-based blocking.
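The blanket rule in question is just two lines:

```
User-agent: *
Disallow: /
```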
Can I block AI training bots while allowing search engines?
Yes. The Block AI Bots preset creates separate User-agent blocks for GPTBot, ChatGPT-User, Google-Extended, CCBot, and anthropic-ai with Disallow: / rules, while keeping the default User-agent: * block permissive. This means Googlebot, Bingbot, and other search engine crawlers can still access your content normally, but AI training crawlers from OpenAI, Google AI, Common Crawl, and Anthropic are instructed not to scrape your pages. Keep in mind that compliance is voluntary; only well-behaved bots will respect these directives.
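Combining the two, the generated file pairs a permissive default block with per-bot blocks; a condensed sketch showing one AI crawler:

```
# Search engines and other well-behaved crawlers: full access
User-agent: *
Disallow:

# AI training crawlers: no access
User-agent: GPTBot
Disallow: /
```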
Does robots.txt guarantee my pages will not be indexed by Google?
No, and this is one of the most common misconceptions about robots.txt. Blocking a URL in robots.txt prevents crawlers from accessing the page content, but Google may still index the URL itself if it discovers it through external links, sitemaps, or other references. The indexed result will show limited information since Google cannot read the page. To reliably prevent indexing, use a noindex meta tag in the page's HTML or an X-Robots-Tag: noindex HTTP header. These require the page to be crawlable, so do not block them in robots.txt.
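As an illustration, the X-Robots-Tag header can be attached in server configuration; for Apache with mod_headers enabled (the filename here is hypothetical):

```
<Files "internal-report.html">
  Header set X-Robots-Tag "noindex"
</Files>
```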
What is Crawl-delay and which search engines support it?
Crawl-delay is a directive that tells crawlers to wait a specified number of seconds between successive requests to your server. It helps prevent server overload from aggressive crawling. Bing and Yandex officially support Crawl-delay, but Google ignores the directive entirely. Googlebot instead adjusts its crawl rate automatically based on how your server responds (the legacy crawl-rate setting in Search Console has been retired). If your server is struggling under crawler load, Crawl-delay can help with non-Google bots, but server-side rate limiting is a more reliable solution.
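For example, to ask Bingbot to wait ten seconds between requests while leaving other crawlers untouched:

```
User-agent: Bingbot
Crawl-delay: 10
```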
Should I include my sitemap URL in robots.txt?
Yes, adding a Sitemap directive to your robots.txt file is a recommended best practice. The Sitemap line (e.g., Sitemap: https://example.com/sitemap.xml) tells all crawlers where to find your XML sitemap, enabling faster discovery of your pages. This is especially valuable for new websites, sites with deep page hierarchies, or large e-commerce catalogs. You can include multiple Sitemap lines if you have several sitemap files. This directive works alongside submitting your sitemap through Google Search Console — both methods complement each other.
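Multiple sitemaps are simply listed one per line, anywhere in the file (URLs are placeholders):

```
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-products.xml
Sitemap: https://example.com/sitemap-blog.xml
```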
What happens if my robots.txt file has syntax errors?
Syntax errors in robots.txt can cause search engines to misinterpret your crawling rules, potentially blocking pages you want indexed or allowing access to pages you intended to block. Common errors include missing colons after directives, incorrect User-agent strings, and spaces in URLs. Googlebot is relatively forgiving with minor syntax issues, but other crawlers may be stricter. Always validate your file with the robots.txt report in Google Search Console, which shows how Googlebot fetched and parsed it and flags any rules it could not interpret.
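For example, each rule in the first block below is malformed in one of the ways described above; the corrected form follows:

```
# Broken: missing colon after the directive, unencoded space in the path
User-agent *
Disallow: /my folder/

# Fixed
User-agent: *
Disallow: /my%20folder/
```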