Robots.txt Generator
Generate a valid robots.txt file for your website. Control crawler access with Allow and Disallow rules. Free online tool, no signup needed.
Generate a Robots.txt File That Controls How Search Engines Crawl Your Site
The robots.txt file is one of the most fundamental technical SEO components on any website, yet it's also one of the most frequently misconfigured. A correctly written robots.txt file guides search engine crawlers efficiently through your site, preventing them from wasting crawl budget on pages you don't want indexed while ensuring they find and crawl your important content. A badly written one can inadvertently block your most valuable pages from being indexed at all—a mistake that can devastate organic search traffic and is surprisingly easy to make. Our free robots.txt generator produces a syntactically correct file based on your specifications, eliminating the syntax errors that commonly break robots.txt implementations.
Select your rules, add your sitemap URL, and copy the generated file content. The output follows the Robots Exclusion Standard that all major search engines respect, including Google, Bing, Yahoo, and DuckDuckGo.
How Robots.txt Works
When a search engine crawler—Googlebot, Bingbot, or any other—first visits a website, it requests the robots.txt file at the root of the domain (for example, `https://example.com/robots.txt`) before crawling any other pages. The crawler reads the file and uses its directives to determine which parts of the site it's allowed to access and which it should skip.
The file is organized into groups of directives, each starting with a `User-agent` line that specifies which crawler the following rules apply to. The wildcard `User-agent: *` applies to all crawlers. After the User-agent line, `Allow` and `Disallow` directives specify which URL paths the crawler may and may not visit. A blank line between groups separates the rules for different crawlers.
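For illustration, a minimal file with two groups might look like this (the crawler names are real, but the paths are placeholders):

```
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /drafts/

# Rules for every other crawler
User-agent: *
Disallow: /drafts/
Disallow: /tmp/
```

The blank line between groups is significant: it marks where the rules for Googlebot end and the rules for all other crawlers begin.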
The basic syntax is simple, but the matching rules have specific behaviors worth understanding before deploying a robots.txt file on a production website.
Allow vs. Disallow: The Core Directives
A `Disallow:` directive followed by a URL path tells crawlers not to request any URL that starts with that path. `Disallow: /admin/` prevents crawling of all URLs under `/admin/`. `Disallow: /cart` prevents crawling of `/cart`, `/cart/`, `/cart-items`, and any other URL beginning with `/cart`. `Disallow: /` blocks the entire site, since every URL path begins with `/`. By contrast, `Disallow:` with no path listed means "allow everything," because the empty path matches nothing according to the standard. This distinction trips people up constantly, so double-check which of the two you have written before deploying.
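The prefix-matching behavior described above can be summarized directly in a robots.txt fragment (the comments note what each rule matches):

```
User-agent: *
# Matches /admin/, /admin/users, /admin/settings/general, ...
Disallow: /admin/
# Matches /cart, /cart/, /cart-items, /cart.html, ...
Disallow: /cart
```

Note that `Disallow: /admin` (without the trailing slash) would be broader than `Disallow: /admin/`, catching `/admin-tools` as well.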
An `Allow:` directive explicitly permits access to a URL that would otherwise be blocked by a broader Disallow rule. This lets you create exceptions within blocked paths. If you have `Disallow: /account/` but want `/account/signup` to be crawlable, adding `Allow: /account/signup` to the same group creates the exception. When both an Allow and a Disallow rule match the same URL, Google's implementation gives precedence to the rule with the longer, more specific path, regardless of the order in which the rules appear in the file.
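Using the example paths from above, the exception looks like this:

```
User-agent: *
# Exception: the signup page stays crawlable
Allow: /account/signup
# Everything else under /account/ is blocked
Disallow: /account/
```

Under Google's longest-match rule, `/account/signup` matches both directives, and the longer `Allow` path wins.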
Common Robots.txt Patterns for Real Sites
Standard SEO-Friendly Configuration
The most common practical robots.txt for a standard website allows all crawlers everywhere while specifying the sitemap location: `User-agent: *` on its own line, followed by `Allow: /`, followed by a blank line and `Sitemap: https://yourdomain.com/sitemap.xml`. This configuration is functionally identical to having no robots.txt at all for crawl permissions, but the Sitemap directive helps search engines discover and prioritize your sitemap URL, which is genuinely useful for large sites.
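Written out in full, the configuration described above is just four lines (substitute your own domain in the Sitemap URL):

```
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```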
Blocking Admin and Internal Pages
Most websites have sections that should never appear in search results: admin panels, login pages, user account pages, cart and checkout pages, order confirmation pages, staging paths. Blocking these from crawling reduces wasted crawl budget on non-indexable pages and prevents them from appearing in search results (in combination with noindex tags). Common Disallow rules for this purpose include `/admin/`, `/wp-admin/` for WordPress, `/cart/`, `/checkout/`, `/account/`, `/orders/`, and `/thank-you/`.
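Combining the common rules listed above into one group gives a fragment like the following (adjust the paths to match your site's actual URL structure; `/wp-admin/` only applies to WordPress installs):

```
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /orders/
Disallow: /thank-you/
```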
Managing Crawl Budget on Large Sites
For large e-commerce sites, news sites, or any website with thousands or tens of thousands of pages, crawl budget management becomes important. Search engines allocate a limited number of crawls per day to any given domain, and having crawlers spend that budget on faceted navigation URLs, duplicate sorted/filtered product listing pages, or printer-friendly page variants consumes crawl budget that should go to canonical product and content pages. Disallowing common faceted navigation patterns (URLs with `?color=`, `?size=`, `?sort=`) and internal search result pages (`/search?`) is standard practice for large e-commerce sites.
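A sketch of the faceted-navigation rules described above follows. Note that the `*` wildcard in paths is an extension supported by Google and Bing, not part of the original Robots Exclusion Standard, so older or simpler crawlers may not honor it:

```
User-agent: *
# Block faceted/filtered listing URL variants
# (the * wildcard is a Google/Bing extension)
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?sort=
# Block internal search result pages
Disallow: /search?
```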
Blocking Specific Crawlers
Different User-agent entries let you apply different rules to different crawlers. You might allow Googlebot and Bingbot full access while blocking certain aggressive or low-quality scrapers. Common aggressive crawler user agents include AhrefsBot, SemrushBot, MJ12bot, and DotBot—all legitimate SEO tools but all placing significant crawl load on servers if not rate-limited or blocked. Including `User-agent: AhrefsBot` followed by `Disallow: /` in your robots.txt blocks that specific tool from crawling without affecting other crawlers.
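For example, to block AhrefsBot while leaving all other crawlers unrestricted:

```
# Block this specific crawler entirely
User-agent: AhrefsBot
Disallow: /

# All other crawlers retain full access
User-agent: *
Allow: /
```

A crawler uses the most specific User-agent group that matches it, so AhrefsBot follows its own group and ignores the `*` group.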
What Robots.txt Cannot Do
Robots.txt is commonly misunderstood as a security tool, and it is not one. Disallow directives are recommendations, not access controls—malicious bots and scrapers do not respect them. If you need to protect content from unauthorized access, authentication, server-level access controls, and IP blocking are the appropriate mechanisms.
Additionally, a Disallow directive prevents crawlers from visiting a URL, but it does not prevent them from indexing it. Google can and does index URLs that are blocked in robots.txt if other sites link to those URLs—Google simply indexes the URL without seeing the page content, sometimes showing the URL in search results with no title or description (just the URL itself). If preventing indexing is the goal, a `noindex` meta tag on the page itself is more reliable than a robots.txt Disallow directive. The two mechanisms also interact: the page must remain crawlable (not disallowed in robots.txt) for the crawler to fetch it and see the noindex tag at all.
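The noindex tag goes in the page's `<head>`:

```
<meta name="robots" content="noindex">
```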
Free, Private, and Instant
The robots.txt generator runs entirely in your browser. No domain information, URLs, or configuration you enter is transmitted to any server. The tool is completely free with no account required. Generate your robots.txt content, copy it, and deploy it to your site's root directory immediately.