robots.txt

Last updated 17 June 2026

robots.txt is a plain text file that sits at the root of a website and tells web crawlers which parts of the site they are allowed to request. If your domain is example.com, the file lives at example.com/robots.txt — and nowhere else.

Crawlers check this one standard location before fetching anything else. The path is fixed by the Robots Exclusion Protocol (published as RFC 9309) rather than configurable, so a crawler never has to guess where a site's rules live.

How robots.txt works

When a crawler like Googlebot or Bingbot visits a domain, it requests /robots.txt first. The file tells the crawler which URLs to avoid. The syntax is built around two core ideas: which crawler the rules apply to (User-agent) and what those crawlers cannot fetch (Disallow).

A simple example:

User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /admin/public-info.html

Sitemap: https://example.com/sitemap.xml

This tells every crawler (*) to stay out of /admin/ and /cart/, with one exception inside /admin/, and points them to the sitemap. You can also target specific bots by name — for example, User-agent: Googlebot followed by rules that apply only to Google.
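As a rough illustration, the example above can be fed to Python's standard-library urllib.robotparser to see how a crawler might read it. The crawler name ExampleBot and the URLs under /cart/ and /blog/ are made up for the demonstration. Note that this parser applies the first rule that matches, so it is not a faithful model of how Google treats the Allow exception; the sketch only checks paths where that difference does not matter.

```python
from urllib.robotparser import RobotFileParser

# The example file from this article, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /admin/public-info.html

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is needed.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler name; it falls under the "*" group.
cart_allowed = parser.can_fetch("ExampleBot", "https://example.com/cart/checkout")
blog_allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post")

# site_maps() is available from Python 3.8 onwards.
sitemaps = parser.site_maps()

print("cart allowed:", cart_allowed)
print("blog allowed:", blog_allowed)
print("sitemaps:", sitemaps)
```

Anything under /cart/ is refused, an unlisted path is permitted, and the Sitemap line is reported separately from the per-crawler rules.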

Major search engines support wildcards: * matches any sequence of characters, and $ marks the end of a URL. So Disallow: /*.pdf$ blocks any URL ending in .pdf.
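One way to picture the two wildcards is to translate a robots.txt pattern into a regular expression. The sketch below is an illustration under that assumption, not how any particular search engine implements matching; the function names and sample paths are invented for the example.

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes "any sequence of characters"; a trailing '$' anchors the
    match at the end of the URL. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def pattern_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, like a robots.txt rule.
    return robots_pattern_to_regex(pattern).match(path) is not None

print(pattern_matches("/*.pdf$", "/docs/report.pdf"))
print(pattern_matches("/*.pdf$", "/docs/report.pdf?download=1"))
print(pattern_matches("/admin/", "/admin/users"))
```

The second call does not match because the query string means the URL no longer ends in .pdf, which is exactly what the $ anchor is for.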

What robots.txt can't do

robots.txt is not a security mechanism. The file is publicly readable by anyone who types /robots.txt into a browser. Listing /secret-admin/ in it just tells the world where your admin panel is. Sensitive areas need real authentication, not just a Disallow line.

robots.txt does not reliably prevent indexing. If Google is told not to crawl a page, it won't fetch the page's contents — but if other sites link to that URL, Google can still index it based on those external signals, often showing it in results with the description "No information is available for this page."

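To genuinely keep a page out of search results, allow crawling and add a noindex robots meta tag to the page itself (or send an X-Robots-Tag: noindex HTTP header for non-HTML files such as PDFs). A crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex instruction, and the block defeats the goal.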
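To make the noindex tag concrete, here is a small sketch that checks whether a page's HTML carries one. It uses Python's standard html.parser; the class name, function name, and sample markup are assumptions for illustration, and the check only covers the generic robots meta tag, not crawler-specific variants.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains 'noindex'."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        tokens = [t.strip().lower() for t in attr.get("content", "").split(",")]
        if "noindex" in tokens:
            self.noindex = True

def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex

# Hypothetical page that should stay out of search results.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))
```

A crawler can only perform a check like this after fetching the page, which is why the page must not be disallowed in robots.txt.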

robots.txt is a request, not enforcement. Reputable crawlers honour it. Scrapers, spam bots, and malicious actors will ignore it.

Common directives

User-agent specifies the crawler. Disallow blocks a path. Allow carves out an exception inside a disallowed path. Sitemap points to your XML sitemap and is the one directive that applies independently of any user-agent block.

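Crawl-delay asks bots to wait between requests, but support is uneven. Google ignores it and sets its own crawl rate based on how the server responds. Bing honours it. Yandex announced in 2018 that it would stop honouring it, so check each crawler's current documentation before relying on the directive.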
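For a crawler that does honour Crawl-delay, reading the value might look like the sketch below, which uses crawl_delay() from Python's urllib.robotparser. The rules, the value 5, and the choice of bingbot as the named crawler are hypothetical; the directive takes a number that is commonly interpreted as seconds.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: a delay for one named crawler, none for everyone else.
RULES = """\
User-agent: bingbot
Crawl-delay: 5

User-agent: *
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# crawl_delay() returns the value for the matching group, or None if the
# group that applies to this crawler does not set one.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("ExampleBot")

print("bingbot delay:", bing_delay)
print("ExampleBot delay:", other_delay)
```

A polite crawler would sleep for the returned value between requests and fall back to its own default when the result is None.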

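Order matters less than specificity: when multiple rules could apply to a URL, Google and most modern crawlers use the most specific match (the rule with the longest matching path), not the first or last rule listed. If an Allow and a Disallow rule match equally well, Google applies the less restrictive Allow.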
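The most-specific-match idea can be sketched in a few lines. This is a simplified model that treats every rule as a plain path prefix (no wildcards) and lets Allow win a tie; the function name and the rule list format are assumptions for the example, with the rules themselves taken from the sample file earlier in this article.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be fetched under longest-match semantics.

    `rules` is a list of (directive, prefix) pairs, where directive is
    "allow" or "disallow". A path that matches no rule is allowed.
    """
    best = None  # (length of matching prefix, rule is an Allow)
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            candidate = (len(prefix), directive.lower() == "allow")
            # Tuples compare by length first, then True > False, so a
            # longer prefix wins and Allow wins when lengths are equal.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [
    ("disallow", "/admin/"),
    ("disallow", "/cart/"),
    ("allow", "/admin/public-info.html"),
]

print(is_allowed(rules, "/admin/public-info.html"))
print(is_allowed(rules, "/admin/users"))
print(is_allowed(list(reversed(rules)), "/admin/public-info.html"))
```

Reversing the rule list gives the same answer, which is the practical meaning of "order matters less than specificity".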

Practical notes

The file must be UTF-8 encoded, and Google enforces a size limit of 500 KiB; content beyond that is ignored. Paths are case-sensitive, so /Admin/ and /admin/ are different rules. Each subdomain needs its own robots.txt; the file at example.com/robots.txt does not govern shop.example.com.
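Two of these notes lend themselves to a short sketch: working out which robots.txt governs a given URL, and dropping content past the size limit. The function names are invented, and the limit is read here as 500 * 1024 bytes, which is an assumption about how the figure is counted.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the size limit is interpreted as 500 * 1024 bytes.
MAX_ROBOTS_BYTES = 500 * 1024

def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`.

    Only the scheme and host are kept, so each subdomain maps to its own file.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def effective_content(raw: bytes) -> str:
    """Keep only the bytes a size-limited crawler would read, decoded as UTF-8."""
    return raw[:MAX_ROBOTS_BYTES].decode("utf-8", errors="replace")

print(robots_url_for("https://example.com/blog/post"))
print(robots_url_for("https://shop.example.com/products/1?colour=red"))
```

The two calls return different robots.txt URLs because the hosts differ, even though both belong to the same site.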

For SEO, never block CSS or JavaScript files that are needed to render the page — Google needs to fetch them to understand the layout, and blocking them can hurt rankings.

For large sites, robots.txt can also be used as a crawl-budget tool. Block faceted-search URLs, internal search results, and infinite-parameter combinations to stop crawlers from wasting time on low-value pages, freeing that budget for important content.

When to use robots.txt

Use robots.txt to keep crawlers out of areas that have no business being crawled — staging environments, admin paths, etc. Use it to declare your sitemap. Use it to manage crawl budget on large sites.

Don't use it to hide sensitive content, don't use it to remove pages from search results, and don't disallow assets the page depends on to render. Those are different problems with different tools: authentication, noindex tags, and proper site architecture, respectively.

A well-tuned robots.txt is short, deliberate, and rarely changes. If yours has grown into hundreds of lines of patches and exceptions, that's usually a sign the underlying site structure needs attention, not the file.
