Free SEO Audit
Robots.txt Tester · Validator · URL Path Tester · Robot Checker

Robots.txt
Tester and Validator Free

The most complete free robots.txt tester and validator online. Fetch any live robots.txt file, paste content directly, validate crawl rules, test any URL path against Googlebot or any other bot, and get a full robots.txt checker report with an SEO score. No login required.

Live fetch or paste robots.txt content
Test any URL path for any user-agent
Syntax highlighted robots.txt viewer
12 validation checks with SEO score
🤖
Robot Checker
Googlebot, Bingbot, any agent
🔍
URL Path Tester
Test exact paths against rules
Syntax Validator
12 checks on every rule
Instant Results
Score, grade, parsed view
How it works

How the Robots.txt Tester Works

Our robots.txt validator fetches or accepts your file, parses every directive, runs 12 validation checks, and gives you a full robot checker report in seconds.

1

Fetch or Paste

Enter a domain to fetch the live robots.txt file, or paste content directly to test before publishing.

2

Parse Directives

Every User-agent, Disallow, Allow, Crawl-delay, and Sitemap directive is parsed and structured.

3

Run 12 Checks

File size, block-all rules, syntax errors, crawl-delay values, duplicate agents, sitemap format, and more are checked.

4

Test URL Paths

Test any URL path against any user-agent to see immediately whether it is allowed or blocked, and which rule matched.

5

Score and Fix

Get a score out of 100 and letter grade with actionable fix recommendations for every issue found.
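The path-testing step (step 4) can be approximated locally with Python's standard library. This is only a rough stand-in for our tool: urllib.robotparser uses first-match semantics rather than Google's longest-match rule, so results can differ when Allow and Disallow rules overlap.

```python
from urllib import robotparser

# Parse robots.txt content directly, without fetching (like paste mode).
rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
""".splitlines())

print(rp.can_fetch("Googlebot", "/admin/settings"))  # False: blocked
print(rp.can_fetch("Googlebot", "/blog/post"))       # True: allowed
```

To test a live file instead, call `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` before querying `can_fetch`.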

What We Check

12 Robots.txt Validation Checks

Every check is based on Google's robots.txt specification, Search Central documentation, and real-world crawling behaviour confirmed by Google engineers.

🚫

Block-All Detection

Disallow: / under User-agent: * or Googlebot blocks your entire site from crawling. This is one of the most destructive robots.txt errors and our robots.txt tester flags it as critical immediately.

📏

File Size Check

Google ignores any robots.txt content beyond 500 KB. Our validator checks file size and warns you before it becomes a problem. Files over 100 KB are also flagged as unusually large.

🗺️

Sitemap Declaration

Declaring your sitemap inside robots.txt with Sitemap: https://yourdomain.com/sitemap.xml helps all major crawlers discover it without relying on Search Console submission. We check for presence and correct absolute URL format.

⏱️

Crawl-Delay Analysis

Googlebot ignores the Crawl-delay directive. Our robots.txt checker informs you of this, flags non-numeric values as errors, and notes that Google Search Console is the correct place to adjust Googlebot crawl rate.

👥

Duplicate User-Agent Groups

If the same bot name appears in multiple User-agent groups, crawlers may behave unpredictably. Our robots.txt validator detects all duplicate agent declarations and tells you which ones to merge.
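A simplified version of this duplicate-agent check can be sketched in Python. This is a rough approximation of what the validator looks for, not its actual implementation:

```python
from collections import Counter

def find_duplicate_agents(robots_txt: str) -> list[str]:
    """Return bot names declared in more than one User-agent line."""
    agents = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("user-agent:"):
            # Agent names are case-insensitive, so normalise before counting.
            agents.append(line.split(":", 1)[1].strip().lower())
    counts = Counter(agents)
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, a file declaring both `User-agent: Googlebot` and `User-agent: googlebot` in separate groups would return `["googlebot"]`, signalling that the two groups should be merged.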

✳️

Wildcard Syntax

The * wildcard and $ end-anchor are the only pattern characters Google supports in robots.txt paths. We check for misplaced $ anchors, multiple anchors in one rule, and redundant trailing wildcards.

⚔️

Allow vs Disallow Conflicts

When a path matches both an Allow and a Disallow rule, Google uses the most specific one. Our robot checker identifies these override situations so you can verify the correct rule wins.

🌐

Wildcard Agent Check

A User-agent: * block applies rules to all crawlers not specifically named. We flag when this is missing, which may mean some bots receive no crawl guidance from your robots.txt file.

🔗

Sitemap URL Validation

Sitemap paths declared with relative URLs are invalid. Our robots.txt validator ensures every Sitemap: directive uses a full absolute URL starting with https://.

Complete Guide

Robots.txt: The Complete SEO Guide

Everything you need to know about writing, testing, and validating your robots.txt file to get crawl rules right the first time.

🤖

What Is robots.txt and Why Every Site Needs a Robots.txt Tester

A robots.txt file is a plain text file located at the root of your website (for example, https://yoursite.com/robots.txt) that instructs web crawlers which parts of your site they are and are not permitted to crawl. It follows the Robots Exclusion Protocol, a standard developed in 1994 that all major search engines, including Google, Bing, and Yandex, support.

Every site with pages you want to control crawler access to needs a properly configured robots.txt file. But writing the file correctly is surprisingly easy to get wrong. A single misplaced slash, a missing User-agent line, or an accidental Disallow: / can block your entire site from Google in minutes. This is exactly why a robots.txt tester is not optional. It is a critical sanity check before you deploy any changes to your robots.txt file.

What robots.txt can and cannot do

Robots.txt controls crawling, not indexing. A page blocked by Disallow will not be crawled, but it can still appear in search results if Google discovers it through links pointing to it. In that case, Google may index the URL without being able to read its content, showing it with no title or description. If you want to prevent indexing, use a noindex meta tag instead. Our Noindex Checker can verify your noindex tags are set up correctly alongside your robots.txt rules.

Robots.txt also does not protect content from being accessed directly. If someone knows a URL, they can visit it regardless of what your robots.txt says. For private content, use authentication and server-side access controls.

📝

Robots.txt Syntax Guide: Writing Rules That Actually Work

Correct robots.txt syntax is essential. Google's robots.txt parser is strict, and a poorly formatted file may be partially or entirely ignored. Here are the key syntax rules every robots.txt validator checks.

User-agent

The User-agent: line declares which crawler the following rules apply to. User-agent: * applies to all crawlers. User-agent: Googlebot applies only to Google's main crawler. User-agent: Bingbot applies only to Bing. You can have multiple User-agent lines before a set of rules to apply those rules to multiple bots. Each group of rules must begin with at least one User-agent line.

Disallow and Allow

Disallow: /path/ blocks crawlers from accessing that path and anything beneath it. Allow: /path/ explicitly permits access to a path, overriding a broader Disallow. When both match a URL, Google uses the most specific rule (the one with the longest matching path). An empty Disallow: or Allow: / means allow all.

Wildcards and anchors

Google supports two special characters in path patterns. The * wildcard matches any sequence of characters: Disallow: /search* blocks any URL starting with /search. The $ anchor matches the end of a URL: Disallow: /*.pdf$ blocks all URLs ending in .pdf. The $ must appear at the end of the pattern to work correctly. Our robots.txt tester flags patterns where $ appears in the wrong position.
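This pattern matching can be modelled with a regular expression: * becomes "any sequence of characters" and a trailing $ anchors the end of the URL. The sketch below is an approximation of the matching logic, not Google's actual parser:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Check a robots.txt path pattern against a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn * back into "match anything".
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

rule_matches("/*.pdf$", "/files/report.pdf")       # True
rule_matches("/*.pdf$", "/files/report.pdf?dl=1")  # False: $ anchors the end
rule_matches("/search*", "/search/results")        # True
```

The second call shows why the $ anchor matters: a query string after .pdf stops the rule from matching, which is exactly the kind of edge case worth confirming in a tester before deploying the rule.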

Sitemap declaration

You can declare your sitemap location directly in robots.txt using Sitemap: https://example.com/sitemap.xml. This is not part of the original robots.txt specification but is widely supported by Google, Bing, and other crawlers. Always use absolute URLs for sitemap declarations. Pair your robots.txt sitemap declaration with our Sitemap Validator to ensure the sitemap itself is correctly formatted.
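Putting the four directive types together, a small well-formed robots.txt might look like this (the domain and paths are illustrative):

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Disallow: /*.pdf$

# Bingbot honours Crawl-delay; Googlebot ignores it
User-agent: Bingbot
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
```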

🚫

The Most Dangerous Robots.txt Error: Disallow: /

The single most catastrophic mistake in robots.txt is placing Disallow: / under User-agent: * or User-agent: Googlebot. This one line tells every crawler it may not access any page on your website, effectively removing your site from search engine indexes.

Why this happens

This error almost always originates from a staging or development environment. When developers build a new site or work on a redesign, they typically set robots.txt to Disallow: / to prevent the staging site from appearing in search results. The problem occurs when this staging robots.txt gets accidentally pushed to the production server. The result is that Google stops crawling your site, and within days or weeks, your pages drop out of search results entirely.

How to detect and fix it

Our robots.txt tester checks for this condition immediately and flags it as a critical error. To fix it, remove the Disallow: / line (or replace it with Disallow: to allow everything) and re-submit your site in Google Search Console to trigger a fresh crawl. Recovery typically takes days to weeks depending on your site's crawl frequency. Our Sitemap Validator can help you re-submit your sitemap to speed up re-indexing.

Prevention with a robots.txt validator

The best way to prevent accidental deployment of a blocking robots.txt is to run a robots.txt validator as part of every site deployment process. Use our tool to test your robots.txt file before publishing. The paste mode lets you validate content without needing to push it live first.
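A minimal pre-deploy guard for the block-all error alone can be sketched like this (a simplified illustration; our validator runs 12 checks, not just this one):

```python
def has_block_all(robots_txt: str) -> bool:
    """Detect 'Disallow: /' under 'User-agent: *' or Googlebot."""
    group_agents: set[str] = set()
    collecting = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip().lower()
        if not line:
            continue
        if line.startswith("user-agent:"):
            if not collecting:
                group_agents = set()  # a new rule group starts here
            group_agents.add(line.split(":", 1)[1].strip())
            collecting = True
        else:
            collecting = False
            if (line.replace(" ", "") == "disallow:/"
                    and group_agents & {"*", "googlebot"}):
                return True
    return False
```

Wired into a CI step that reads the robots.txt about to be deployed, a True result should fail the build before the blocking file ever reaches production.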

⏱️

Crawl-Delay in robots.txt: Does Google Actually Use It?

The Crawl-delay directive was designed to tell crawlers how many seconds to wait between fetching pages on your site. A value of Crawl-delay: 10 asks the bot to pause 10 seconds between requests.

Googlebot ignores Crawl-delay

Googlebot does not honour the Crawl-delay directive. Google has stated this clearly in Search Central documentation. The reason is that Google manages crawl rate centrally and adapts it based on server response times rather than relying on site-declared delays. If your server is slow to respond, Googlebot naturally slows down. If you want to explicitly reduce Googlebot's crawl rate, the only supported method is to use the crawl rate setting inside Google Search Console.

Other bots do respect Crawl-delay

Bingbot, Yandex, and many other crawlers do respect Crawl-delay. If you are worried about non-Google bot traffic overwhelming your server, setting a reasonable Crawl-delay is still useful for those bots. Our robots.txt checker notes when Crawl-delay is present, informs you of Googlebot's non-compliance, and flags unreasonably high delay values.

🔍

How to Test robots.txt Rules Against Specific URLs

Understanding whether a specific URL is blocked by your robots.txt requires testing the path against each rule group in order. This is not always obvious when rules use wildcards or when Allow and Disallow directives overlap.

Using the URL path tester

After loading your robots.txt with our tool, use the URL Path Tester section to enter any path (for example, /admin/settings) and select the user-agent you want to test as (Googlebot, Bingbot, or custom). The robot checker shows you exactly which rule matched, which line it is on, and whether that bot is allowed or blocked from crawling that path.

How Google picks the matching rule

When multiple rules match a URL, Google applies the most specific rule based on the length of the matching pattern. For example, if you have Disallow: /blog/ and Allow: /blog/featured/, then /blog/featured/article is allowed because /blog/featured/ is more specific. If two rules have the same length match, the Allow takes precedence. Our robots.txt tester applies this exact Google algorithm when you test paths.
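The most-specific-rule logic amounts to: collect every matching rule, keep the one with the longest pattern, and prefer Allow on a length tie. The sketch below models Google's documented behaviour with plain prefix matching (wildcards omitted for brevity):

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: ("allow" | "disallow", pattern) pairs for one user-agent group.
    Longest matching pattern wins; Allow wins a length tie."""
    best_len, best_allow = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if path.startswith(pattern):  # simple prefix match, no wildcards
            allow = (kind == "allow")
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow

rules = [("disallow", "/blog/"), ("allow", "/blog/featured/")]
is_allowed("/blog/featured/article", rules)  # True: Allow is more specific
is_allowed("/blog/other-post", rules)        # False: only Disallow matches
```

This reproduces the example above: /blog/featured/ (15 characters) beats /blog/ (6 characters), so the Allow rule wins for anything under the featured folder.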

For a complete crawlability audit alongside your robots.txt check, also run our Canonical Tag Checker to ensure your canonical tags reinforce your crawl directives correctly.

📋

Common Robots.txt Mistakes Caught by a Robots.txt Validator

Beyond the catastrophic Disallow: / error, there are many smaller robots.txt mistakes that silently harm crawlability or waste crawl budget. Our robots.txt validator catches all of them.

Blocking CSS and JavaScript files

Older SEO advice recommended blocking /wp-content/ or /assets/ folders. Modern Google needs to render your pages like a browser, which requires accessing CSS and JavaScript. Blocking these resources prevents Googlebot from fully rendering your pages, which can hurt rankings. Remove Disallow rules targeting CSS, JS, and image directories.

Redundant trailing wildcards

Disallow: /path/* is identical to Disallow: /path/ because paths already match everything beneath them. The trailing /* is unnecessary. Our robots.txt checker flags these for cleanup.

Missing the wildcard agent block

If your robots.txt only contains rules for specific bots (Googlebot, Bingbot) but has no User-agent: * block, then all other crawlers receive no guidance and are free to crawl everything. This is usually fine, but if you have a reason to restrict content scraping bots, the wildcard block is where to do it.

Relative sitemap URLs

A sitemap declaration like Sitemap: /sitemap.xml uses a relative path. All major crawlers require absolute sitemap URLs in robots.txt. The correct format is Sitemap: https://example.com/sitemap.xml. Our robots.txt validator detects all relative sitemap declarations.
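A check for relative sitemap declarations can be sketched as follows (a hypothetical helper mirroring what the validator flags, not its actual code):

```python
from urllib.parse import urlparse

def invalid_sitemap_urls(robots_txt: str) -> list[str]:
    """Return Sitemap: values that are not absolute http(s) URLs."""
    bad = []
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if line.lower().startswith("sitemap:"):
            # Split on the first colon only, so https:// stays intact.
            url = line.split(":", 1)[1].strip()
            if urlparse(url).scheme not in ("http", "https"):
                bad.append(url)
    return bad
```

A file containing `Sitemap: /sitemap.xml` would be flagged, while `Sitemap: https://example.com/sitemap.xml` passes.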

Related Tools

Complete Your Technical SEO Audit

Robots.txt is the first line of crawl control. Use these tools to cover every other layer of technical SEO crawlability and indexing.

FAQ

Frequently Asked Questions

Common questions about robots.txt testing, validation, crawl rules, and our free robot checker tool.

What is a robots.txt tester and why should I use one?

A robots.txt tester is a tool that fetches or accepts your robots.txt file, parses all the crawl directives, validates the syntax, and lets you test whether specific URL paths are allowed or blocked for any user-agent. You should use one because robots.txt errors are silent. Your site will not display an error if your robots.txt is wrong. A single bad line can block Googlebot from your entire site, and you may not notice for days or weeks until rankings drop. Running a robots.txt validator every time you change the file takes under 30 seconds and prevents these issues.

What does "robot checker" mean?

A robot checker is the same as a robots.txt tester or robots.txt validator. It checks whether your robots.txt file correctly controls how robots (web crawlers like Googlebot and Bingbot) can access your site. Our robot checker fetches your live file, parses every User-agent group, validates syntax, and lets you test any URL path against the rules. The term "robot checker" typically emphasises the URL path testing functionality, which tells you definitively whether a specific bot can access a specific page.

How do I test if Googlebot can crawl a specific URL?

Use our robots.txt tester. Enter your domain URL and click Fetch and Validate. After the file loads and validates, scroll to the URL Path Tester section. Type the path you want to test (for example /admin/settings), select Googlebot as the user-agent, and click Test. The tool parses your robots.txt using the same algorithm Google uses, applies the most-specific-rule logic, and tells you whether Googlebot is allowed or blocked, plus which exact rule and line number matched.

Does robots.txt affect SEO rankings directly?

Robots.txt does not directly affect rankings but has powerful indirect effects. If you block Googlebot from crawling important pages, those pages cannot be re-crawled to pick up content updates, new internal links, or fresh signals. Over time, blocked pages may lose rankings compared to competitors whose pages are freely crawled. Conversely, using robots.txt to block thin content, search result pages, and session ID URLs prevents Google from wasting crawl budget on low-value pages, leaving more budget for your important content. A well-configured robots.txt file, verified with a robots.txt validator, supports both crawl efficiency and ranking quality.

What is the robots.txt file size limit?

Google processes a maximum of 500 KB of robots.txt content. Any content beyond 500 KB is ignored as if it does not exist. This means crawl rules placed after the 500 KB limit will not apply to Googlebot. Practically speaking, most robots.txt files are well under 10 KB. Files over 100 KB usually contain redundant or overly granular rules that should be consolidated. Our robots.txt checker reports file size on every validation run and warns you if you are approaching the limit.

Can I use robots.txt to block only part of my site?

Yes. You can block specific directories, file types, or URL patterns using Disallow: directives. For example, Disallow: /admin/ blocks the admin section, Disallow: /*.pdf$ blocks all PDF files, and Disallow: /search? blocks search result pages. You can combine Disallow and Allow rules to create exceptions. For example, Disallow: /members/ combined with Allow: /members/public/ blocks most of the members section but allows the public subfolder. Our robots.txt tester includes a URL path test tool so you can confirm each rule is behaving exactly as intended before going live.

Is this robots.txt tester free?

Completely free: no login, no account, no limits. You can fetch and validate as many robots.txt files as you need and run as many URL path tests as required. Behind the Search builds every tool free to use. Browse all 40+ free SEO tools covering technical SEO, on-page SEO, content, local SEO, and link building.

Behind the Search

Free SEO Tools Built Around
How Google Actually Crawls

Behind the Search builds every tool, including this robots.txt tester and validator, around confirmed Google guidance, Search Central documentation, and real crawling behaviour. No guesswork, no vanity metrics, no login required.

Use our complete technical SEO toolkit to validate every layer of your site's crawlability: robots.txt rules, canonical tags, sitemap format, redirect chains, broken links, and more. Browse all 40+ free tools.

12
Validation Checks
3
Test Modes
40+
Free Tools
0
Login Needed