Robots & LLMs Checker

Analyze a domain's robots.txt and llms.txt — crawling rules, sitemaps and AI indexing.

What is robots.txt?

The robots.txt file is a plain-text file at the root of a domain that tells web crawlers (Googlebot, Bingbot, etc.) which URLs they are allowed to crawl. It follows the Robots Exclusion Protocol (standardized as RFC 9309) and is the primary way to control crawler access to your site. It governs crawling, not indexing: a blocked URL can still be indexed if other pages link to it.

Core robots.txt directives

User-agent

Specifies which crawler the rule group applies to. * means "all crawlers". You can define a separate group for each bot (e.g. Googlebot, GPTBot); a crawler that finds a group naming it follows that group instead of the * group.
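As a rough illustration of per-bot groups, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module. The file contents, the example.com URL and the /private/ path are made up for the example; only the bot names come from the text above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for every crawler, one for GPTBot only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so it falls back to the * group.
googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/blog/")

# GPTBot matches its own group, which blocks the whole site.
gptbot_ok = parser.can_fetch("GPTBot", "https://example.com/blog/")

print(googlebot_ok, gptbot_ok)
```

Here Googlebot is allowed to fetch /blog/ while GPTBot is not, because each crawler is evaluated against the group that names it before the * group is considered.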

Disallow / Allow

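Disallow: /admin/ blocks crawling of every URL whose path starts with /admin/. Allow: / grants explicit access to the whole site; Allow is most useful for re-opening a sub-path inside a disallowed directory. An empty Disallow: means "the entire site is allowed". Disallow: / blocks the entire site.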
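The three Disallow cases above can be checked with a short sketch, again using Python's urllib.robotparser. The helper name allowed and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


BLOCK_ADMIN = "User-agent: *\nDisallow: /admin/\n"  # block one directory
ALLOW_ALL = "User-agent: *\nDisallow:\n"            # empty value: nothing blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"          # block the entire site

admin_blocked = not allowed(BLOCK_ADMIN, "Googlebot", "https://example.com/admin/users")
blog_open = allowed(BLOCK_ADMIN, "Googlebot", "https://example.com/blog/")
empty_open = allowed(ALLOW_ALL, "Googlebot", "https://example.com/admin/users")
site_blocked = not allowed(BLOCK_ALL, "Googlebot", "https://example.com/")

print(admin_blocked, blog_open, empty_open, site_blocked)
```

All four values come out True: rules match by path prefix, so /admin/ covers /admin/users but leaves /blog/ untouched.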

Sitemap

The Sitemap: directive tells crawlers where to find the site's XML sitemap, making page discovery easier.
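Sitemap lines stand apart from the User-agent groups and can be read directly. A minimal sketch, assuming Python 3.8 or later (where RobotFileParser.site_maps() is available) and two made-up sitemap URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that lists two sitemaps after its rule group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed URLs in file order, or None if there are none.
sitemaps = parser.site_maps() or []

for url in sitemaps:
    print(url)
```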

Crawl-delay

Asks the crawler to wait N seconds between requests. It is a non-standard extension: Googlebot ignores it, while some other crawlers (Bingbot, for example) respect it.
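A crawler that chooses to respect Crawl-delay might look it up as in the sketch below. The delay_for helper, the one-second fallback and the ExampleBot name are assumptions for the example, not part of any standard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: no restrictions in general, a 10 s delay for Bingbot.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: Bingbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def delay_for(agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests; `default` when no Crawl-delay applies."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


bingbot_delay = delay_for("Bingbot")   # taken from the Bingbot group
other_delay = delay_for("ExampleBot")  # no Crawl-delay set, so the fallback is used

print(bingbot_delay, other_delay)
```

A polite client would then sleep for that many seconds between requests to the same host.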

What is llms.txt?

The llms.txt file is a proposed convention (llmstxt.org), not yet a formal standard, that lets site owners provide structured information about their content to Large Language Models (LLMs) and AI agents. It is written in Markdown and describes what the site does, which pages are important and what an AI model should know.
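To make the format more concrete, here is a minimal parsing sketch. It assumes the layout described at llmstxt.org: a single # title, a > summary line, and ## sections containing Markdown link lists. The sample file, the LlmsTxt class and the parse_llms_txt function are hypothetical, and real files may include free-form text that this sketch simply skips.

```python
import re
from dataclasses import dataclass, field

# Matches list items such as "- [Name](https://example.com/page.md): optional note"
LINK = re.compile(r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict = field(default_factory=dict)  # section name -> [(name, url), ...]


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the "## " section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc.sections[current].append((match["name"], match["url"]))
    return doc


SAMPLE = """\
# Example Project

> A hypothetical API for sending invoices.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and send a first request
- [API reference](https://example.com/docs/api.md)

## Examples

- [Webhook demo](https://example.com/examples/webhooks.md)
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title, list(doc.sections))
```

The parser keeps only the title, the summary and the links per section, which is usually enough to check that a file is present and roughly well formed.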

llms-full.txt

The extended version of llms.txt with the full content of pages — used by AI tools that want a deeper understanding of the site.

Frequently Asked Questions

Is robots.txt mandatory?

No, but its absence means all crawlers have full access. If the file does not exist, Googlebot may crawl, and then index, any URL it discovers on the site.

Does robots.txt block users from accessing pages?

No. robots.txt applies to crawlers only — it does not prevent any user from visiting a page. For real protection you need authentication.

What does Disallow: / mean for Googlebot?

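Googlebot will not crawl any page on the site. This does not mean pages disappear from results immediately: URLs that are already indexed can remain, and blocked URLs may still be listed without a description if other sites link to them. To remove a page from results, use a noindex directive and leave the page crawlable so that Googlebot can see it.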

Should I create an llms.txt for my site?

It is not mandatory, but it can help AI tools that read it understand your content better. It is especially useful for SaaS products, documentation sites and APIs.
