Robots & LLMs Checker
Analyze a domain's robots.txt and llms.txt — crawling rules, sitemaps and AI indexing.
What is robots.txt?
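The robots.txt file is a plain-text file at the root of a host (e.g. https://example.com/robots.txt) that tells web crawlers (Googlebot, Bingbot, etc.) which URLs they are allowed to crawl. It follows the Robots Exclusion Protocol, standardized in RFC 9309, and is the primary way to control crawler access to your site. The rules are advisory: well-behaved crawlers honor them, but they do not enforce access control, and blocking a URL from crawling does not guarantee that it stays out of search results.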
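As a minimal sketch of how a crawler consults these rules, the Python snippet below uses the standard library's urllib.robotparser. The robots.txt content, the example.com URLs and the helper names robots_txt_url() and is_allowed() are hypothetical; a real checker would download the file from the root of the host being checked.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location that governs page_url.

    The file lives at the root of the host, so only the scheme and
    network location (host plus any port) of the page URL are kept.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(robots_txt_url("https://example.com/blog/post?id=7"))
    # https://example.com/robots.txt
    print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/blog/post"))  # True
    print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/admin/users"))  # False
```

RobotFileParser.can_fetch() returns False for every URL until parse() or read() has been called, so a checker must always load the file first.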
Core robots.txt directives
User-agent
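Specifies which crawler a group of rules applies to. * matches every crawler that does not have a group of its own. You can define separate groups for individual bots (e.g. Googlebot, GPTBot); a crawler that matches a named group follows only that group and ignores the * group.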
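The sketch below, using urllib.robotparser with a hypothetical file, shows group selection in practice: GPTBot obeys only its own group, so it is kept out of /drafts/ but may crawl /private/, which the * group blocks for every other crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one catch-all group and one bot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own group, so the "*" rules do not apply to it.
print(parser.can_fetch("GPTBot", "https://example.com/drafts/post"))      # False
print(parser.can_fetch("GPTBot", "https://example.com/private/page"))     # True
# Googlebot has no group of its own and falls back to "*".
print(parser.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/drafts/post"))   # True
```

Python's parser selects a group when its User-agent value appears, case-insensitively, as a substring of the crawler name, which is looser than the product-token matching described in RFC 9309.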
Disallow / Allow
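Disallow: /admin/ blocks crawling of every URL whose path starts with /admin/. Allow: grants explicit access and is typically used to reopen a subpath inside a disallowed directory (e.g. Allow: /admin/public/). An empty Disallow: means the entire site may be crawled, while Disallow: / blocks the entire site. When an Allow and a Disallow rule both match a URL, the more specific (longer) rule wins.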
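Python's urllib.robotparser applies the first matching rule in file order, so for a group that lists Disallow: /admin/ before Allow: /admin/public/ it blocks /admin/public/help. RFC 9309 and Google instead use the most specific (longest) match, with Allow winning ties. The hypothetical is_path_allowed() helper below sketches that longest-match rule for a single group's rules; as a simplifying assumption it treats patterns as plain path prefixes and does not implement the * and $ wildcards.

```python
from typing import List, Tuple


def is_path_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Decide access for one group's rules using longest-match precedence.

    rules: (directive, pattern) pairs such as ("disallow", "/admin/").
    Assumption: patterns are plain, case-sensitive prefixes; the '*' and
    '$' wildcards defined in RFC 9309 are not handled here.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        if not pattern:
            # An empty "Disallow:" (or "Allow:") matches nothing.
            continue
        if path.startswith(pattern):
            is_allow = directive.lower() == "allow"
            # A longer pattern is more specific; on a tie, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_path_allowed(rules, "/admin/settings"))     # False
print(is_path_allowed(rules, "/admin/public/help"))  # True
print(is_path_allowed(rules, "/blog/"))              # True
```

Tracking only the longest match makes the result independent of rule order, which is the behavior the specification asks for.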
Sitemap
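The Sitemap: directive tells crawlers where to find the site's XML sitemap, making page discovery easier. It takes an absolute URL, may be repeated to list several sitemaps, and applies independently of the User-agent groups.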
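Because Sitemap: lines sit outside the User-agent groups, urllib.robotparser collects them separately; its site_maps() method (Python 3.8+) returns every declared sitemap URL, or None when there are none. The file contents and the sitemap_urls() helper below are hypothetical.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt declaring two sitemaps.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> List[str]:
    """Return all Sitemap: URLs declared in robots_txt (possibly empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None rather than [] when no Sitemap: line exists.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))
# ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```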
Crawl-delay
Asks the crawler to wait N seconds between successive requests. It is a non-standard extension that is not part of RFC 9309: Googlebot ignores it, but several other crawlers, including Bingbot, honor it.
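urllib.robotparser exposes this value through crawl_delay(). The hypothetical delay_for() helper below returns it for a given crawler and falls back to an assumed default pause of one second when the matching group sets none. Python's parser only accepts whole-number values, so a line such as Crawl-delay: 0.5 is ignored.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the Bingbot group sets a delay.
ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""


def delay_for(robots_txt: str, user_agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests for user_agent (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None if the group sets no delay
    return float(delay) if delay is not None else default


print(delay_for(ROBOTS_TXT, "Bingbot"))       # 10.0
print(delay_for(ROBOTS_TXT, "SomeOtherBot"))  # 1.0
```

A crawler would then call time.sleep() with this value between consecutive requests to the same host.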
What is llms.txt?
The llms.txt file is a proposed standard (llmstxt.org) that lets site owners give Large Language Models (LLMs) and AI agents structured information about their content. Served at /llms.txt and written in Markdown, it describes what the site does, links to its most important pages and states what an AI model should know about it.
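The llmstxt.org proposal describes a fixed layout: an H1 with the site name (the only required part), an optional blockquote summary, free-form Markdown, and H2 sections containing lists of links, each optionally followed by a colon and notes. The hypothetical parse_llms_txt() below is a minimal sketch of reading that layout; it keeps only the first blockquote line, ignores free-form paragraphs, and its sample file is invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple, Optional

# Matches list items such as "- [Quick start](https://example.com/start.md): notes"
LINK_ITEM = re.compile(
    r"^[-*]\s+\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


class Link(NamedTuple):
    name: str
    url: str
    notes: str


@dataclass
class LlmsTxt:
    title: Optional[str] = None
    summary: Optional[str] = None
    sections: Dict[str, List[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current: Optional[str] = None  # name of the H2 section being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc.title is None:
            doc.title = line[2:].strip()
        elif line.startswith(">") and doc.summary is None and current is None:
            doc.summary = line.lstrip(">").strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_ITEM.match(line)
            if match:
                doc.sections[current].append(
                    Link(match["name"], match["url"], match["notes"] or "")
                )
    return doc


SAMPLE = """\
# Example Project

> A hypothetical site used to illustrate the llms.txt layout.

Free-form notes are allowed here.

## Docs

- [Quick start](https://example.com/start.md): How to install
- [API reference](https://example.com/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title)          # Example Project
print(list(doc.sections)) # ['Docs', 'Optional']
```

In the proposal, a section titled Optional marks links that can be skipped when a shorter context is needed, so a checker may want to report it separately.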
llms-full.txt
An extended companion to llms.txt that contains the full content of the site's pages in a single file, for AI tools that need a deeper understanding of the site.
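A checker like this one first needs to know which of these files a domain publishes. The hypothetical check_domain() below requests robots.txt, llms.txt and llms-full.txt over HTTPS and treats a final 200 response as present; the fetch function is injectable (an assumption of this sketch) so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# Files looked up at the root of the domain.
ROOT_FILES = ("robots.txt", "llms.txt", "llms-full.txt")


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for url, or None if the request failed."""
    try:
        # urlopen follows redirects; the body is never read, only the status.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code
    except OSError:  # DNS failures, refused connections, timeouts
        return None


def check_domain(
    domain: str, fetch: Callable[[str], Optional[int]] = http_status
) -> Dict[str, bool]:
    """Map each root file name to whether the domain serves it."""
    return {name: fetch(f"https://{domain}/{name}") == 200 for name in ROOT_FILES}


# Offline usage with a stand-in fetcher that only "serves" two of the files.
fake_fetch = lambda url: 200 if url.endswith(("/robots.txt", "/llms.txt")) else 404
print(check_domain("example.com", fake_fetch))
# {'robots.txt': True, 'llms.txt': True, 'llms-full.txt': False}
```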