Free llms.txt Validator, Check AI Visibility File

llms.txt validator: how to write, validate, and ship a working llms.txt in 2026

By Nikhil Kumar, Founder of LandKit. Last updated May 2026.

You wrote an llms.txt file, dropped it at the root of your site, and watched nothing happen.

I have rebuilt mine four times this year. Here is what I learned the hard way.

An llms.txt validator is a parser that checks your llms.txt file against the llmstxt.org spec Jeremy Howard published on September 3, 2024. A valid file uses one H1 site name, one blockquote summary, optional Markdown context, and H2-delimited link sections in [name](url): description format. The validator flags spec violations that quietly break AI ingestion in ChatGPT, Claude, Perplexity, and dev tools like Cursor and Aider.

What an llms.txt file actually is in 2026

An llms.txt file is a plain Markdown file at https://yoursite.com/llms.txt that gives an AI agent a curated, low-noise map of the URLs it should read on your site. Jeremy Howard of Answer.AI proposed it in September 2024 because LLM context windows are too small to ingest a typical site's HTML. The file replaces "crawl my whole site" with "here are the 30 pages that matter, in order, with one-sentence descriptions."

Think of robots.txt as the gatekeeper and llms.txt as the concierge.

Robots.txt says yes or no to crawling. Llms.txt says here is what is worth reading first.

The file has two practical jobs. It tells human-driven agents (Cursor, Claude Code, Perplexity in deep-research mode) where to start. And it tells your team what counts as canonical content.

The format is deliberately Markdown, not XML or JSON. Howard's spec notes that "we expect many of these files to be read by language models and agents," and Markdown is the format LLMs handle most reliably.

Does llms.txt actually do anything, or is it dead

The honest answer in May 2026 is: ChatGPT, Claude, and Perplexity do not crawl /llms.txt automatically, but they all read it on demand and dev tools read it by default. Search Engine Land's October 2025 test found that GPTBot, PerplexityBot, ClaudeBot, and Google-Extended logged zero hits to /llms.txt between mid-August and late October 2025, per Semrush's analysis. SE Ranking's November 2025 study of 300,000 domains found no correlation between llms.txt presence and AI citations.

So why bother writing one?

Because the on-demand path is now significant. When a user asks ChatGPT or Claude "summarize the docs at landkit.pro," the model fetches /llms.txt if it exists and uses it as the table of contents. Cursor, Aider, Continue, and the entire MCP-style agent stack read it by default. Mintlify rolled out auto-generated llms.txt for every docs site it hosts in November 2024, which gave Anthropic, Cursor, Coinbase, and Pinecone the file overnight.

Here is the trap. AI search visibility tools like Semrush flag missing /llms.txt as a critical audit issue. That is what Search Engine Journal called a misinformation loop in September 2025. There is no penalty for missing the file. There is also no traffic for adding a broken one.

The ROI math is simple. The file takes 15 minutes to write and validate. It pays off only when an agent reads it. If you publish docs, an SDK, an API, or a content-heavy SaaS where buyers research with Claude or ChatGPT, write the file. If you run a 5-page lawyer site, skip it.

What the llmstxt.org spec actually requires

The llmstxt.org spec is short, and most validation failures come from ignoring three sentences in it. The expected structure, in order, is one H1 with the project or site name (the only element the spec strictly requires), a blockquote summary that gives the LLM enough context to read the rest of the file, optional non-heading Markdown for additional details, then zero or more H2-delimited "file lists" of Markdown links in [name](url): description form, where the description is optional. Anything else falls outside the spec.

The spec is documented in full at llmstxt.org. Read it once.
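To make that structure concrete, here is a minimal parser sketch in Python. It is not the reference implementation from llmstxt.org; the function name parse_llms_txt and the regular expression are assumptions made for illustration, and the sketch ignores edge cases such as fenced code inside the file.

```python
import re

# Matches "- [name](url)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^\s*[-*]\s+\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into title, summary, and H2 link sections."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section we are inside, if any
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ") and result["title"] is None:
            result["title"] = stripped[2:].strip()
        elif stripped.startswith("> ") and result["summary"] is None and current is None:
            result["summary"] = stripped[2:].strip()
        elif stripped.startswith("## "):
            current = stripped[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    {
                        "name": match["name"],
                        "url": match["url"],
                        "description": (match["desc"] or "").strip(),
                    }
                )
    return result
```

Calling parse_llms_txt on a file's text returns a dictionary whose "sections" key maps each H2 heading to its list of links, which is a convenient shape for the later checks.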

Here is the minimum viable file I ship for new LandKit pages:

# LandKit

> LandKit is an SEO and AI visibility growth OS that tracks brand mentions across ChatGPT, Claude, Gemini, and Perplexity. The pages below cover the product, free tools, pricing, and the technical SEO playbook.

## Core pages

- [Homepage](https://landkit.pro/): What LandKit does and who it is for
- [Pricing](https://landkit.pro/pricing/): Single $79/month Growth OS plan, $49 with code FOUNDER49
- [Free SEO audit](https://landkit.pro/seo-audit-tool/): On-page audit with no signup

## Free tools

- [llms.txt generator](https://landkit.pro/llm-text-generator/): Auto-build a spec-compliant file
- [Schema validator](https://landkit.pro/free-tools/schema-validator/): Test JSON-LD before shipping
- [AI crawler reference](https://landkit.pro/free-tools/ai-crawler-reference/): What every AI bot looks for

## Optional

- [Free tools hub](https://landkit.pro/free-tools/): All 19 tools in one place

That file passes every validator I have tested. It is also short on purpose. Long files invite a common mistake: pasting in every URL on the site and producing a 4,000-line text dump with no editorial signal. The whole point is curation.

How to validate your llms.txt file in three checks

A complete llms.txt validation has three layers: spec parsing, link integrity, and AI readability. A spec-only validator catches H1, blockquote, and link-format violations but misses 404 links and confusing prose summaries. A complete check requests every URL with a HEAD request, reads the blockquote with a small LLM to score clarity, and confirms the file is served at the root with the right content type.

The fastest local check is curl -I https://yoursite.com/llms.txt. If that returns anything other than 200 OK and content-type: text/plain or text/markdown, your validator will fail every downstream check.
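The same check can be scripted. The sketch below uses only the Python standard library (Python 3.9 or later); check_response and fetch_head are hypothetical helper names, and the accepted content types are simply the two named above.

```python
from urllib.request import Request, urlopen

# Content types treated as acceptable (an assumption taken from the text above).
ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def check_response(status: int, content_type: str) -> list[str]:
    """Return a list of problems with the status code and Content-Type header."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected content type: {media_type or 'missing'}")
    return problems


def fetch_head(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send a HEAD request and return (status, Content-Type).

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses,
    so callers should be prepared to catch it.
    """
    request = Request(url, method="HEAD", headers={"User-Agent": "llms-txt-check/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Type", "")
```

In use, check_response(*fetch_head("https://yoursite.com/llms.txt")) returns an empty list when the file is served correctly. Keeping the comparison separate from the network call makes the logic easy to test offline.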

I run three layers in this order:

1. Fetch the file at /llms.txt and confirm a 200 with a text/plain or text/markdown content type.
2. Parse the structure: exactly one H1, one blockquote inside the first 200 characters of the body, H2 sections only for link lists.
3. Crawl every URL in the file and confirm a 200 (or a 301 to a 200). Dead links are the silent killer because agents will follow them, hit a 404, and discard the entire chunk.

Most of the public validators I tested catch layers one and two. Almost none catch layer three.
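Layer two can be approximated with a few line-level checks. This is a rough sketch rather than a full Markdown parser: check_structure is a hypothetical name, the sketch does not enforce the 200-character position of the blockquote, and it would miscount "#" lines that appear inside fenced code.

```python
import re

H1_RE = re.compile(r"^#\s+\S")
H2_RE = re.compile(r"^##\s+\S")
DEEP_HEADING_RE = re.compile(r"^#{3,6}\s+\S")


def check_structure(text: str) -> list[str]:
    """Return structural problems found in an llms.txt document."""
    lines = text.splitlines()
    problems = []

    # Exactly one H1 is expected.
    h1_count = sum(1 for line in lines if H1_RE.match(line))
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")

    # The blockquote summary should appear before the first H2 section.
    first_h2 = next(
        (i for i, line in enumerate(lines) if H2_RE.match(line)), len(lines)
    )
    if not any(line.startswith(">") for line in lines[:first_h2]):
        problems.append("no blockquote summary before the first H2 section")

    # Only H2 headings should delimit link lists.
    deep = [line for line in lines if DEEP_HEADING_RE.match(line)]
    if deep:
        problems.append(f"{len(deep)} heading(s) deeper than H2")

    return problems
```

An empty list means the file passed these three checks; it does not prove the file satisfies every detail of the spec.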

The 10 mistakes that silently block AI citations

Most llms.txt files fail validation for the same handful of reasons, and the failures are not loud. The validator does not crash. The agent quietly returns "I could not find authoritative information from your site," and you assume your content is the problem when it is the file. The fix list below covers more than 90% of validation failures I see in client audits.

Here are the ten failure modes ranked by how often I find them:

Rank	Mistake	What breaks
1	Multiple H1 headings	Spec violation; parser stops at first H1 and ignores everything after
2	Missing blockquote summary	Agent has no context for the URL list and may skip the file
3	Wrong link format	`- url - description` instead of `- [name](url): description` is unparseable
4	H3 or H4 inside link sections	Spec only allows H2 to delimit link lists
5	404 or redirected URLs	Agent fetches the URL, gets a dead page, drops the citation
6	File served as text/html	LLMs expect text/plain or text/markdown; some clients reject text/html
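7	File only at /docs/llms.txt or /public/llms.txt	Spec names root `/llms.txt` as the standard location (subpaths are an optional extra); a file that exists only in a subpath is easy for agents to miss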
8	Description longer than one sentence	Auto-summarizers truncate and lose the keyword
9	Duplicate URLs across sections	Wastes the agent's context window and dilutes signal
10	Listing every URL on the site	Removes editorial signal; the file becomes a sitemap

The fifth one is the most expensive. I once spent two weeks wondering why a client's docs were not being cited by Claude, and the issue was three dead URLs in the top section of their llms.txt. The agent was fetching the file, hitting the dead links first, and giving up before reading the live ones.

Run a link checker. Every time you ship the file. No exceptions.
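A link check does not need much code. The sketch below pulls every Markdown link out of the file, flags duplicates, and reports any URL that does not come back as 200. The names audit_links and http_status are hypothetical; the status function is passed in as a parameter so the audit logic can be tested without network access.

```python
import re
from collections import Counter
from typing import Callable
from urllib.request import Request, urlopen
from urllib.error import HTTPError

URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def audit_links(text: str, get_status: Callable[[str], int]) -> dict:
    """Check each distinct URL once and report duplicates and non-200 results."""
    counts = Counter(URL_RE.findall(text))
    duplicates = sorted(url for url, n in counts.items() if n > 1)
    dead = {}
    for url in counts:
        status = get_status(url)
        if status != 200:
            dead[url] = status
    return {"checked": len(counts), "duplicates": duplicates, "dead": dead}


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for a HEAD request, or 0 on network failure.

    urlopen follows redirects by default, so a 301 that lands on a
    live page is reported as 200.
    """
    request = Request(url, method="HEAD", headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except OSError:
        return 0
```

Running audit_links(open("llms.txt").read(), http_status) before each deploy would surface dead URLs and duplicates in one pass. Some servers reject HEAD requests, so a real checker may need to fall back to GET.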

How do I write an llms.txt file that ChatGPT and Perplexity actually use

An llms.txt file that earns lift on ChatGPT, Perplexity, and Claude needs only about 30 to 80 URLs grouped into 3 to 5 sections, each link with a one-sentence description that names the entity, the format, and the use case. Length is not the win. Curation is. A file with 40 well-described URLs gets cited more than one with 400 undescribed URLs because LLMs use the descriptions as retrieval anchors when they decide which page to fetch next.

Three rules I follow on every file:

Rule one: write descriptions for the agent, not the human. Instead of "Pricing" write "Pricing page covering the single $79 Growth OS plan, the $49 founder discount, and the free tier." The named entities ($79, the $49 founder discount, free tier) are what the LLM keys on at retrieval time.

Rule two: lead with the most-cited URL types. Comparison pages, glossary pages, pricing, free tools, and benchmark studies are the formats AI engines lift most often, per Search Engine Journal's 2025 citation study. Put them in section one.

Rule three: use the "Optional" section for second-tier content. The spec says agents can skip URLs in the Optional section if they need a shorter context. That gives you an explicit mechanism to mark "important" versus "supporting." Most files I audit do not use it.
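To show what skipping the Optional section looks like from the consumer side, here is a small sketch of how an agent or build script might trim the file when context is tight. The function name trim_optional is hypothetical, and matching the heading case-insensitively is an assumption of this sketch, not something the source states.

```python
def trim_optional(text: str, keep_optional: bool = False) -> str:
    """Drop the "## Optional" section unless keep_optional is True."""
    if keep_optional:
        return text
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Start skipping at "## Optional", stop at the next H2.
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"
```

Anything placed under "## Optional" disappears in the trimmed version, which is exactly why only second-tier links belong there.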

If you want a starting point, run our free llms.txt generator on any site URL and edit from there.

What is the difference between llms.txt and llms-full.txt

The llms.txt file is a curated index of URLs in under 10 KB; llms-full.txt is the entire content of those URLs concatenated into a single Markdown file, often 200 KB to 5 MB. Llms.txt tells the agent where to look. Llms-full.txt is a one-shot context dump for agents that cannot or will not crawl. Mintlify auto-generates both for every docs site it hosts as of November 2024, which is how Anthropic, Cursor, Coinbase, and Pinecone got both files overnight.

The llms-full.txt file is not in the original spec. It emerged organically because dev tool agents like Aider and Continue hit context-window limits when crawling docs page by page.
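Because llms-full.txt is a convention rather than a specified format, there is no single correct layout. The sketch below shows one plausible way to assemble it: concatenate the Markdown body of each listed page under its own H2 heading. The function name build_llms_full and the "Source:" line are assumptions for illustration, not something tools are known to require.

```python
def build_llms_full(title: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate pages into one Markdown document.

    Each entry in pages is (name, url, markdown_body).
    """
    parts = [f"# {title}", ""]
    for name, url, body in pages:
        parts += [f"## {name}", "", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts)
```

Checking len(build_llms_full(...).encode("utf-8")) afterwards gives the byte size, which is the number to compare against the size ranges discussed in this section.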

When to ship which:

Ship llms.txt alone if your content is over ~50 KB total. That covers most marketing sites and SaaS docs.

Ship both if you are running a docs site under ~500 KB total and your audience uses Cursor, Continue, or Claude Code. The llms-full.txt cuts ingestion time roughly in half because the agent makes one request instead of 30.

Ship llms-full.txt only if your full corpus fits in a 200K context window after Markdown stripping, which is rare outside docs.

For a deeper breakdown of how to set up the related crawler files, our AI crawler robots generator handles the robots.txt and ai.txt side of the same problem.

How do I know if my llms.txt is being read

The cleanest signal is server log analysis filtered for User-Agent strings containing GPTBot, PerplexityBot, ClaudeBot, OAI-SearchBot, and cohere-ai. Google-Extended is a robots.txt control token rather than a User-Agent string, so it will not show up in access logs. A working llms.txt sees occasional GET requests to /llms.txt from these bots, plus burst requests when a user prompts an agent to read your site. Per Search Engine Land's October 2025 test, unprompted automatic crawling of llms.txt is rare today; on-demand fetches when users ask agents to read your site are the dominant signal.

Three things to look for in logs:

GET requests to /llms.txt from named AI user agents (the bot is fetching your file).

GET requests to URLs listed in /llms.txt from those same user agents within a few seconds (the bot is following the curated path you defined).

Citations in your AI mention monitoring tool that map to URLs explicitly listed in your llms.txt, especially the top section.

If you are tracking AI citations across ChatGPT, Claude, Gemini, and Perplexity, LandKit does this automatically. If you are not, grep your access logs for the user agents above and watch them for two weeks.
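If grep feels too blunt, a short script can count the hits per bot. This sketch assumes access logs in the common "combined" format used by nginx and Apache; count_llms_txt_hits is a hypothetical name, and the agent list is a starting point to extend, not an exhaustive registry. Google-Extended is left out because it is a robots.txt token and does not appear as a User-Agent string.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent header (case-insensitive).
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "cohere-ai")

# Combined log format: ... "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(log_lines, path: str = "/llms.txt") -> Counter:
    """Count GET requests for `path` per known AI user agent."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match or match["method"] != "GET":
            continue
        # Ignore any query string when comparing the path.
        if match["path"].split("?")[0] != path:
            continue
        agent = match["agent"].lower()
        for name in AI_AGENTS:
            if name.lower() in agent:
                hits[name] += 1
                break
    return hits
```

Passing an open log file straight in, as in count_llms_txt_hits(open("access.log")), works because the function only iterates over lines. Changing the path argument lets the same function count follow-up requests to the URLs listed in the file.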

llms.txt validator vs schema validator vs robots.txt: which do I run first

Run robots.txt first because a blocked AI bot never reaches llms.txt; run schema validator second because schema fixes appear in citations within days and llms.txt fixes appear in weeks; run llms.txt validator third as the curation layer that earns lift only when the first two are clean. All three must pass before any single one of them moves AI citations measurably. Order matters because each builds on the previous.

File	What it controls	Validation focus	Time to impact
robots.txt	Whether AI bots can crawl at all	Allow/disallow rules for GPTBot, ClaudeBot, PerplexityBot, Google-Extended	Immediate
Schema (JSON-LD)	How AI engines understand your entities, products, FAQs, articles	Valid types, required fields, no syntax errors	Days to weeks
llms.txt	Which URLs to read first when an agent is asked	One H1, one blockquote, H2 sections, valid Markdown links	Weeks to months
llms-full.txt	The full content corpus in one shot	Same Markdown rules, plus content freshness	Same as llms.txt

If schema is broken, AI engines cannot extract clean entity data. If robots.txt blocks AI bots, none of the others matter. Both should be clean before you spend time on llms.txt. Our schema validator covers the second layer.
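The robots.txt half of that check is easy to automate with Python's standard library. The sketch below uses urllib.robotparser to list which AI bots a given robots.txt would block from fetching a URL; blocked_bots is a hypothetical helper name, and the bot list mirrors the tokens named in the table above.

```python
from urllib.robotparser import RobotFileParser

# robots.txt user-agent tokens to test (taken from the table above).
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def blocked_bots(robots_txt: str, url: str) -> list[str]:
    """Return the bots that the given robots.txt content disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]
```

Running it against the contents of your robots.txt with the URL of your llms.txt should return an empty list. Python's parser is one interpretation of the robots rules, so individual crawlers may resolve unusual rule combinations differently.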

Frequently asked questions

Is llms.txt actually used by ChatGPT or Claude in 2026

ChatGPT and Claude do not crawl /llms.txt automatically as of May 2026, but both fetch it on demand when a user prompts them with "read the docs at <site>" or similar. Search Engine Land's October 2025 server log study found zero unprompted hits from GPTBot, ClaudeBot, or PerplexityBot to a test site's llms.txt over ten weeks. Dev tool agents (Cursor, Aider, Claude Code) read it by default, which is the bigger near-term audience for most SaaS.

What is the correct file path for llms.txt

The llms.txt file should be served at the root of your domain at https://yoursite.com/llms.txt. The llmstxt.org spec names the root path as the standard location and mentions subpaths only as an optional extra, so a file that exists only at /docs/llms.txt or /public/llms.txt is unlikely to be discovered, much as robots.txt only works at /robots.txt. If you host docs on a subdomain, publish a separate llms.txt at the root of that subdomain (e.g., docs.yoursite.com/llms.txt).

Do I need an llms.txt if I have a sitemap.xml

A sitemap.xml lists every URL on your site for search engines to crawl exhaustively, while llms.txt curates the 30 to 80 URLs you want an AI agent to read first with one-sentence descriptions. They solve different problems and live alongside each other. Sitemap.xml is for Google's classic crawler. Llms.txt is for an LLM with a 200K-token context window that needs editorial signal, not a 50,000-URL dump.

What is the maximum size of an llms.txt file

The llmstxt.org spec does not set a maximum size, but most validators flag files over 100 KB as likely sitemap dumps rather than curated indexes. In practice, keep llms.txt under 10 KB and put the deep content in a separate llms-full.txt file. Mintlify's auto-generated files typically run under 8 KB for llms.txt and 200 KB to 2 MB for llms-full.txt across the docs sites they host.

How often should I update llms.txt

Update llms.txt every time you publish a major new page, deprecate a page, or restructure pricing or product categories, which for most SaaS is monthly to quarterly. Stale URLs in the top section are the most expensive validation failure because agents hit dead links and discard the file. Set a calendar reminder for the first of every month and re-run the validator. If you ship a new pricing page or kill a feature, update the file the same day.

Will adding llms.txt hurt my Google rankings

Adding a valid llms.txt file at /llms.txt does not affect Google rankings; it is not a signal Google uses, and Google's John Mueller publicly compared it to the discredited keywords meta tag in 2025. The file is served as text/markdown or text/plain, outside the indexable HTML surface, and ignored by Googlebot. The only ranking risk is publishing a sloppy file with broken links that wastes crawl budget, which is a corner case for most sites.

Ship the file, then validate it monthly

Write a 30-line llms.txt this week. Run it through a validator. Fix any spec failures the same day. Then put a calendar reminder on the first of every month to re-run the link check, because the only thing worse than no llms.txt is one with three dead URLs at the top.

If you want LandKit to track which AI engines actually cite your URLs after you ship the file, start a free LandKit scan or check the free-tools hub for the related schema validator and AI crawler reference.

About the author

Nikhil Kumar is the founder of LandKit, an SEO and AI visibility growth OS that tracks brand mentions across ChatGPT, Claude, Gemini, and Perplexity for solo founders and lean SaaS teams. He built LandKit after spending two years auditing why technically clean sites still failed to earn AI citations. Connect on LinkedIn.
