LandKit

llms.txt Validator

Validate your llms.txt file. Make sure ChatGPT, Claude, and Perplexity can read your site correctly.

What is llms.txt?

llms.txt is a proposed open standard hosted at llmstxt.org that gives large language models a clean, Markdown-formatted summary of your website. It lives at /llms.txt and acts like a sitemap for AI engines.

The format is simple: an H1 with your site title, an optional blockquote summary, then ## sections containing bulleted links to your most important pages. AI engines can fetch this file in one request instead of trying to crawl and reason about your entire site.

Why your site needs llms.txt

  1. Helps AI engines understand your site

     Instead of guessing from raw HTML, models get a curated map of your high-value pages with descriptions you wrote.

  2. Ensures accurate citations

     When ChatGPT, Claude, or Perplexity cite your content, an llms.txt file makes them more likely to link to the right page with the right framing.

  3. Future-proofs SEO for AI search

     AI search is replacing a growing share of traditional Google traffic. Sites with llms.txt files are positioned to be cited and surfaced as that shift accelerates.

  4. Costs nothing to add

     A good llms.txt is a single Markdown file under 100 KB. There is no downside to having one, even if not every AI engine reads it yet.

Don't have an llms.txt yet?

Build a properly formatted llms.txt for your site in under a minute. Free, no signup required.

Deep dive

llms.txt validator: how to write, validate, and ship a working llms.txt in 2026

By Nikhil Kumar, Founder of LandKit. Last updated May 2026.

You wrote an llms.txt file, dropped it at the root of your site, and watched nothing happen.

I have rebuilt mine four times this year. Here is what I learned the hard way.

An llms.txt validator is a parser that checks your llms.txt file against the llmstxt.org spec Jeremy Howard published on September 3, 2024. A valid file uses one H1 site name, one blockquote summary, optional Markdown context, and H2-delimited link sections in [name](url): description format. The validator flags spec violations that quietly break AI ingestion in ChatGPT, Claude, Perplexity, and dev tools like Cursor and Aider.

What an llms.txt file actually is in 2026

An llms.txt file is a plain Markdown file at https://yoursite.com/llms.txt that gives an AI agent a curated, low-noise map of the URLs it should read on your site. Jeremy Howard of Answer.AI proposed it in September 2024 because LLM context windows are too small to ingest a typical site's HTML. The file replaces "crawl my whole site" with "here are the 30 pages that matter, in order, with one-sentence descriptions."

Think of robots.txt as the gatekeeper and llms.txt as the concierge.

Robots.txt says yes or no to crawling. Llms.txt says here is what is worth reading first.

The file has two practical jobs. It tells human-driven agents (Cursor, Claude Code, Perplexity in deep-research mode) where to start. And it tells your team what counts as canonical content.

The format is deliberately Markdown, not XML or JSON. Howard's spec notes that "we expect many of these files to be read by language models and agents," and Markdown is the format LLMs handle most reliably.

Does llms.txt actually do anything, or is it dead

The honest answer in May 2026 is: ChatGPT, Claude, and Perplexity do not crawl /llms.txt automatically, but they all read it on demand and dev tools read it by default. Search Engine Land's October 2025 test found that GPTBot, PerplexityBot, ClaudeBot, and Google-Extended logged zero hits to /llms.txt between mid-August and late October 2025, per Semrush's analysis. SE Ranking's November 2025 study of 300,000 domains found no correlation between llms.txt presence and AI citations.

So why bother writing one?

Because the on-demand path is now significant. When a user asks ChatGPT or Claude "summarize the docs at landkit.pro," the model fetches /llms.txt if it exists and uses it as the table of contents. Cursor, Aider, Continue, and the entire MCP-style agent stack read it by default. Mintlify rolled out auto-generated llms.txt for every docs site it hosts in November 2024, which gave Anthropic, Cursor, Coinbase, and Pinecone the file overnight.

Here is the trap. AI search visibility tools like Semrush flag missing /llms.txt as a critical audit issue. That is what Search Engine Journal called a misinformation loop in September 2025. There is no penalty for missing the file. There is also no traffic for adding a broken one.

The ROI math is simple. The file takes 15 minutes to write and validate. It pays off only when an agent reads it. If you publish docs, an SDK, an API, or a content-heavy SaaS where buyers research with Claude or ChatGPT, write the file. If you run a 5-page lawyer site, skip it.

What the llmstxt.org spec actually requires

The llmstxt.org spec is short, and most validation failures come from ignoring three sentences in it. The structure, in order, is one H1 with the project or site name (the only element the spec strictly requires), one blockquote summary that gives the LLM enough context to read the rest of the file, optional non-heading Markdown for additional details, then zero or more H2-delimited "file lists" of [name](url): description Markdown links. Treat the blockquote as mandatory in practice, and treat anything outside this structure as invalid.

The spec is documented in full at llmstxt.org. Read it once.

Here is the minimum viable file I ship for new LandKit pages:

# LandKit

> LandKit is an SEO and AI visibility growth OS that tracks brand mentions across ChatGPT, Claude, Gemini, and Perplexity. The pages below cover the product, free tools, pricing, and the technical SEO playbook.

## Core pages

- [Homepage](https://landkit.pro/): What LandKit does and who it is for
- [Pricing](https://landkit.pro/pricing/): Single $79/month Growth OS plan, $49 with code FOUNDER49
- [Free SEO audit](https://landkit.pro/seo-audit-tool/): On-page audit with no signup

## Free tools

- [llms.txt generator](https://landkit.pro/llm-text-generator/): Auto-build a spec-compliant file
- [Schema validator](https://landkit.pro/free-tools/schema-validator/): Test JSON-LD before shipping
- [AI crawler reference](https://landkit.pro/free-tools/ai-crawler-reference/): What every AI bot looks for

## Optional

- [Free tools hub](https://landkit.pro/free-tools/): All 19 tools in one place

That file passes every validator I have tested. It is also short on purpose. Long files trigger the second-biggest mistake: pasting in every URL on the site and producing a 4,000-line text dump with no editorial signal. The whole point is curation.
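To make the structure concrete, here is a minimal Python sketch of a parser for a file shaped like the one above. It is an illustration under stated assumptions, not a reference implementation of the llmstxt.org spec: the names `LlmsTxt`, `LINK_RE`, and `parse_llms_txt` are hypothetical, and the parser only understands single-line blockquotes and one-line list items.

```python
import re
from dataclasses import dataclass, field

# Matches "- [name](url)" with an optional ": description" tail.
# Hypothetical helper pattern; simplified for illustration.
LINK_RE = re.compile(
    r"^[-*]\s+\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Maps each H2 title to a list of (name, url, description) tuples.
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the first blockquote line, and the H2 link lists."""
    doc = LlmsTxt()
    current = None  # title of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and current is None and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["name"], match["url"], match["desc"] or "")
                )
    return doc
```

Calling `parse_llms_txt` on the LandKit example would yield the title, the summary, and three sections keyed by their H2 titles, which is roughly the view an agent works from when it decides what to fetch next.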

How to validate your llms.txt file in three checks

A complete llms.txt validation has three layers: spec parsing, link integrity, and AI readability. A spec-only validator catches H1, blockquote, and link-format violations but misses 404 links and confusing prose summaries. A complete check parses every URL with a HEAD request, reads the blockquote with a small LLM to score clarity, and confirms the file is served at the root with the right content type.

The fastest local check is curl -I https://yoursite.com/llms.txt. If that returns anything other than 200 OK and content-type: text/plain or text/markdown, your validator will fail every downstream check.
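The same check can be scripted. The sketch below uses only Python's standard library; `head_llms_txt` and `check_response` are hypothetical names, and the accepted content types are the two named above. It assumes the server answers HEAD requests, which not every host does.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Content types accepted by this sketch, taken from the paragraph above.
ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def head_llms_txt(url: str, timeout: float = 10.0):
    """Send a HEAD request and return (status, Content-Type header)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except HTTPError as err:
        # urlopen raises for 4xx/5xx; the status and headers are still available.
        return err.code, err.headers.get("Content-Type", "")


def check_response(status: int, content_type: str) -> list:
    """Return a list of problems; an empty list means the response looks fine."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected content type: {media_type or '(none)'}")
    return problems
```

A typical call would be `check_response(*head_llms_txt("https://yoursite.com/llms.txt"))`, with any returned strings treated as blocking failures before the later checks run.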

I run three layers in this order:

  1. Fetch the file at /llms.txt and confirm a 200 with a text/plain or text/markdown content type.
  2. Parse the structure: exactly one H1, one blockquote inside the first 200 characters of the body, H2 sections only for link lists.
  3. Crawl every URL in the file and confirm a 200 (or a 301 to a 200). Dead links are the silent killer because agents will follow them, hit a 404, and discard the entire chunk.

Most of the public validators I tested catch layers one and two. Almost none catch layer three.
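Layer three is simple to approximate yourself. The following is a rough sketch, assuming every link in the file is an absolute http(s) URL written in Markdown link syntax; `extract_urls`, `http_status`, and `find_dead_links` are hypothetical names, and the status lookup is injectable so the logic can be exercised without network access.

```python
import re
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Captures the URL part of Markdown links such as [name](https://...).
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def extract_urls(text: str) -> list:
    """Return every absolute URL found inside a Markdown link."""
    return URL_RE.findall(text)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Request the URL with HEAD and return the final status, or 0 on network errors.

    urlopen follows redirects on its own, so a 301 that lands on a live
    page is reported as 200. Some servers reject HEAD; a GET fallback
    would be a reasonable extension.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except (URLError, TimeoutError):
        return 0


def find_dead_links(text: str, status_of=http_status) -> list:
    """Return (url, status) pairs for every link that does not end in a 200."""
    dead = []
    for url in extract_urls(text):
        status = status_of(url)
        if status != 200:
            dead.append((url, status))
    return dead
```

Running `find_dead_links` on the file contents before each deploy gives a list you can fail a build on.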

The 10 mistakes that silently block AI citations

Most llms.txt files fail validation for the same handful of reasons, and the failures are not loud. The validator does not crash. The agent quietly returns "I could not find authoritative information from your site," and you assume your content is the problem when it is the file. The fix list below covers more than 90% of validation failures I see in client audits.

Here are the ten failure modes ranked by how often I find them:

| Rank | Mistake | What breaks |
| --- | --- | --- |
| 1 | Multiple H1 headings | Spec violation; parser stops at first H1 and ignores everything after |
| 2 | Missing blockquote summary | Agent has no context for the URL list and may skip the file |
| 3 | Wrong link format | - url - description instead of - [name](url): description is unparseable |
| 4 | H3 or H4 inside link sections | Spec only allows H2 to delimit link lists |
| 5 | 404 or redirected URLs | Agent fetches the URL, gets a dead page, drops the citation |
| 6 | File served as text/html | LLMs expect text/plain or text/markdown; some clients reject text/html |
| 7 | File at /docs/llms.txt or /public/llms.txt | Spec requires root path /llms.txt only |
| 8 | Description longer than one sentence | Auto-summarizers truncate and lose the keyword |
| 9 | Duplicate URLs across sections | Wastes the agent's context window and dilutes signal |
| 10 | Listing every URL on the site | Removes editorial signal; the file becomes a sitemap |
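Several of these failure modes can be caught with a few lines of Python. The sketch below covers rows 1, 2, 3, 4, and 9 only; `lint_llms_txt` and `LINK_ITEM` are hypothetical names, and the checks are deliberately naive (for example, a line starting with "# " inside a code sample would be counted as an H1).

```python
import re

# A list item in "[name](url)" form with an optional ": description" tail.
LINK_ITEM = re.compile(r"^[-*]\s+\[[^\]]+\]\(([^)\s]+)\)(:\s*\S.*)?$")


def lint_llms_txt(text: str) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines()]

    # Row 1: exactly one H1.
    h1_count = sum(1 for ln in lines if ln.startswith("# "))
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")

    # Row 2: a blockquote summary somewhere in the file.
    if not any(ln.startswith(">") for ln in lines):
        problems.append("missing blockquote summary")

    seen = set()
    in_section = False
    for n, ln in enumerate(lines, start=1):
        if ln.startswith("## "):
            in_section = True
        elif re.match(r"^#{3,}\s", ln):
            # Row 4: only H2 may delimit link lists.
            problems.append(f"line {n}: heading deeper than H2")
        elif in_section and re.match(r"^\s*[-*]\s", ln):
            match = LINK_ITEM.match(ln.strip())
            if not match:
                # Row 3: list item is not a Markdown link.
                problems.append(f"line {n}: list item is not '[name](url): description'")
            elif match.group(1) in seen:
                # Row 9: same URL listed twice.
                problems.append(f"line {n}: duplicate URL {match.group(1)}")
            else:
                seen.add(match.group(1))
    return problems
```

Rows 5 through 7 need HTTP requests, and rows 8 and 10 are editorial judgments, so they are left out of this sketch.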

The fifth one is the most expensive. I once spent two weeks wondering why a client's docs were not being cited by Claude, and the issue was three dead URLs in the top section of their llms.txt. The agent was fetching the file, hitting the dead links first, and giving up before reading the live ones.

Run a link checker. Every time you ship the file. No exceptions.

How do I write an llms.txt file that ChatGPT and Perplexity actually use

The shortest llms.txt file that earns lift on ChatGPT, Perplexity, and Claude is roughly 30 to 80 URLs grouped into 3 to 5 sections, each link with a one-sentence description that names the entity, the format, and the use case. Length is not the win. Curation is. A file with 40 well-described URLs gets cited more than one with 400 untyped URLs because LLMs use the descriptions as retrieval anchors when they decide which page to fetch next.

Three rules I follow on every file:

Rule one: write descriptions for the agent, not the human. Instead of "Pricing" write "Pricing page covering the single $79 Growth OS plan, the $49 founder discount, and the free tier." The named entities ($79, the $49 founder discount, free tier) are what the LLM keys on at retrieval time.

Rule two: lead with the most-cited URL types. Comparison pages, glossary pages, pricing, free tools, and benchmark studies are the formats AI engines lift most often, per Search Engine Journal's 2025 citation study. Put them in section one.

Rule three: use the "Optional" section for second-tier content. The spec says agents can skip URLs in the Optional section if they need a shorter context. That gives you an explicit mechanism to mark "important" versus "supporting." Most files I audit do not use it.
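To see what that mechanism looks like from the agent's side, here is a small sketch of how a consumer might drop the Optional section when it needs a shorter context. `drop_optional` is a hypothetical helper, not something the spec or any particular agent is known to ship, and it assumes the section is titled exactly "Optional".

```python
def drop_optional(text: str) -> str:
    """Return the file contents without the '## Optional' section."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Every H2 starts a new section; skip only the one titled "Optional".
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"
```

Anything you place under that heading should therefore be content the file still makes sense without.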

If you want a starting point, run our free llms.txt generator on any site URL and edit from there.

What is the difference between llms.txt and llms-full.txt

The llms.txt file is a curated index of URLs in under 10 KB; llms-full.txt is the entire content of those URLs concatenated into a single Markdown file, often 200 KB to 5 MB. Llms.txt tells the agent where to look. Llms-full.txt is a one-shot context dump for agents that cannot or will not crawl. Mintlify auto-generates both for every docs site it hosts as of November 2024, which is how Anthropic, Cursor, Coinbase, and Pinecone got both files overnight.

The llms-full.txt file is not in the original spec. It emerged organically because dev tool agents like Aider and Continue hit context-window limits when crawling docs page by page.
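Because there is no formal spec for llms-full.txt, any generator is a matter of convention. The sketch below shows one possible layout, assuming you already have each page's content as Markdown; `build_llms_full`, the "Source:" line, and the per-page H2 headings are all illustrative choices, not a documented format.

```python
def build_llms_full(title: str, summary: str, pages: list) -> str:
    """Concatenate pages into one Markdown document.

    `pages` is a list of (name, url, markdown_body) tuples, in the same
    order as the links in llms.txt. The layout is an assumption made for
    this example.
    """
    parts = [f"# {title}", "", f"> {summary}", ""]
    for name, url, body in pages:
        parts += [f"## {name}", "", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts)
```

Keeping the page order identical to llms.txt means the curated ordering survives even when an agent reads only the full file.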

When to ship which:

Ship only llms.txt if your content is over ~50 KB total. Most marketing sites and SaaS docs.

Ship both if you are running a docs site under ~500 KB total and your audience uses Cursor, Continue, or Claude Code. The llms-full.txt cuts ingestion time roughly in half because the agent makes one request instead of 30.

Ship only llms-full.txt if your full corpus fits in a 200K-token context window after Markdown stripping, which is rare outside docs.

For a deeper breakdown of how to set up the related crawler files, our AI crawler robots generator handles the robots.txt and ai.txt side of the same problem.

How do I know if my llms.txt is being read

The cleanest signal is server log analysis filtered for User-Agent strings containing GPTBot, PerplexityBot, ClaudeBot, OAI-SearchBot, and Cohere-AI. Google-Extended is a robots.txt control token rather than a crawler User-Agent string, so it will not show up in access logs. A working llms.txt sees occasional GET requests to /llms.txt from these bots, plus burst requests when a user prompts an agent to read your site. Per Search Engine Land's October 2025 test, unprompted automatic crawling of llms.txt is rare today; on-demand fetches when users ask agents to read your site are the dominant signal.

Three things to look for in logs:

GET requests to /llms.txt from named AI user agents (the bot is fetching your file).

GET requests to URLs listed in /llms.txt from those same user agents within a few seconds (the bot is following the curated path you defined).

Citations in your AI mention monitoring tool that map to URLs explicitly listed in your llms.txt, especially the top section.

If you are tracking AI citations across ChatGPT, Claude, Gemini, and Perplexity, LandKit does this automatically. If you are not, grep your access logs for the user agents above and watch them for two weeks.
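If you go the log route, a short script beats eyeballing grep output. This sketch assumes your server writes the common "combined" access log format (request line, status, size, referer, user agent); `count_llms_txt_hits`, `LOG_RE`, and `AI_AGENTS` are hypothetical names, and the agent list is the one from this article.

```python
import re
from collections import Counter

# User-agent substrings to look for. Google-Extended is left out because it
# is a robots.txt token, not a string that appears in request headers.
AI_AGENTS = ("GPTBot", "PerplexityBot", "ClaudeBot", "OAI-SearchBot", "Cohere-AI")

# Combined log format: "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_txt_hits(log_lines) -> Counter:
    """Count GET requests to /llms.txt per AI user agent."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match or match["method"] != "GET":
            continue
        # Ignore any query string when comparing the path.
        if match["path"].split("?", 1)[0] != "/llms.txt":
            continue
        user_agent = match["ua"].lower()
        for agent in AI_AGENTS:
            if agent.lower() in user_agent:
                hits[agent] += 1
    return hits
```

Pass it an open file handle, for example `count_llms_txt_hits(open("access.log"))`, and compare the counts week over week.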

llms.txt validator vs schema validator vs robots.txt: which do I run first

Run robots.txt first, because a blocked AI bot never reaches llms.txt. Run the schema validator second, because schema fixes appear in citations within days while llms.txt fixes take weeks. Run the llms.txt validator third, as the curation layer that earns lift only when the first two are clean. All three need to pass before any one of them moves AI citations measurably, and order matters because each builds on the previous.

| File | What it controls | Validation focus | Time to impact |
| --- | --- | --- | --- |
| robots.txt | Whether AI bots can crawl at all | Allow/disallow rules for GPTBot, ClaudeBot, PerplexityBot, Google-Extended | Immediate |
| Schema (JSON-LD) | How AI engines understand your entities, products, FAQs, articles | Valid types, required fields, no syntax errors | Days to weeks |
| llms.txt | Which URLs to read first when an agent is asked | One H1, one blockquote, H2 sections, valid Markdown links | Weeks to months |
| llms-full.txt | The full content corpus in one shot | Same Markdown rules, plus content freshness | Same as llms.txt |

If schema is broken, AI engines cannot extract clean entity data. If robots.txt blocks AI bots, none of the others matter. Both should be clean before you spend time on llms.txt. Our schema validator covers the second layer.
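The robots.txt check is easy to automate with Python's standard `urllib.robotparser`. The sketch below reports which of the bots named in the table are blocked from a given URL; `blocked_bots` and `AI_BOTS` are hypothetical names, and the function works on robots.txt text you have already downloaded.

```python
from urllib.robotparser import RobotFileParser

# Robots.txt tokens from the table above.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def blocked_bots(robots_txt: str, url: str) -> list:
    """Return the AI bots that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]
```

If `blocked_bots(robots_text, "https://yoursite.com/llms.txt")` returns anything, fix robots.txt before spending time on the other two layers.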

Frequently asked questions

Is llms.txt actually used by ChatGPT or Claude in 2026

ChatGPT and Claude do not crawl /llms.txt automatically as of May 2026, but both fetch it on demand when a user prompts them with "read the docs at <site>" or similar. Search Engine Land's October 2025 server log study found zero unprompted hits from GPTBot, ClaudeBot, or PerplexityBot to a test site's llms.txt over ten weeks. Dev tool agents (Cursor, Aider, Claude Code) read it by default, which is the bigger near-term audience for most SaaS.

What is the correct file path for llms.txt

The llms.txt file must be served at the root of your domain at https://yoursite.com/llms.txt, not at /docs/llms.txt, /public/llms.txt, or any subpath. The llmstxt.org spec treats the root path as the only discoverable location, the same way robots.txt only works at /robots.txt. If you host docs on a subdomain, publish a separate llms.txt at the root of that subdomain (e.g., docs.yoursite.com/llms.txt).

Do I need an llms.txt if I have a sitemap.xml

A sitemap.xml lists every URL on your site for search engines to crawl exhaustively, while llms.txt curates the 30 to 80 URLs you want an AI agent to read first with one-sentence descriptions. They solve different problems and live alongside each other. Sitemap.xml is for Google's classic crawler. Llms.txt is for an LLM with a 200K-token context window that needs editorial signal, not a 50,000-URL dump.

What is the maximum size of an llms.txt file

The llmstxt.org spec does not set a maximum size, but most validators flag files over 100 KB as likely sitemap dumps rather than curated indexes. In practice, keep llms.txt under 10 KB and put the deep content in a separate llms-full.txt file. Mintlify's auto-generated files typically run under 8 KB for llms.txt and 200 KB to 2 MB for llms-full.txt across the docs sites they host.

How often should I update llms.txt

Update llms.txt every time you publish a major new page, deprecate a page, or restructure pricing or product categories, which for most SaaS is monthly to quarterly. Stale URLs in the top section are the most expensive validation failure because agents hit dead links and discard the file. Set a calendar reminder for the first of every month and re-run the validator. If you ship a new pricing page or kill a feature, update the file the same day.

Will adding llms.txt hurt my Google rankings

Adding a valid llms.txt file at /llms.txt does not affect Google rankings; it is not a signal Google uses, and Google's John Mueller publicly compared it to the discredited keywords meta tag in 2025. The file is served as text/plain or text/markdown, outside the indexable HTML surface, and ignored by Googlebot. The only ranking risk is publishing a sloppy file with broken links that wastes crawl budget, which is a corner case for most sites.

Ship the file, then validate it monthly

Write a 30-line llms.txt this week. Run it through a validator. Fix any spec failures the same day. Then put a calendar reminder on the first of every month to re-run the link check, because the only thing worse than no llms.txt is one with three dead URLs at the top.

If you want LandKit to track which AI engines actually cite your URLs after you ship the file, start a free LandKit scan or check the free-tools hub for the related schema validator and AI crawler reference.

About the author

Nikhil Kumar is the founder of LandKit, an SEO and AI visibility growth OS that tracks brand mentions across ChatGPT, Claude, Gemini, and Perplexity for solo founders and lean SaaS teams. He built LandKit after spending two years auditing why technically clean sites still failed to earn AI citations. Connect on LinkedIn.