Schema markup generator: how to win rich results and AI citations with JSON-LD in 2026
By Nikhil Kumar, Founder of LandKit. Last updated May 2026.
Most teams ship schema that validates and gets ignored. Their JSON-LD passes the Rich Results Test, then disappears into the index without earning a single rich result or AI citation.
A schema markup generator only earns its keep if the JSON-LD it spits out maps to a schema type Google still rewards in 2026, sits in the right place in the document, and survives the gap between "valid" and "trusted." Across Semrush's 10-million-keyword study, AI Overviews now appear in 15.69% of November 2025 queries, and FAQ-marked pages are 3.2x more likely to be lifted into them. The schema types that move the needle in 2026 are Article, Product, Organization, Person, Breadcrumb, and structurally honest FAQPage. Almost everything else is decoration.
Which schema types actually move the needle in 2026
The 2026 winners are Article, Product (with Offers and AggregateRating), Organization, Breadcrumb, FAQPage, and Person. These are the types Google still issues rich results for, and the types AI engines extract entities from most aggressively. According to Frase's 2025 analysis, only 12.4% of websites use any structured data at all, so the ceiling on competitive lift is still wide open for sites that ship the right six types correctly.
Google's supported rich result list covers about 30 features in 2026, but most of them only fire for very specific verticals (Recipe, Job Posting, Vacation Rental, Event).
For a typical SaaS, agency, or solo-founder site, the working set is much smaller.
Here is the matrix I run before generating any JSON-LD, with the rich-result outcome and the AI-citation outcome separated. AI citation lift figures are pulled from BrightEdge's 2025 AI Overviews research and Frase's FAQ-schema analysis.
| Schema type | Google rich result in 2026 | AI engine citation lift | When to use it |
|---|---|---|---|
| Article / BlogPosting | Headline + image carousel, Top Stories | High; cited in 33% of comparative AI listicle answers | Every blog post, news piece, or guide |
| Product + Offers + AggregateRating | Star ratings, price, availability | Medium; high in product comparison prompts | Product, pricing, and software-tool pages only |
| Organization | Knowledge panel, sitelinks search box | Decisive entity grounding for ChatGPT | Homepage and about page, once per site |
| Breadcrumb (BreadcrumbList) | Breadcrumb display in SERP | Helps AI understand site hierarchy | Every internal page |
| FAQPage | None for most sites since August 2023 | 3.2x more likely in AI Overviews per Frase | Real Q&A sections only, never invented |
| Person | None directly, supports E-E-A-T | Strong for author-disambiguation in Claude | Author bios, founder pages |
| HowTo | Removed September 2023 | Still parsed by AI engines | Skip unless content is genuinely procedural |
| Speakable | None for most sites | Negligible | Skip |
The two traps inside this list are FAQPage and HowTo.
Both lost rich-result eligibility (FAQ in August 2023, HowTo fully in September 2023), and both got dismissed as dead by half the SEO industry. That dismissal was wrong. The schema still feeds AI engines even when Google's blue-link surface ignores it. Frase's 2025 study found pages with FAQPage schema get cited 3.2x more often in AI Overviews than equivalent pages without.
How does JSON-LD actually affect ChatGPT and Perplexity citations?
JSON-LD does not directly cause AI citations the way a backlink might. What it does is help the LLM identify the entity, the author, and the publication date with low ambiguity, which raises the probability that the chunk gets selected during retrieval and reranking. SearchVIU's October 2025 testing confirmed that ChatGPT, Claude, Perplexity, and Gemini all actively process schema markup when accessing content. Sites running structured data plus a real FAQ block saw a 44% lift in AI search citations in BrightEdge's 2025 study.
The mechanism is simple.
LLMs embed your chunks into vector space, retrieve the chunks closest to the buyer's prompt, then have a cross-encoder reranker pick which to actually quote. Embedding cares about semantic clarity. Retrieval cares about entity grounding. Reranking cares about evidence density and authority. JSON-LD touches all three.
That is why the same article will sometimes get cited by Perplexity but not ChatGPT.
A 2025 Profound benchmark found only 11% of domains are cited by both ChatGPT and Perplexity, with Wikipedia accounting for 47.9% of ChatGPT citations and Reddit driving 46.7% of Perplexity citations. ChatGPT trusts Wikipedia-shaped entities, which is why Organization and Person schema with strong sameAs arrays get pulled hardest.
The pages that get cited most across all four engines pair Article schema with a Person schema for the author and an Organization schema with sameAs arrays pointing to LinkedIn, Wikidata, Crunchbase, and X. That is the entity grounding LLMs need to confidently attribute the chunk back to you.
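A minimal sketch of that pairing, assuming a hypothetical site at example.com. The organization name, author name, sameAs URLs, and the `build_graph` and `to_script_tag` helpers are all placeholders for illustration, not a required shape. It shows one way to emit Article, Person, and Organization as a single @graph in which the Article points at the other two nodes by @id.

```python
import json

SITE = "https://example.com"  # hypothetical domain, swap in your own


def build_graph(headline, url, date_published, date_modified):
    """Return one JSON-LD document holding Organization, Person, and Article."""
    org_id = f"{SITE}/#organization"
    author_id = f"{SITE}/#author"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": "Example Co",  # placeholder name
                "url": SITE,
                # Placeholder profile URLs: list only live profiles you own.
                "sameAs": ["https://www.linkedin.com/company/example-co"],
            },
            {
                "@type": "Person",
                "@id": author_id,
                "name": "Jane Smith",  # placeholder author
                "worksFor": {"@id": org_id},
                "sameAs": ["https://www.linkedin.com/in/jane-smith-example"],
            },
            {
                "@type": "Article",
                "@id": f"{url}#article",
                "headline": headline,
                "mainEntityOfPage": url,
                "datePublished": date_published,
                "dateModified": date_modified,
                # Reference the shared nodes by @id instead of repeating them.
                "author": {"@id": author_id},
                "publisher": {"@id": org_id},
            },
        ],
    }


def to_script_tag(graph):
    """Serialize the graph into a single script block for the page head."""
    # Escape "</" so a stray "</script>" inside a value cannot end the block.
    payload = json.dumps(graph, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Calling `to_script_tag(build_graph(...))` at build time yields one block per page, and the @id references keep the author and publisher pointing at the same two entities on every page.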
Where should I actually put the JSON-LD in my document?
Inside a single <script type="application/ld+json"> block in the <head> of the page, with all schema types combined into one @graph array. Google's structured data documentation says JSON-LD can sit in either the head or the body, but the head is faster to parse and avoids any chance the LLM hits a tokenization boundary before reading it. SchemaValidator.org's 2025 telemetry found sites using JSON-LD see 23% fewer structured data errors in Search Console than sites using Microdata.
Three placement rules, in order of importance.
First, never split related entities across separate script tags. AI parsers and Googlebot both struggle when the Article node lives in one <script> and the related Person node lives in another. Combine everything into one @graph.
Second, the JSON-LD must describe the page that contains it. Google's structured data policies say the markup must be "a true representation of the page content." Mark up an author who actually wrote the post, a price that actually appears on the page, a rating that matches what a user can see. Mismatch is the fastest way to earn a manual action.
Third, if you run a static-site generator or a React app, the JSON-LD must be in the prerendered HTML, not injected by client-side JavaScript. Googlebot will execute JS, but the fetchers that feed ChatGPT and Perplexity often will not. If your schema only renders after hydration, you might pass the Rich Results Test and still lose every AI citation.
After you ship, re-run the LandKit schema validator plus Google's Rich Results Test against the live URL, and confirm the schema actually appears in the raw HTML response.
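The raw-HTML check can be scripted. The sketch below uses only Python's standard library and assumes you already have the unrendered response body as a string, for example the output of a plain HTTP fetch with no JavaScript execution. `JsonLdExtractor` and `jsonld_types` are hypothetical helper names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # json.loads raises ValueError if the block is not valid JSON.
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def jsonld_types(raw_html):
    """Return (number of JSON-LD blocks, list of @type values found)."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    parser.close()
    types = []
    for block in parser.blocks:
        # A block is either a list of nodes, an @graph wrapper, or one node.
        nodes = block if isinstance(block, list) else block.get("@graph", [block])
        for node in nodes:
            node_type = node.get("@type")
            if node_type is None:
                continue
            types.extend(node_type if isinstance(node_type, list) else [node_type])
    return len(parser.blocks), types
```

A block count of 0 on the raw response means the schema only exists after hydration. A count above 1 is a hint that the entities are split across several script tags.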
What does Google's Rich Results Test actually check, and what doesn't it?
The Rich Results Test only checks the schema types Google currently uses for rich results, and only flags whether your markup is eligible for those visual SERP features. It does not validate full schema.org compliance, it does not predict AI citation behavior, and it does not tell you whether your data passes Google's policy review. For broader validation, the schema.org Schema Markup Validator (hosted by schema.org, the vocabulary project founded by Google, Microsoft, Yahoo, and Yandex) checks against the full schema.org vocabulary, not just the Google-supported subset.
Three things the Rich Results Test misses.
It misses anything outside Google's roughly 30-feature catalog. Speakable, most newer schema.org types, and many VideoObject configurations get flagged "no rich results," which is correct but misleading.
It misses entity-grounding quality. You can ship a Person schema with a sameAs array pointing to dead URLs and the test still passes. AI engines will struggle to disambiguate you.
It misses the policy layer. The Rich Results Test green-lights inflated AggregateRating counts, fake reviews, and self-serving reviews. Google's policies prohibit all three, and the manual-action team enforces separately.
The practical workflow I run on every page.
- Generate the JSON-LD.
- Validate in the schema.org validator first for structural issues across the full vocabulary.
- Run the Google Rich Results Test for the SERP-feature view.
- Deploy, fetch the live URL with curl, and confirm the JSON-LD is in the prerendered HTML.
- Re-run both validators against the live URL.
Is FAQ schema dead now that Google killed FAQ rich results?
FAQ schema is dead for SERP rich snippets on most sites, but very much alive as the strongest AI citation signal in 2026. Google restricted FAQ rich results to government and health sites in August 2023, then quietly tightened that further. AI engines went the opposite direction. Frase's 2025 study found pages with FAQPage schema are 3.2x more likely to appear in Google AI Overviews, and BrightEdge's research showed sites adding structured data plus FAQ blocks earned 44% more AI citations.
The trap I see most often: teams gutted their FAQ schema in 2024 because Google killed the rich result, and their AI citations cratered six months later without anyone connecting the dots. Keep the schema.
The rules to keep it earning citations rather than penalties.
The questions in your FAQPage schema have to match real questions on the visible page. You cannot generate fake FAQs server-side and hide them behind a CSS display: none. Google's policy team treats that as hidden content and revokes rich-result eligibility across the entire site.
Mine real prompts. Reverse-engineer the People Also Ask box and the actual phrasing your buyers use on Reddit, Quora, and ChatGPT. Then write a real Q&A block that matches the schema.
The buyer-prompt mining I do for LandKit's free tools hub usually surfaces six to eight prompts per topic. Five make the FAQPage cut. The rest become H2s in the body. Pages with this discipline sit at the high end of the 67% FAQPage AI citation rate band that SearchVIU's October 2025 study reported.
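One way to enforce the "schema matches the visible page" rule is to refuse to build the FAQPage node unless every question appears in the page's visible text. This is a sketch under assumptions: `build_faq_page` is a hypothetical helper, and `visible_text` stands for whatever plain-text rendering of the page your build pipeline can produce.

```python
def _normalize(text):
    """Collapse whitespace and lowercase so minor formatting does not matter."""
    return " ".join(text.split()).lower()


def build_faq_page(qa_pairs, visible_text):
    """Build FAQPage JSON-LD from (question, answer) pairs found on the page.

    Raises ValueError if any question is missing from the visible text,
    so invented FAQs fail the build instead of shipping.
    """
    page = _normalize(visible_text)
    missing = [q for q, _ in qa_pairs if _normalize(q) not in page]
    if missing:
        raise ValueError(f"Questions not found in visible page text: {missing}")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
```

The substring match is deliberately strict. If your visible questions are reworded relative to the schema, the safer fix is to make the two identical rather than loosen the check.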
How do I keep my schema from getting penalized?
Three rules avoid 95% of manual actions on schema. The schema must describe content visible on the page, the AggregateRating numbers must match what users actually see, and the Person and Organization sameAs arrays must point to live profiles you actually own. Google's general structured data guidelines explicitly call out hidden markup, fake reviews, and impersonation as manual-action triggers, and the spam team enforces them aggressively in 2025 and 2026.
The most common avoidable mistakes.
Marking up content that is not on the page. If your schema says the author is Jane Smith but the byline says John Doe, you fail.
Inflating review counts. If your code says 100 reviews and the page shows 95, that is a violation. The ratingValue and reviewCount inside aggregateRating must exactly match the visible numbers.
Self-serving reviews. Google does not show review rich results for LocalBusiness and Organization schema when the entity being reviewed controls the reviews. A SaaS company cannot mark up its own customer testimonials as Review schema and expect rich results.
Stale dateModified values. AI engines, especially Perplexity, weight recency aggressively. If your dateModified is 2022, your chances of being cited in 2026 drop sharply.
Broken sameAs URLs. If your Person schema's sameAs array points to a LinkedIn profile that 404s, the entity-grounding signal collapses. Re-validate every six months.
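Several of these mistakes can be caught before deploy by comparing the schema against what the page visibly shows. The sketch below is illustrative only: `lint_page_schema`, the shape of the `visible` dictionary, and the 365-day staleness threshold are assumptions, not anything Google or the AI engines publish. It leaves out sameAs link checks, which need network access.

```python
from datetime import date


def lint_page_schema(article, rating, visible, today, max_age_days=365):
    """Return a list of mismatches between JSON-LD values and the visible page.

    article: the Article node, rating: the aggregateRating node or None,
    visible: values scraped from the rendered page (assumed keys below).
    """
    problems = []

    if article["author"]["name"] != visible["byline"]:
        problems.append("author does not match the visible byline")

    if rating is not None:
        if int(rating["reviewCount"]) != visible["review_count"]:
            problems.append("reviewCount does not match the visible review count")
        if float(rating["ratingValue"]) != visible["rating_value"]:
            problems.append("ratingValue does not match the visible rating")

    # Accept both "2026-05-01" and full timestamps by keeping the date part.
    modified = date.fromisoformat(article["dateModified"][:10])
    if (today - modified).days > max_age_days:
        problems.append("dateModified is older than the allowed age")

    return problems
```

For example, a page whose schema claims Jane Smith and 100 reviews while the page shows John Doe and 95 would return two mismatch messages, plus a third if dateModified is stale.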
A 2025 r/SEO thread flagged a related shortcut: people generate JSON-LD with ChatGPT or Grok, ship it without checking, and end up with valid-but-hallucinated properties, fake sameAs URLs, and the wrong @type. Validation passes, trust collapses. Start from a deterministic template, not an LLM, and only use the LLM for field values.
How do I roll out a schema markup generator across 50+ pages without losing my mind?
Pick four schema types, build a single shared JSON-LD template per type, and inject page-specific values from your CMS or build pipeline. Most teams that ship schema cleanly use 4-6 types total: one Article template, one Organization template on every page, one Breadcrumb generated from URL structure, and one Product or FAQPage where content supports it. Walker Sands' 2025 LLM-visibility analysis found this 4-template approach covers 80% of citation-driving pages.
The mistake I see at scale: treating each page as a snowflake. Engineers ship one-off JSON-LD per page with slightly different @id patterns, sameAs arrays, and image properties.
The result is entity drift. Six months in, your Organization schema looks like five different organizations to ChatGPT, and your knowledge graph footprint splinters.
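Entity drift is easy to detect once the JSON-LD nodes for each URL have been collected. A rough sketch, assuming `pages` maps each URL to the list of nodes found on it (`find_entity_drift` is a hypothetical helper name):

```python
import json


def find_entity_drift(pages, entity_type="Organization"):
    """Group URLs by the exact definition of one entity type.

    Returns a dict of {serialized node: [urls]} when more than one distinct
    definition exists, or an empty dict when every page agrees.
    """
    variants = {}
    for url, nodes in pages.items():
        for node in nodes:
            if node.get("@type") == entity_type:
                # sort_keys makes key order irrelevant; array order still counts.
                fingerprint = json.dumps(node, sort_keys=True)
                variants.setdefault(fingerprint, []).append(url)
    return variants if len(variants) > 1 else {}
```

An empty result means every page describes the same Organization. Anything else lists each competing definition alongside the URLs that carry it, which is usually enough to find the one-off template responsible.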
The 5-step rollout I use.
- Audit current schema with the free LandKit schema validator across your top 50 URLs. Capture which schema types are present, which are valid and earning rich results, and which are merely valid.
- Pick the 4-6 schema types you will commit to. Article, Organization, Breadcrumb, Person are non-negotiable. Add Product or FAQPage if your content supports them honestly.
- Build one template per type, with {{placeholder}} fields the build system fills in.
- Audit the canonical URL on every page with the LandKit canonical tag checker before you ship. Schema with mismatched canonicals is the second-most-common reason rich results fail to display.
- After deploy, sample-check 10 random URLs in Google Rich Results Test. Expect at least 8 to show eligible features. If fewer, your templates have a structural bug.
This system scales to 5,000 pages without drift if you wire it into your build, and it costs roughly two days of engineering once. After that, every new page gets its schema for free.
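As one example of a template that needs no per-page authoring, the breadcrumb can be derived from the URL path. This is a sketch under assumptions: `breadcrumb_from_url` is a hypothetical helper, and it guesses display names from slugs, which a real build would likely replace with page titles from the CMS.

```python
from urllib.parse import urlsplit


def breadcrumb_from_url(url, home_name="Home"):
    """Build BreadcrumbList JSON-LD from the path segments of a URL."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    segments = [segment for segment in parts.path.split("/") if segment]

    items = [
        {"@type": "ListItem", "position": 1, "name": home_name, "item": origin + "/"}
    ]
    path = ""
    for position, segment in enumerate(segments, start=2):
        path += "/" + segment
        items.append(
            {
                "@type": "ListItem",
                "position": position,
                # Slug-to-title guess: "schema-markup" becomes "Schema Markup".
                "name": segment.replace("-", " ").title(),
                "item": origin + path,
            }
        )
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }
```

Because the output depends only on the URL, every page built from the same route pattern gets a structurally identical breadcrumb, which is the property that keeps templates from drifting.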
Frequently asked questions
What's the best free schema markup generator that handles JSON-LD for FAQ, Product, Article, HowTo, and Organization?
A good schema markup generator should output schema.org-compliant JSON-LD, support at least Article, Product, FAQPage, HowTo, and Organization, and produce a single combined <script> block rather than fragmented blocks per type. Google's structured data documentation explicitly recommends JSON-LD as the preferred format because it separates data from HTML markup. Free tools like Merkle's TechnicalSEO.com generator, Saijo George's generator, and the LandKit schema markup generator all hit those bars.
Does Google still rank pages higher if they have schema markup in 2026?
Schema markup is not a direct Google ranking factor and never has been. What it does is help Google understand your content with low ambiguity, which raises the probability of rich results and improves click-through rates by 20-30% per BrightEdge's 2025 data. Indirectly, that drives ranking. As of January 2026, AI Overviews appear in roughly 25% of US searches, and pages with full schema are 3x more likely to be cited inside them per BrightEdge's research.
Will my JSON-LD work if I add it via Google Tag Manager instead of in the page source?
Technically yes, but you take a real risk. Googlebot will execute JavaScript and pick up the schema, but AI engines like ChatGPT, Perplexity, and Claude often fetch raw HTML without executing JS. If your schema only exists post-hydration, you may pass the Google Rich Results Test and still lose every AI citation. The safe path is to inject JSON-LD server-side or at build time, in the page's <head>, before any client-side hydration runs.
Do I need separate schema for every page on my site, or can I put it all in one global block?
Page-level entities (Article, Product, FAQPage) need to be unique per page because they describe that specific page's content. Site-level entities (Organization, Person, WebSite) should be defined once with a consistent @id and referenced from every page. Google's structured data documentation calls this entity referencing. Doing it consistently is the single biggest factor in whether ChatGPT correctly attributes citations to your brand instead of fragmenting your entity across the index.
Is there a penalty for using a schema markup generator that produces "valid but hallucinated" JSON-LD from ChatGPT or Grok?
Yes, indirectly. The JSON-LD will pass the Google Rich Results Test if the structure is right, but the policy team and AI engines both detect inflated review counts, fake authors, and dead sameAs URLs over time. Google's structured data policies treat hidden or misleading markup as a manual-action trigger, and AI engines will quietly stop citing your domain once their rerankers flag the pattern. Always verify field values, never just structure.
Should I add HowTo schema in 2026 even though Google killed the rich result in September 2023?
Yes, if the content is genuinely procedural. Google removed the visual rich result in September 2023 to clean up cluttered SERPs, but the structured data itself still feeds AI engines. ChatGPT and Perplexity actively parse HowTo schema for step-by-step prompts, which match how people actually use AI for "how do I X" queries. The schema costs almost nothing to ship, and the AI citation upside on procedural content is real. Skip it only if your content is not actually procedural.
Ship schema this week, not next quarter
Pick the three highest-traffic pages on your site. Generate Article, Organization, and Breadcrumb schema for them with a deterministic generator. Validate in both the schema.org validator and the Google Rich Results Test. Then check the live HTML response with curl to confirm the JSON-LD is in the prerendered source. That round trip takes under 30 minutes per page once your template is ready, and it puts you ahead of the roughly 87.6% of websites that, by Frase's figure, ship no structured data at all.
Once those three pages are live and indexed, add FAQPage to the next five pages where the content actually answers buyer questions, and watch the AI citations follow inside 30-60 days.
Nikhil Kumar is the founder of LandKit, the SEO and AI visibility growth OS that tracks brand mentions across ChatGPT, Claude, Gemini, and Perplexity. Before LandKit, he ran growth and SEO programs for SaaS companies. He now helps solo founders and lean teams compete on visibility against better-funded incumbents.