What is Schema.org — and why does it matter for AI search?
AI search demands machine-readable content.
What Schema.org actually is
Schema.org is a shared vocabulary for structured data on the web. It was initiated in 2011 by Google, Microsoft, Yahoo, and Yandex — the four major search engines — to establish a common language for describing the meaning of web content.
Today, Schema.org defines 806 types and over 1,400 properties for describing entities: organizations, products, people, articles, events, reviews, recipes, local businesses, and much more.
When implemented on a web page, Schema.org markup tells machines: this is a product, its price is X, it has these reviews, it is made by this organization. Not as unstructured text that requires interpretation — but as unambiguous, machine-readable data.
The recommended implementation format is JSON-LD (JavaScript Object Notation for Linked Data) — a structured data block in the HTML that is invisible to human visitors but fully readable by search engines and AI crawlers.
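To make this concrete, here is a minimal sketch of what such a block looks like, built with Python's standard json module. The Organization values are placeholders, not a real company; only @context and @type are structurally essential to every JSON-LD block.

```python
import json

# Hypothetical Organization markup: every value below is a placeholder.
# "@context" points at the shared vocabulary; "@type" names the entity.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example GmbH",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
}

# Wrap it in the script tag that would sit in the page's <head>.
# Browsers ignore it; search engines and AI crawlers read it.
snippet = '<script type="application/ld+json">\n{}\n</script>'.format(
    json.dumps(org, indent=2)
)
print(snippet)
```

Because the block lives in its own script tag, it can be added or updated without touching any visible content on the page.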
How structured data affects AI visibility.
Structured data works at three levels:
Discovery: crawlers and AI systems can find and identify your pages as distinct entities rather than undifferentiated text.
Understanding: machines can interpret what each page actually is (a product, an article, an event) without guessing.
Grounding: AI answer engines can anchor their responses in explicit, machine-readable facts instead of inferred ones.
Why most sites still don't have proper Schema markup.
Implementation belongs to development. Content belongs to editorial. Quality assurance belongs to SEO. Translation is separate. Where does schema fit? In practice: nowhere, because everyone assumes someone else is handling it.
Add to that the feedback problem. With SEO, you see rankings. With ads, you see conversions. With structured data — nothing visible. No score, no dashboard, no immediate reward. Without visible feedback, it's nearly impossible to justify budget or attention.
The result: even companies that understand the value of structured data rarely implement it properly at scale.
Why simple LLM generation isn't enough.
Generating schema with an AI model out-of-the-box seems like an obvious shortcut. It isn't.
Peer-reviewed research from the University of Nantes (Dang et al., 2025) shows that 40–50% of LLM-generated markup without a curation pipeline is invalid, non-factual, or non-compliant.
Only after a structured multi-step curation pipeline — checking validity, factuality, and compliance independently — does LLM-generated markup reach enterprise quality standards, outperforming human annotators in benchmark tests.
Off-the-shelf AI generation produces markup that looks correct. A proper pipeline ensures it is correct.
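The three checks named above can be sketched as independent pipeline stages. This is a simplified illustration, not the pipeline from the cited research: the function names are hypothetical, the factuality check is deliberately naive (it only verifies that string values appear in the source page), and the required-property list is a stand-in for the real requirements in search engines' documentation.

```python
import json

# Simplified stand-in; authoritative requirements come from search-engine docs.
REQUIRED = {"Product": {"name", "offers"}}

def check_validity(raw: str):
    """Step 1: does the markup parse, and does it carry @context/@type?"""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["not valid JSON"]
    return data, [f"missing {k}" for k in ("@context", "@type") if k not in data]

def check_factuality(data: dict, page_text: str):
    """Step 2 (naive): flag string values that never appear on the page."""
    return [f"unsupported value: {v!r}"
            for k, v in data.items()
            if not k.startswith("@") and isinstance(v, str) and v not in page_text]

def check_compliance(data: dict):
    """Step 3: are the type's required properties present?"""
    required = REQUIRED.get(data.get("@type", ""), set())
    return [f"missing required property: {p}" for p in required - data.keys()]

def curate(raw: str, page_text: str):
    """Run all three stages; an empty list means the markup passed."""
    data, issues = check_validity(raw)
    if data is None:
        return issues
    return issues + check_factuality(data, page_text) + check_compliance(data)
```

The point of the structure is that each stage catches a different failure mode: markup can be syntactically valid yet invent facts, or be factual yet miss a property the schema requires.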
What types matter most.
| Tier | Types | Who needs them |
| --- | --- | --- |
| Foundation | Organization · WebSite · BreadcrumbList · WebPage | every site |
| Content | Article / NewsArticle · FAQPage · HowTo | publishers and content sites |
| Commerce | Product · Offer · AggregateRating · Review | e-commerce and online shops |
| Local | LocalBusiness · PostalAddress · OpeningHoursSpecification | local businesses |
| Media | ImageObject · VideoObject · Event · Person | rich media and team pages |
Schema.org defines 806 types — from Article and Product to MedicalCondition and Event. Each type comes with a set of properties: some required, meaning search engines expect them for basic eligibility, and some recommended, meaning they unlock richer results and stronger signals for both Google and AI search systems.

A Product type alone can carry dozens of properties — name, offers, aggregateRating, availability, brand, image, sku, and more. The difference between a page that qualifies for a Rich Result and one that doesn't often comes down to a single missing recommended property.

This is why the number of types supported is the wrong metric entirely. What actually determines your visibility is how many properties are correctly populated — and kept accurate as your content changes.
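A property-coverage check makes that metric tangible. The required/recommended tiers below are illustrative assumptions for a Product, not the authoritative lists, which live in search engines' structured-data documentation.

```python
# Illustrative property tiers for a Product; placeholders, not the official lists.
REQUIRED_PROPS = {"name"}
RECOMMENDED_PROPS = {"offers", "aggregateRating", "review", "brand", "image", "sku"}

def coverage(markup: dict) -> dict:
    """Report which Product properties are actually populated."""
    present = {k for k, v in markup.items() if v not in (None, "", [])}
    return {
        "missing_required": sorted(REQUIRED_PROPS - present),
        "missing_recommended": sorted(RECOMMENDED_PROPS - present),
        "recommended_covered": len(RECOMMENDED_PROPS & present) / len(RECOMMENDED_PROPS),
    }
```

Run against a hypothetical Product that carries offers, brand, and image but no sku, review, or aggregateRating, it would report 50% recommended coverage: eligible, but leaving richer results on the table.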
Full vocabulary. Full automation.
No content changes. No backend work. Markup updates automatically when content changes.
// Key takeaways
Schema.org is a shared vocabulary initiated by Google, Microsoft, Yahoo, and Yandex in 2011. It defines 806 types and 1,400+ properties. JSON-LD is the recommended format. Structured data affects AI visibility at three levels: Discovery, Understanding, and Grounding. 40–50% of LLM-generated markup without a curation pipeline is invalid (Dang et al., 2025). The implementation problem is organizational, not technical — which is why automation matters.