What is Schema.org — and why does it matter for AI search?
AI search demands machine-readable content.
What Schema.org actually is
Schema.org is a shared vocabulary for structured data on the web. It was initiated in 2011 by Google, Microsoft, Yahoo, and Yandex — the four major search engines — to establish a common language for describing the meaning of web content.
Today, Schema.org defines 806 types and over 1,400 properties for describing entities: organizations, products, people, articles, events, reviews, recipes, local businesses, and much more.
When implemented on a web page, Schema.org markup tells machines: this is a product, its price is X, it has these reviews, it is made by this organization. Not as unstructured text that requires interpretation — but as unambiguous, machine-readable data.
The recommended implementation format is JSON-LD (JavaScript Object Notation for Linked Data) — a structured data block in the HTML that is invisible to human visitors but fully readable by search engines and AI crawlers.
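To make this concrete, here is a minimal sketch of what such a block looks like, built with Python's standard json module. The Organization values are placeholders, not a real company; only @context and @type are structurally essential to every JSON-LD block.

```python
import json

# Hypothetical Organization markup: every value below is a placeholder.
# "@context" points at the shared vocabulary; "@type" names the entity.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example GmbH",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
}

# Wrap it in the script tag that would sit in the page's <head>.
# Browsers ignore it; search engines and AI crawlers read it.
snippet = '<script type="application/ld+json">\n{}\n</script>'.format(
    json.dumps(org, indent=2)
)
print(snippet)
```

Because the block lives in its own script tag, it can be added or updated without touching any visible content on the page.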
How structured data affects AI visibility.
Structured data works at three levels:
Discovery: crawlers and AI systems can find and identify your pages as distinct entities rather than undifferentiated text.
Understanding: machines can interpret what each page actually is (a product, an article, an event) without guessing.
Grounding: AI answer engines can anchor their responses in explicit, machine-readable facts instead of inferred ones.
Why most sites still don't have proper Schema markup.
Implementation belongs to development. Content belongs to editorial. Quality assurance belongs to SEO. Translation is separate. Where does schema fit? In practice: nowhere, because everyone assumes someone else is handling it.
Add to that the feedback problem. With SEO, you see rankings. With ads, you see conversions. With structured data — nothing visible. No score, no dashboard, no immediate reward. Without visible feedback, it's nearly impossible to justify budget or attention.
The result: even companies that understand the value of structured data rarely implement it properly at scale.
Why simple LLM generation isn't enough.
Generating schema with an AI model out-of-the-box seems like an obvious shortcut. It isn't.
Peer-reviewed research from the University of Nantes (Dang et al., 2025) shows that 40–50% of LLM-generated markup without a curation pipeline is invalid, non-factual, or non-compliant.
Only after a structured multi-step curation pipeline — checking validity, factuality, and compliance independently — does LLM-generated markup reach enterprise quality standards, outperforming human annotators in benchmark tests.
Off-the-shelf AI generation produces markup that looks correct. A proper pipeline ensures it is correct.
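The three checks named above can be sketched as independent pipeline stages. This is a simplified illustration, not the pipeline from the cited research: the function names are hypothetical, the factuality check is deliberately naive (it only verifies that string values appear in the source page), and the required-property list is a stand-in for the real requirements in search engines' documentation.

```python
import json

# Simplified stand-in; authoritative requirements come from search-engine docs.
REQUIRED = {"Product": {"name", "offers"}}

def check_validity(raw: str):
    """Step 1: does the markup parse, and does it carry @context/@type?"""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["not valid JSON"]
    return data, [f"missing {k}" for k in ("@context", "@type") if k not in data]

def check_factuality(data: dict, page_text: str):
    """Step 2 (naive): flag string values that never appear on the page."""
    return [f"unsupported value: {v!r}"
            for k, v in data.items()
            if not k.startswith("@") and isinstance(v, str) and v not in page_text]

def check_compliance(data: dict):
    """Step 3: are the type's required properties present?"""
    required = REQUIRED.get(data.get("@type", ""), set())
    return [f"missing required property: {p}" for p in required - data.keys()]

def curate(raw: str, page_text: str):
    """Run all three stages; an empty list means the markup passed."""
    data, issues = check_validity(raw)
    if data is None:
        return issues
    return issues + check_factuality(data, page_text) + check_compliance(data)
```

The point of the structure is that each stage catches a different failure mode: markup can be syntactically valid yet invent facts, or be factual yet miss a property the schema requires.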
What types matter most.
| Tier | Types | Who needs them |
| --- | --- | --- |
| Foundation | Organization · WebSite · BreadcrumbList · WebPage | every site |
| Content | Article / NewsArticle · FAQPage · HowTo | publishers and content sites |
| Commerce | Product · Offer · AggregateRating · Review | e-commerce and online shops |
| Local | LocalBusiness · PostalAddress · OpeningHoursSpecification | local businesses |
| Media | ImageObject · VideoObject · Event · Person | rich media and team pages |
Schema.org defines 806 types — from Article and Product to MedicalCondition and Event. Each type comes with a set of properties: some required, meaning search engines expect them for basic eligibility, and some recommended, meaning they unlock richer results and stronger signals for both Google and AI search systems.

A Product type alone can carry dozens of properties — name, offers, aggregateRating, availability, brand, image, sku, and more. The difference between a page that qualifies for a Rich Result and one that doesn't often comes down to a single missing recommended property.

This is why the number of types supported is the wrong metric entirely. What actually determines your visibility is how many properties are correctly populated — and kept accurate as your content changes.
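A property-coverage check makes that metric tangible. The required/recommended tiers below are illustrative assumptions for a Product, not the authoritative lists, which live in search engines' structured-data documentation.

```python
# Illustrative property tiers for a Product; placeholders, not the official lists.
REQUIRED_PROPS = {"name"}
RECOMMENDED_PROPS = {"offers", "aggregateRating", "review", "brand", "image", "sku"}

def coverage(markup: dict) -> dict:
    """Report which Product properties are actually populated."""
    present = {k for k, v in markup.items() if v not in (None, "", [])}
    return {
        "missing_required": sorted(REQUIRED_PROPS - present),
        "missing_recommended": sorted(RECOMMENDED_PROPS - present),
        "recommended_covered": len(RECOMMENDED_PROPS & present) / len(RECOMMENDED_PROPS),
    }
```

Run against a hypothetical Product that carries offers, brand, and image but no sku, review, or aggregateRating, it would report 50% recommended coverage: eligible, but leaving richer results on the table.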
Full vocabulary. Full automation.
No content changes. No backend work. Markup updates automatically when content changes.
// Key takeaways
Schema.org is a shared vocabulary initiated by Google, Microsoft, Yahoo, and Yandex in 2011. It defines 806 types and 1,400+ properties. JSON-LD is the recommended format. Structured data affects AI visibility at three levels: Discovery, Understanding, and Grounding. 40–50% of LLM-generated markup without a curation pipeline is invalid (Dang et al., 2025). The implementation problem is organizational, not technical — which is why automation matters.