Entity Map for AI Search: What It Is and How to Build One

Last updated: 2 June 2026
Strategy
8 minute read

An entity map for AI search is a structured file that tells AI systems what a website knows. It declares the entities a site covers, how those entities relate, and where the evidence lives.

The machine-readable version sits at yourdomain.com/entitymap.json. A human-readable companion sits at yourdomain.com/entitymap.html. It exists for one job: to make a brand citable across ChatGPT, Perplexity, Google AI Mode, and Gemini. This is a generative engine optimization (GEO) artifact. GEO is the practice of earning brand visibility inside AI answers.

The term carries two meanings, and conflating them is the first mistake. One meaning is a visualization: a diagram of concepts and connections used as a thinking tool. The other is the EntityMap standard: a published file built for AI retrieval. This article covers the second. The standard is the version designed to shape how ChatGPT, Perplexity, Google AI Mode, and Gemini describe a brand.

The standard was initiated by Fred Laurent, who authored it with Dixon Jones. Version 1.0 reached stable status on 7 April 2026.

What is an Entity Map?

An entity map (EntityMap standard) is a root-level JSON file that declares a publisher's entities, typed relationships, and source-attributed evidence for consumption by AI agents, large language models, and retrieval pipelines.

The distinction matters because the term itself demonstrates the problem the standard solves. Search discourse blends the visualization meaning and the standard meaning into one fuzzy signal. That is a disambiguation failure. Entity maps exist to stop exactly that kind of conflation for the brands that publish them.


Where the Entity Map Sits in the Standards Stack

An entity map fills a gap that two older standards never addressed. Sitemap.xml tells crawlers which pages exist. Schema.org annotates what appears on a single page. Neither declares what an organization knows across its whole site.

Standard | Unit | Question it answers
sitemap.xml | Page | Which URLs exist on this site?
Schema.org | Page element | What does this page contain?
EntityMap | Site-wide entity graph | What does this organization know, and how does it connect?

The three are complementary, not competing. A site can run all three at once. EntityMap operates one level up from page annotation. It declares institutional knowledge as a graph.

Schema markup and the entity map reinforce each other directly. A page marked up with schema.org confirms an entity at the page level. The entity map declares that same entity at the site level. Shared identifiers connect the two layers. Reuse the schema @id and sameAs values as the entity map's stable IDs and sameAs links. Page-level markup then becomes corroborating evidence for a site-wide claim. Deploy schema first, then let the entity map aggregate those entities into one graph.
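A minimal Python sketch of that reuse, assuming the page's schema.org JSON-LD has already been parsed into a dict. The type-mapping table, the helper name entity_from_jsonld, the fallback to Concept, and the placeholder Wikidata URL are illustrative assumptions; the entity field names follow those used elsewhere in this article.

```python
# Hypothetical mapping from schema.org types to EntityMap core types.
SCHEMA_TO_ENTITYMAP_TYPE = {
    "SoftwareApplication": "SoftwareProduct",
    "Organization": "Organization",
    "Person": "Person",
    "Product": "PhysicalProduct",
    "Service": "Service",
    "Place": "Place",
    "Event": "Event",
}


def entity_from_jsonld(node: dict, definition: str) -> dict:
    """Turn one schema.org JSON-LD node into an entity-map entity stub.

    The schema @id becomes the stable entityId and sameAs links carry over
    unchanged, so page markup and the site-level map share identifiers.
    """
    schema_type = node["@type"]
    if isinstance(schema_type, list):  # JSON-LD allows several types
        schema_type = schema_type[0]
    same_as = node.get("sameAs", [])
    if isinstance(same_as, str):  # JSON-LD allows one string or a list
        same_as = [same_as]
    return {
        "entityId": node["@id"],
        # Assumption: unmapped types fall back to Concept for human review.
        "@type": SCHEMA_TO_ENTITYMAP_TYPE.get(schema_type, "Concept"),
        "name": node["name"],
        "description": definition,  # publisher-written definition
        "sameAs": same_as,
        "hasChunks": [],  # evidence chunks are selected separately
    }


org_node = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://aperture.example.com/#org",
    "name": "Aperture Analytics",
    "sameAs": "https://www.wikidata.org/wiki/Q_PLACEHOLDER",  # placeholder item
}
print(entity_from_jsonld(org_node, "Forecasting analytics vendor."))
```

Because the entityId is the same string as the page's @id, a consumer that reads both layers can join them without any fuzzy matching.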

Why Entity Maps Matter for AI Search

AI retrieval today works at the page level. A pipeline fetches HTML, strips formatting, and chunks the result into passages. It does this with no structured awareness of which entity a passage describes or who published it. That mechanism produces three failures no amount of good writing fixes alone.

FailureWhat breaksConsequence for the brand
DisambiguationSurface-form variants read as separate signalsAuthority dilutes across fragments
AttributionPublisher identity does not survive aggregationContent earns the answer, the brand goes unnamed
ReasoningConcept relationships stay implicit in proseThe model infers logic, often wrong, always hedged

The attribution failure is the ghost citation problem. An AI answer uses a brand's content. The URL may appear as a footnote. The brand name never enters the answer text. The brand absorbs the traffic risk without the recognition benefit.

Ghost citations have two halves. The retrieval half happens when live content is fetched and the publisher field is lost. The training half happens when associations are baked into model weights. An entity map addresses the retrieval half. It cannot retrain a model. This is the honest boundary of the standard, and it is the half a publisher can actually control.

These three failures map directly to the Entity-First Framework. Disambiguation is a Foundation problem. Reasoning is a Structure problem. Attribution is a Signals problem. An entity map is a single artifact that addresses all three at once. It is also a direct lever on two Proof of Importance signals: Entity Relationships and Corroboration.

Entity Maps in Relation to the Wider GEO Practice

An entity map is one artifact inside a broader discipline. Entity mapping as a practice means researching the entities a brand owns. It then structures a site around them. The EntityMap file is the machine-readable endpoint of that work.

Two practices feed the file. The first is entity SEO architecture, which organizes a site so entities and their relationships are explicit. The second is personal entity SEO, which establishes the founder as a recognized Person entity. The entity map declares the output of both as a single graph.

The practice builds the entities. The standard publishes them for AI consumers. A brand already doing this work has the raw material for the file.

How an Entity Map Works

An entity map is built from three nested objects. The conformance floor is small. A valid file needs roughly 12 fields across these three objects.

Entities are the named things a site covers. Products, people, concepts, methodologies, locations, and metrics all qualify. Each entity carries a stable ID, a type, a name, a definition, and at least one evidence chunk.

Relations declare how entities connect. Each relation is a typed predicate pointing from one entity to another. Examples include IMPROVES, DEPENDS_ON, and PRODUCED_BY. Every relation must carry a targetName, because a human-readable name survives aggregation when an ID does not. Use targetId for an internal entity. Use targetUri for an external concept on Wikidata or schema.org.

Chunks are the evidence. Each chunk is a short extractive passage from a source page. It carries the source URL, the page title, and the publisher name.
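The relation rules above lend themselves to a mechanical check. The Python sketch below is not the official validator; the function name relation_errors and its error wording are hypothetical. It enforces two things the text states: targetName is always present, and any targetId points at an entity declared in the same file. A relation that uses targetUri is left alone, since it names an external concept.

```python
def relation_errors(relation: dict, internal_ids: set) -> list:
    """Return rule violations for one relation (empty list if none)."""
    errors = []
    if not relation.get("targetName"):
        errors.append("targetName is required on every relation")
    target_id = relation.get("targetId")
    # An internal reference must resolve to an entity declared in this file.
    if target_id is not None and target_id not in internal_ids:
        errors.append(f"targetId {target_id!r} matches no entity in this file")
    return errors


ids = {"e_001", "e_002"}
print(relation_errors({"predicate": "IMPROVES", "targetId": "e_002"}, ids))
# ['targetName is required on every relation']
```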

The relation layer is an RDF triple

The relation object is not new. It is a typed triple in the subject-predicate-object form that has defined the semantic web for two decades. The entity is the subject. The predicate is the relationship. The target entity is the object.

This lineage matters for two reasons. It connects EntityMap to established knowledge-graph practice rather than inventing a parallel system. It also explains the mechanism behind the reasoning advantage. A consumer reads a declared chain directly instead of inferring it from scattered prose.

Consider a two-hop chain. Fragmented Data CONFLICTS_WITH Data Management, which IMPROVES Regulatory Compliance. Standard retrieval must find a sentence linking all three. The entity map states the path explicitly. The model reads the publisher's logic rather than reconstructing it. Hedging language drops out of the answer.
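To make the two-hop example concrete, here is a small Python sketch of how a consumer could follow declared relations instead of searching prose for one linking sentence. It keys the graph by entity name for readability (a real consumer would more likely key by ID), uses the three entities from the example, and the function names are hypothetical.

```python
def build_graph(entities: list) -> dict:
    """Index declared relations as (predicate, target) edges keyed by subject name."""
    graph = {}
    for entity in entities:
        for relation in entity.get("relations", []):
            graph.setdefault(entity["name"], []).append(
                (relation["predicate"], relation["targetName"])
            )
    return graph


def paths(graph: dict, start: str, hops: int) -> list:
    """Return every chain of exactly `hops` declared edges starting at `start`."""
    if hops == 0:
        return [[start]]
    chains = []
    for predicate, target in graph.get(start, []):
        for tail in paths(graph, target, hops - 1):
            chains.append([start, predicate] + tail)
    return chains


entities = [
    {"name": "Fragmented Data",
     "relations": [{"predicate": "CONFLICTS_WITH", "targetName": "Data Management"}]},
    {"name": "Data Management",
     "relations": [{"predicate": "IMPROVES", "targetName": "Regulatory Compliance"}]},
]
for chain in paths(build_graph(entities), "Fragmented Data", 2):
    print(" ".join(chain))
# Fragmented Data CONFLICTS_WITH Data Management IMPROVES Regulatory Compliance
```

The path comes straight from two declared edges; no sentence in the source pages has to mention all three entities together.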

Lineage: knowledge graph databases and corroboration

An entity map ports the model used by knowledge graph databases into a public file. A property graph in Neo4j stores nodes, edges, and properties. An RDF triple store holds subject-predicate-object statements queried with SPARQL. The entity map exposes that same structure where any AI consumer can read it.
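As a sketch of that lineage, the Python below serializes entity-map relations as N-Triples, the line-based RDF format that triple stores can load. The spec as described here defines no RDF IRIs, so both base namespaces are hypothetical placeholders, the "Demand Planning" relation is invented for illustration, and relations with neither targetId nor targetUri are skipped.

```python
# Hypothetical namespaces: no RDF IRIs are defined in the spec as described here.
ENTITY_BASE = "https://aperture.example.com/entitymap.json#"
PREDICATE_BASE = "https://example.org/entitymap/predicate/"


def to_ntriples(entities: list) -> str:
    """Serialize declared relations as N-Triples: <subject> <predicate> <object> ."""
    lines = []
    for entity in entities:
        subject = f"<{ENTITY_BASE}{entity['entityId']}>"
        for relation in entity.get("relations", []):
            if "targetUri" in relation:  # external concept, e.g. Wikidata
                obj = f"<{relation['targetUri']}>"
            elif "targetId" in relation:  # entity declared in the same file
                obj = f"<{ENTITY_BASE}{relation['targetId']}>"
            else:
                continue  # name-only target: no IRI to emit in this sketch
            predicate = f"<{PREDICATE_BASE}{relation['predicate']}>"
            lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)


example_entities = [
    {
        "entityId": "e_001",
        "relations": [
            {"predicate": "IMPROVES", "targetId": "e_002", "targetName": "Forecast Accuracy"},
            {"predicate": "ENABLES", "targetName": "Demand Planning"},  # hypothetical
        ],
    },
]
print(to_ntriples(example_entities))
```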

Corroboration runs through the sameAs field. Each entity and the publisher can link to a Wikidata or Wikipedia URI. That anchors the brand to the open knowledge graph AI systems already trust. The sameAs link is how a private declaration connects to public third-party corroboration.

Entity types

The standard defines 15 core types across three tiers. The tier reflects the role the entity plays.

Tier | Purpose | Types
Knowledge | Concepts and frameworks | Concept, ProprietaryTerm, Methodology, Metric, Taxonomy
Actor | Things that act or are offered | Person, Organization, SoftwareProduct, PhysicalProduct, Service, Platform, Place
Temporal | Time-bound instruments | Event, Standard, Regulation

The most strategic type for a SaaS company is ProprietaryTerm. It declares a publisher-coined concept whose definition is authoritative. A conforming consumer treats that definition as canonical and does not blend it with general priors. That is the mechanism for protecting a proprietary framework from being merged into a generic equivalent.

Predicates

The standard ships 24 predicates across three tiers by semantic hardness. The tier sets the trust behavior.

Tier | Count | Confidence field | Sample predicates
Hard | 11 | Not required | PART_OF, DEPENDS_ON, MEASURES, PRODUCED_BY, AUTHORED_BY
Structural | 7 | Optional | ENABLES, PREVENTS, OFFERS, PRECEDES
Interpretive | 6 | Required | IMPROVES, DEGRADES, LEADS_TO, SUITED_FOR, TARGETS, ACHIEVES

Interpretive predicates carry editorial judgment, so each one must carry a confidence value of either declared or inferred. Consumers down-weight inferred relations. The system encodes epistemic honesty into the graph.
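One way to express that rule in code, sketched in Python. The interpretive set is the six Tier 3 predicates from the table above. The numeric weights are hypothetical, since the standard only says consumers down-weight inferred relations, and treating a relation with no confidence field as declared is also an assumption.

```python
# The six Tier 3 (Interpretive) predicates from the table above.
INTERPRETIVE = {"IMPROVES", "DEGRADES", "LEADS_TO", "SUITED_FOR", "TARGETS", "ACHIEVES"}
CONFIDENCE_VALUES = {"declared", "inferred"}

# Hypothetical consumer weights: the standard only says inferred is down-weighted.
WEIGHTS = {"declared": 1.0, "inferred": 0.5}


def confidence_error(relation: dict):
    """Return an error message, or None if the relation's confidence is acceptable."""
    confidence = relation.get("confidence")
    if confidence is None:
        if relation["predicate"] in INTERPRETIVE:
            return f"{relation['predicate']} is interpretive and requires confidence"
        return None  # optional for Structural, not required for Hard
    if confidence not in CONFIDENCE_VALUES:
        return f"confidence must be 'declared' or 'inferred', not {confidence!r}"
    return None


def relation_weight(relation: dict) -> float:
    """Weight for a validated relation; a missing confidence counts as declared (assumption)."""
    return WEIGHTS[relation.get("confidence", "declared")]


print(confidence_error({"predicate": "IMPROVES", "targetName": "Forecast Accuracy"}))
```

Run confidence_error first: relation_weight assumes the value has already been validated and would raise a KeyError on an unknown confidence string.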

How to choose types and predicates

The hardest part of authoring is picking the right type and predicate. The standard ships decision rules for both. These are the boundary cases that trip up most builders.

Type decisions:

Type decision | Rule
Concept vs ProprietaryTerm | Exists independently of you? Concept, add sameAs. You coined or defined it? ProprietaryTerm.
SoftwareProduct vs Platform vs Service | Primarily software? SoftwareProduct. Ecosystem or developer layer is central? Platform. Primarily human-delivered? Service.
Standard vs Regulation | Enacted into law? Regulation. Voluntary spec with a governance body? Standard.

Predicate decisions:

Predicate decision | Rule
PART_OF vs DEPENDS_ON | Definitional constituent? PART_OF. Separate concept that needs the other to work? DEPENDS_ON.
INCLUDES vs COVERS | Object is a component of the subject? INCLUDES. Subject is a hub, object is a sub-topic? COVERS.
ENABLES vs IMPROVES | Structural, unambiguous enablement? ENABLES (Tier 2). Causal effect needing judgment? IMPROVES (Tier 3, confidence required).
TARGETS vs SUITED_FOR | Designed for the object? TARGETS. Fits well but not designed for it? SUITED_FOR.

Two structural rules apply throughout. Inverses are implicit, so never declare both directions of PART_OF and INCLUDES. RELATES_TO is the predicate of last resort. A validator warns when it exceeds 20 percent of all relations.
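Both structural rules can be checked mechanically. A Python sketch, assuming relations have already been flattened to (subject, predicate, target) ID triples; the function name and messages are hypothetical, and only the 20 percent threshold and the PART_OF/INCLUDES inverse pair come from the standard as described here.

```python
def structural_warnings(triples: list) -> list:
    """triples: (subject_id, predicate, target_id) tuples from one entity map."""
    warnings = []
    if triples:
        relates_to = sum(1 for _, predicate, _ in triples if predicate == "RELATES_TO")
        share = relates_to / len(triples)
        if share > 0.20:
            warnings.append(f"RELATES_TO is {share:.0%} of relations (warn above 20%)")
    declared = set(triples)
    for subject, predicate, target in triples:
        # PART_OF and INCLUDES are inverses; one direction already implies the other.
        if predicate == "PART_OF" and (target, "INCLUDES", subject) in declared:
            warnings.append(
                f"{subject} PART_OF {target} is also declared as {target} INCLUDES {subject}"
            )
    return warnings


print(structural_warnings([("e_003", "PART_OF", "e_001"), ("e_001", "INCLUDES", "e_003")]))
# ['e_003 PART_OF e_001 is also declared as e_001 INCLUDES e_003']
```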

The attribution rule

One rule carries the whole attribution thesis. The publisher field on every chunk must match the root publisher name exactly. Case and spacing count. This string is the mechanism that survives extraction into a vector database. It is also the most common validation failure. Copy the value, never retype it.
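A minimal Python sketch of that check, run against a parsed entity map. It compares strings exactly, with no case folding or whitespace trimming, because that is what the rule demands; the function name is hypothetical.

```python
def publisher_mismatches(entity_map: dict) -> list:
    """List (chunkId, publisher) pairs whose publisher differs from the root name."""
    root_name = entity_map["publisher"]["name"]
    mismatches = []
    for entity in entity_map.get("entities", []):
        for chunk in entity.get("hasChunks", []):
            # Exact comparison: "Aperture analytics" and "Aperture  Analytics" both fail.
            if chunk.get("publisher") != root_name:
                mismatches.append((chunk.get("chunkId"), chunk.get("publisher")))
    return mismatches


sample = {
    "publisher": {"name": "Aperture Analytics"},
    "entities": [
        {"hasChunks": [
            {"chunkId": "c_001", "publisher": "Aperture Analytics"},
            {"chunkId": "c_002", "publisher": "Aperture analytics"},
        ]},
    ],
}
print(publisher_mismatches(sample))  # [('c_002', 'Aperture analytics')]
```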

The trust layer

The standard provides two trust signals. The verificationStatus field is publisher-declared. The certification field is issued by an external registry.

verificationStatus takes one of three values. Use self-declared for a hand-written or manually reviewed file. Use generator-draft for automated output not yet reviewed by a human. Use third-party-verified only when a valid certification field backs it.

Unreviewed generator output must publish as generator-draft. The ProprietaryTerm type and any declared-confidence relation require human review first. The certification registry launches in Q3 2026 and is not live yet. Treat certification as a future signal, not a current requirement.
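The three statuses and their preconditions can be encoded as a pre-publish check. This Python sketch assumes verificationStatus and certification sit on the root object, and it only tests that certification is present, since the registry that would define a valid certificate is not live. The human_reviewed flag is a hypothetical stand-in for whatever review record a team keeps; a hand-written file counts as reviewed.

```python
STATUSES = {"self-declared", "generator-draft", "third-party-verified"}


def verification_errors(entity_map: dict, human_reviewed: bool) -> list:
    """Check verificationStatus against the publishing rules (root placement assumed)."""
    status = entity_map.get("verificationStatus")
    errors = []
    if status not in STATUSES:
        errors.append(f"unknown verificationStatus {status!r}")
    if status == "third-party-verified" and not entity_map.get("certification"):
        # Presence only: the registry that would validate a certificate is not live.
        errors.append("third-party-verified requires a certification field")
    if not human_reviewed and status != "generator-draft":
        errors.append("unreviewed generator output must publish as generator-draft")
    return errors


print(verification_errors({"verificationStatus": "self-declared"}, human_reviewed=False))
# ['unreviewed generator output must publish as generator-draft']
```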

How Entities and Chunks are Derived from Content

Entities and chunks come from a publisher's existing pages. The extraction pipeline mirrors how AI systems read text, then improves on it.

Named entity recognition scans the content and flags candidate entities. Named entity recognition (NER) is a natural language processing (NLP) technique. It detects people, organizations, products, and concepts inside unstructured text. A generator runs NER across the site to surface the raw entity set.

Entity resolution then collapses surface-form variants into one canonical entity. "AI SOV", "AI Share of Voice", and "artificial intelligence share of voice" resolve to a single node. This step fixes disambiguation at the source rather than in the answer.
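A rule-based Python sketch of that collapse, using the three variants above. The abbreviation table is a hypothetical, hand-curated stand-in; production pipelines often add fuzzy matching or embedding similarity, which this sketch does not attempt.

```python
import re

# Hypothetical, hand-curated abbreviation table.
EXPANSIONS = {"ai": "artificial intelligence", "sov": "share of voice"}


def normalize(surface_form: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations token by token."""
    tokens = re.findall(r"[a-z0-9]+", surface_form.lower())
    return " ".join(EXPANSIONS.get(token, token) for token in tokens)


def resolve(surface_forms: list) -> dict:
    """Group surface forms whose normalized key matches into one canonical node."""
    nodes = {}
    for form in surface_forms:
        nodes.setdefault(normalize(form), []).append(form)
    return nodes


variants = ["AI SOV", "AI Share of Voice", "artificial intelligence share of voice"]
print(resolve(variants))
# {'artificial intelligence share of voice': ['AI SOV', 'AI Share of Voice', 'artificial intelligence share of voice']}
```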

Chunk selection extracts the strongest evidence passage for each entity. A strong chunk is specific, self-contained, and under 600 characters. The pipeline pulls candidate passages, then a human prunes to the one to five that carry real proof.
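The mechanical part of chunk selection, a length cap plus a ranking, fits in a few lines of Python. The scoring function is a hypothetical stand-in for the specificity judgment (it simply favors passages that contain figures), the passages are placeholders, and the human prune described above still decides what ships.

```python
MAX_CHARS = 600  # length cap per chunk
MAX_CHUNKS = 5   # upper bound per entity; a human may keep fewer


def select_chunks(candidates: list, score) -> list:
    """Drop over-length passages, then return the best-scoring ones, at most five."""
    eligible = [text for text in candidates if len(text) <= MAX_CHARS]
    # sorted() is stable even with reverse=True, so ties keep their page order.
    return sorted(eligible, key=score, reverse=True)[:MAX_CHUNKS]


def digit_count(text: str) -> int:
    """Hypothetical specificity score: passages with figures rank higher."""
    return sum(ch.isdigit() for ch in text)


passages = [
    "Welcome to our platform.",
    "Placeholder passage with a figure: 42.",
    "x" * 700,  # over the cap, never selected
]
print(select_chunks(passages, digit_count))
# ['Placeholder passage with a figure: 42.', 'Welcome to our platform.']
```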

Two rules govern quality. Definitions should be written by the publisher, not lifted from a generic source. Automated output must be reviewed before the ProprietaryTerm type or a declared-confidence relation is trusted.


How to Build an Entity Map

A focused site with 10 to 50 entities can be drafted in 20 minutes using Exalt Growth's EntityMap Claude Skill.


The build follows a defined sequence:

  1. List the entities the site most authoritatively covers. Prioritize depth over breadth.
  2. Write the root object with the canonical publisher name, URL, and a Wikidata sameAs where one exists.
  3. Add each entity with a stable ID, a type, a specific definition, and one to five chunks.
  4. Select evidence chunks. Use specific passages, not introductory sentences. Cap each at 600 characters.
  5. Add relations using standard predicates. Even a sparse relation graph improves traversal.
  6. Generate the HTML companion from the JSON. Never maintain the two files separately.
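Steps 2 and 6 imply a build script with the JSON as the single source of truth. A minimal Python sketch of that first half, assuming the root fields shown in the EntityMap example in this article; the function names are hypothetical. It stamps generated at build time, so the timestamp cannot go stale between rebuilds.

```python
import json
from datetime import datetime, timezone


def build_entity_map(publisher: dict, entities: list) -> dict:
    """Assemble the root object; `generated` is refreshed on every build."""
    return {
        "version": "1.0",
        "schema": "https://entitymap.org/spec/v1.0",
        "publisher": publisher,
        "generated": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entities": entities,
    }


def write_entity_map(entity_map: dict, path: str = "entitymap.json") -> None:
    """Write the JSON source of truth; the HTML companion is rendered from it."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(entity_map, handle, indent=2, ensure_ascii=False)


publisher = {"name": "Aperture Analytics", "url": "https://aperture.example.com"}
entity_map = build_entity_map(publisher, entities=[])
print(json.dumps(entity_map, indent=2))
```

Calling write_entity_map(entity_map) at the end of a real build would produce the file served at the domain root; the example only prints it to avoid writing to disk.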

Deployment adds discovery signals. Serve both files at the domain root without authentication. Declare the file in robots.txt, in a head link tag, and in a sitewide footer link. List entitymap.html in sitemap.xml at priority 0.9 and changefreq weekly so systems that follow sitemaps can find it; treat both values as hints only, since Google states that it ignores priority and changefreq. The footer link is the most reliable mechanism today, because every crawler that follows HTML links will find it.

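```xml
<!-- Entry inside sitemap.xml's <urlset>; the sitemap schema expects changefreq before priority -->
<url>
  <loc>https://www.exaltgrowth.com/entitymap.html</loc>
  <changefreq>weekly</changefreq>
  <priority>0.9</priority>
</url>
```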

What the HTML companion must include

The entitymap.html file is not a freeform page. The spec sets six requirements for a conforming companion.

  1. It references entitymap.json with a <link rel="alternate" type="application/json"> tag.
  2. It embeds per-entity JSON-LD in <script type="application/ld+json"> blocks.
  3. It renders relations as internal hyperlinks when the target sits in the same file.
  4. It adds a data-publisher attribute to every chunk blockquote.
  5. It prints the publisher name as visible text in every chunk's <cite>.
  6. It never carries a noindex directive.

The visible-text rule is the one most builders miss. Many LLM pipelines strip HTML tags before ingestion. Attribution stored only in attributes vanishes at that step. The cite line is the fallback that survives plain-text extraction.
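A Python sketch of rendering one chunk to satisfy requirements 4 and 5, plus the requirement 1 link tag. The markup layout, the function names, and the tag-stripping helper (a crude stand-in for an ingestion pipeline) are assumptions; what it demonstrates is that the publisher name is still present after every tag is removed.

```python
import re
from html import escape


def render_link_tag(json_url: str = "/entitymap.json") -> str:
    """Requirement 1: point the HTML companion at its JSON source."""
    return f'<link rel="alternate" type="application/json" href="{escape(json_url)}">'


def render_chunk(chunk: dict) -> str:
    """Requirements 4 and 5: attribution in an attribute AND in visible text."""
    publisher = escape(chunk["publisher"])
    source = escape(chunk["sourceUrl"])
    return (
        f'<blockquote data-publisher="{publisher}">\n'
        f'  <p>{escape(chunk["text"])}</p>\n'
        f'  <cite>{publisher}, <a href="{source}">{escape(chunk["pageTitle"])}</a></cite>\n'
        "</blockquote>"
    )


def strip_tags(html: str) -> str:
    """Crude model of a pipeline that drops all tags before ingestion."""
    return re.sub(r"<[^>]+>", "", html)


chunk = {
    "text": "Signal Density Scoring ranks each forecast input by predictive weight.",
    "sourceUrl": "https://aperture.example.com/product/signal-density-scoring",
    "pageTitle": "Signal Density Scoring",
    "publisher": "Aperture Analytics",
}
print(strip_tags(render_chunk(chunk)))
```

After stripping, the data-publisher attribute is gone, but the cite line still reads "Aperture Analytics, Signal Density Scoring".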

Common mistake | Fix
Publisher name mismatch on chunks | Copy the exact root value to every chunk
Stale generated timestamp | Update the timestamp on every rebuild
Many entities, thin evidence | Fewer entities, stronger chunks
Generic Wikipedia-style definitions | Define each concept as the site uses it
Editing JSON and HTML separately | Generate HTML from JSON as the source of truth

What SaaS Companies Should Include

A SaaS entity map should declare the things AI systems most often get wrong about a software brand. Complex B2B software is a good fit, because proprietary terminology and category ambiguity are where retrieval fails.

Asset | Entity type | Why it belongs
Proprietary frameworks and methodologies | ProprietaryTerm, Methodology | Declares the authoritative definition, blocks generic conflation
The product itself | SoftwareProduct | Anchors features and capabilities to one named entity
Founders and named experts | Person | Carries author authority and AFFILIATED_WITH relations
Core product metrics | Metric | Source of MEASURES relations for capability claims
Category and sub-categories | Taxonomy, Concept | Establishes the conceptual territory the brand owns

Four priorities should shape the SaaS build. Declare proprietary terms as ProprietaryTerm so the definition is canonical, not inferred. Use typed relations to assert differentiation, since competitor conflation is the dominant SaaS retrieval error. Attach every chunk to the canonical brand name to fight ghost citations. Anchor the publisher and key people to Wikidata via sameAs to reinforce corroboration.

The differentiation point is the highest leverage. Without a structured declaration, an AI system blends a product with its nearest generic equivalent. That erases the positioning a product team built. OFFERS declares what a product provides. It specifies the exact feature set, and that precision is the differentiator.

The standard has no "differs from" predicate. Differentiation is mostly an entity-level move. Use ProprietaryTerm for coined features. Use canonicalLabel to anchor the general term without losing yours. Use precise OFFERS and INCLUDES relations to fix the exact capability set. A consumer then reads a specific product, not a generic category.

Definition: canonicalLabel is the field that prevents generic conflation. The name field carries your proprietary term; the canonicalLabel field carries the widely known general term. A conforming consumer reads both without collapsing one into the other.

Worked example: name is "Signal Density Scoring", canonicalLabel is "feature weighting". The proprietary term stays intact in attribution. The general anchor still aids cross-publisher disambiguation. You keep your language without becoming invisible to a generic query.
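A sketch of that consumer behaviour in Python, using the worked example: every label (name, alternateName, canonicalLabel) finds the entity, but the answer always reports name. How real consumers index these fields is not specified here, so the index design and function names are assumptions.

```python
def index_entities(entities: list) -> dict:
    """Index each entity under its lowercased name, alternateName, and canonicalLabel."""
    index = {}
    for entity in entities:
        for field in ("name", "alternateName", "canonicalLabel"):
            label = entity.get(field)
            if label:
                bucket = index.setdefault(label.lower(), [])
                if entity not in bucket:  # avoid duplicates when labels coincide
                    bucket.append(entity)
    return index


def attributed_names(index: dict, query: str) -> list:
    """Resolve a query by any label, but always attribute with the proprietary name."""
    return [entity["name"] for entity in index.get(query.lower(), [])]


sds = {
    "name": "Signal Density Scoring",
    "alternateName": "SDS",
    "canonicalLabel": "feature weighting",
}
index = index_entities([sds])
print(attributed_names(index, "feature weighting"))  # ['Signal Density Scoring']
```

Keeping a list per label means a generic query such as "feature weighting" can surface entities from several publishers, each still attributed under its own name.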

EntityMap Example

The entry below shows a single SaaS entity. It declares a proprietary framework as a ProprietaryTerm and asserts a typed relation to the metric it improves.

```json
{
  "version": "1.0",
  "schema": "https://entitymap.org/spec/v1.0",
  "publisher": {
    "name": "Aperture Analytics",
    "url": "https://aperture.example.com",
    "sameAs": "https://www.wikidata.org/wiki/Q42"
  },
  "generated": "2026-06-02T00:00:00Z",
  "entities": [
    {
      "entityId": "e_001",
      "@type": "ProprietaryTerm",
      "name": "Signal Density Scoring",
      "alternateName": "SDS",
      "canonicalLabel": "feature weighting",
      "description": "Aperture's method for ranking forecast inputs by predictive weight before model training.",
      "relations": [
        {
          "predicate": "IMPROVES",
          "targetId": "e_002",
          "targetName": "Forecast Accuracy",
          "confidence": "declared"
        }
      ],
      "hasChunks": [
        {
          "chunkId": "c_001",
          "text": "Signal Density Scoring ranks each forecast input by predictive weight, so low-signal variables never dilute the model.",
          "sourceUrl": "https://aperture.example.com/product/signal-density-scoring",
          "pageTitle": "Signal Density Scoring",
          "publisher": "Aperture Analytics",
          "contentType": "definition"
        }
      ]
    },
    {
      "entityId": "e_002",
      "@type": "Metric",
      "name": "Forecast Accuracy",
      "description": "Aperture's measure of predicted values against observed outcomes, reported as a rolling percentage per forecast window.",
      "hasChunks": [
        {
          "chunkId": "c_002",
          "text": "Forecast accuracy measures Aperture's predicted values against observed outcomes. It is reported as a rolling percentage per forecast window.",
          "sourceUrl": "https://aperture.example.com/metrics/forecast-accuracy",
          "pageTitle": "Forecast Accuracy",
          "publisher": "Aperture Analytics",
          "contentType": "definition"
        }
      ]
    }
  ]
}
```

Read the relation as a triple: Signal Density Scoring (subject) IMPROVES (predicate) Forecast Accuracy (object), a claim published by Aperture Analytics. The confidence field marks it as a declared claim, not an inference. The ProprietaryTerm type tells a consumer to treat the definition as canonical. The chunk publisher matches the root exactly, so attribution survives extraction. The target resolves to a defined Metric entity, so the file passes internal reference checks. The Wikidata sameAs value is a placeholder only (Q42 is the Wikidata item for Douglas Adams); a real file should point to the publisher's own item.

Platform-Specific Reality

AI platforms consume content through distinct retrieval layers. An entity map helps most where live retrieval happens at inference time.

Platform | How an entity map helps
Perplexity | Real-time retrieval and citation-forward UX surface attributed chunks
ChatGPT Search | Live fetch layer can read declared entities and relations
Google AI Mode | Crawlable HTML companion exposes structured, attributed content
Model training | Indirect only. Crawled content may enter a future training run

The strongest case is forward-looking. Agentic crawlers and RAG pipelines are the growing category. An entity map builds the right infrastructure for where AI consumption is heading.

The Honest Limits

Intellectual honesty separates strategy from hype. The standard has real constraints worth naming.

No major AI lab has committed to consuming entitymap.json. The consumer-side conformance guidance is non-normative. Every value claim depends on future adoption. The discovery conventions in robots.txt are proposed, not yet supported by crawler teams. The certification registry is not live.

The precedent offers a measured case for optimism. GoodRelations published openly in 2008 with no mandate. Schema.org absorbed its core concepts by 2012. Open vocabularies can spread through demonstrated value. Whether EntityMap follows that path depends on adoption.

The defensible position is to implement early, treat it as research, and avoid committing to a maintenance burden that depends on consumers who do not yet exist. The marginal cost is low for a brand that already maintains structured data and entity infrastructure.

Entity maps extend the same entity-first logic behind durable AI search visibility. To build the entity and schema infrastructure that makes a SaaS brand the default answer across AI platforms, book a strategy call.

Entity Map FAQs

What is an entity map in SEO?

An entity map is a structured file that declares a site's entities, their relationships, and source-attributed evidence for AI systems. It complements sitemap.xml and schema.org rather than replacing them.

How is an entity map different from schema.org?

Schema.org annotates individual pages. An entity map operates at the site level. It declares the full set of concepts a site covers and how they connect across the whole domain.

Does an entity map fix ghost citations?

It fixes the retrieval half. Chunk-level attribution carries the publisher name through extraction into vector databases. It cannot fix associations already baked into a model's training weights.

Do AI engines read entity maps today?

No major AI lab has formally committed to consuming entity maps. Live retrieval systems that fetch HTML can still reach the crawlable companion file through a footer link.

How many entities should a SaaS entity map include?

Begin with 10 to 50 entities. Prioritize depth over breadth. Fifteen well-evidenced entities outperform eighty entities with one weak chunk each.

What should a SaaS company declare first?

Declare proprietary frameworks as ProprietaryTerm and the product as SoftwareProduct. These are the entities AI systems most often blend with generic equivalents.