Published methodology. Auditable math. No black boxes. Here's exactly how we measure AI visibility, what the numbers mean, and where the limitations are.
Every tracked prompt runs against every active platform, every day. The sweep is the atomic unit of measurement. Nothing is inferred — it's all live queries, live responses, live capture.
Prompts aren't static. Every tracked prompt belongs to one of three groups, each with a different lifecycle:
Your crown queries. Tracked permanently, defended always. When a competitor threatens a Core position, the engine flags it immediately. These never rotate out.
Offensive targets. Queries you don't win yet but want to. Track them, get ranked actions to win them, execute. Once held at top 3 for 4+ weeks, the engine recommends promoting to Core and loading a new Conquest target.
Engine-surfaced opportunities. Queries you didn't know to track. Found by analyzing competitor citations, trending AI response patterns, and adjacent topics. The engine recommends promoting the best ones to Conquest.
This rotation cycle — discover, target, win, defend, rotate — keeps the platform fresh and prevents the "I've won everything, now what?" plateau that kills SaaS retention. Prompt capacity varies by tier: Lite up to 100, Pro up to 500, Hero 1,000+.
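For concreteness, here's a minimal sketch of that Conquest-to-Core promotion check, assuming the engine records one best position per prompt per week. The function and parameter names are illustrative, not the engine's actual API:

```python
# Sketch of the promotion rule described above: a Conquest prompt qualifies
# once it has held a top-3 position for 4+ consecutive weeks.
def should_promote_to_core(weekly_best_positions, top_n=3, weeks_required=4):
    """weekly_best_positions: best rank per week, oldest to newest; None = not ranked."""
    streak = 0
    for pos in weekly_best_positions:
        streak = streak + 1 if pos is not None and pos <= top_n else 0
    return streak >= weeks_required

# Example: ranked 5th, unranked, then #2, #3, #1, #2 over six weeks -> True
print(should_promote_to_core([5, None, 2, 3, 1, 2]))
```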
Within each group, prompts span four intent types (commercial, comparison, informational, and brand-specific) to ensure coverage across the full buyer journey.
The Citelligence Index is a single number that measures overall AI visibility strength. It's not a ranking — it's a composite across six distinct signals, each weighted by how much it actually affects whether AI platforms recommend your brand.
How many of your tracked prompts you win — defined as top 3 position on at least 2 platforms. This is the highest-weight component because content coverage is the strongest observed predictor of AI visibility. A brand that wins 60% of prompts across 6 platforms is in a structurally different position than one that dominates one query type and disappears on others. This measures breadth of territory, not depth on a single query.
How well AI platforms recognize your brand as a distinct, authoritative entity. Signals: Schema.org Organization markup with sameAs links, knowledge panel presence, consistent naming across platforms, and domain authority of pages that get cited. Brands with strong entity signals get cited more reliably — even when the prompt doesn't mention them by name. Entity recognition compounds over time as signals accumulate across the web.
What fraction of AI responses mention your brand, weighted by position. Being mentioned third in 80% of responses is very different from being mentioned first in 30%. Weighted at 20% because this component is directly measurable — every sweep produces a precise number — but its causal weight is moderate rather than dominant. High Citation Density without Topical Authority is a fragile position.
A scored checklist of machine-readable signals deployed on your site: Organization schema, sameAs links (LinkedIn, Wikipedia, Crunchbase), Product schema, FAQ schema, Article author schema, BreadcrumbList, AggregateRating. Each signal is weighted by its mechanistic importance for AI citation. Score = deployed points / total possible points × 100. Weighted at 10% because the mechanistic logic is strong — AI models are trained to parse structured data — but the empirical correlation is newer and less established than the top three components.
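As a concrete illustration of that formula, here's a minimal sketch. The signal list mirrors the checklist above, but the point values are placeholder assumptions, not the production weights:

```python
# Illustrative scoring of the structured-data checklist described above.
SIGNAL_POINTS = {
    "organization_schema": 3,     # point values are assumptions for this example
    "sameas_links": 2,
    "product_schema": 2,
    "faq_schema": 2,
    "article_author_schema": 1,
    "breadcrumblist": 1,
    "aggregaterating": 1,
}

def structured_data_score(deployed_signals):
    """Score = deployed points / total possible points * 100."""
    total = sum(SIGNAL_POINTS.values())
    earned = sum(points for name, points in SIGNAL_POINTS.items()
                 if name in deployed_signals)
    return round(earned / total * 100, 1)

# Example: Organization schema + sameAs + FAQ deployed -> (3+2+2)/12 * 100 = 58.3
print(structured_data_score({"organization_schema", "sameas_links", "faq_schema"}))
```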
What fraction of tracked AI platforms actively cite your brand. Cited on 4 of 6 platforms = 67% surface coverage. Cited on only ChatGPT = 17%. Weighted lower because platform reach is partially outside your control — some categories just don't get cited on certain platforms. But it matters as a diagnostic: if you're visible on ChatGPT but invisible on Perplexity, that's actionable.
The average sentiment of AI responses where your brand is mentioned. Classified by keyword heuristics: strong positive ("best", "top choice", "recommend"), moderate positive ("quality", "reliable"), neutral, moderate negative ("issues", "complaints"), strong negative ("avoid", "problems"). Score normalized to 0–100. Weighted lowest because sentiment is an outcome, not an input — improving it requires fixing the underlying content or entity signals, not sentiment directly. It's a signal, not a lever.
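A simplified sketch of that keyword heuristic, assuming one score per response and a small illustrative keyword list (the production classifier uses a larger vocabulary):

```python
# Keyword-heuristic sentiment scoring, normalized to 0-100.
SENTIMENT_KEYWORDS = [
    (100, ["best", "top choice", "recommend"]),   # strong positive
    (75,  ["quality", "reliable"]),               # moderate positive
    (25,  ["issues", "complaints"]),              # moderate negative
    (0,   ["avoid", "problems"]),                 # strong negative
]

def sentiment_score(response_text):
    """Return a 0-100 sentiment score for one AI response mentioning the brand."""
    text = response_text.lower()
    for score, keywords in SENTIMENT_KEYWORDS:
        if any(kw in text for kw in keywords):
            return score
    return 50  # neutral: no sentiment keywords found
```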
These weights represent our best judgment based on observed correlations in daily sweep data and mechanistic reasoning about how AI models process information. No peer-reviewed research exists on optimal AI visibility component weighting — the entire field is pre-empirical. We publish our reasoning so you can evaluate it yourself. Weights are reviewed quarterly as data accumulates.
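For concreteness, here's a minimal sketch of how the six components roll up into one Index value. Only the Citation Density (20%) and Structured Data (10%) weights are published above; the other values are placeholder assumptions chosen to sum to 1.0, not the production weights:

```python
# Weighted composite of component scores, each on a 0-100 scale.
WEIGHTS = {
    "content_coverage":   0.30,  # assumption: described above as highest-weight
    "entity_recognition": 0.20,  # assumption
    "citation_density":   0.20,  # published
    "structured_data":    0.10,  # published
    "surface_coverage":   0.12,  # assumption: described above as weighted lower
    "sentiment":          0.08,  # assumption: described above as weighted lowest
}

def citelligence_index(components):
    """components: dict of component name -> 0-100 score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(WEIGHTS[name] * components[name] for name in WEIGHTS), 1)
```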
| Score Range | Band Label | What it means |
|---|---|---|
| 0 – 20 | INVISIBLE | AI platforms rarely or never mention this brand. Structural visibility work is needed before optimization is meaningful. |
| 20 – 40 | EMERGING | Sporadic mentions across a narrow set of prompts or platforms. Entity and content gaps are the primary levers. |
| 40 – 60 | COMPETITIVE | Consistent mentions in core prompts. Competing with established brands but not yet capturing disproportionate share. |
| 60 – 80 | DOMINANT | Strong citation density, multi-platform presence, and positive sentiment. AI regularly recommends this brand by name. |
| 80 – 100 | CATEGORY LEADER | AI treats this brand as the default recommendation in its category. Defending this position is the priority. |
AI responses are narrative, not lists. The brand mentioned first gets the lead paragraph — framed as the recommendation. The brand mentioned third gets "other options include..." That's a structurally different outcome, not a marginal one.
| Position | Weight | Typical framing in AI responses |
|---|---|---|
| #1 (named first) | 1.0 | Gets the lead paragraph. Framed as the recommendation. "The best option is X, which..." — owns the answer. |
| #2 | 0.85 | Named second, often with "also consider" or "another strong option is" framing. Visible, not dominant. |
| #3 | 0.70 | Mentioned, but with diminishing editorial weight. Usually the last brand given any substantive description. |
| #4–5 | 0.50 | Listed in a "you might also look at" section. Rarely described. Often just a name with no supporting detail. |
| #6–10 | 0.30 | Buried in the response. Minimal impact. Users who read this far are already in comparison mode, not recommendation mode. |
| Mentioned, no clear rank | 0.20 | Referenced in passing or only in citations. Brand name appears but isn't positioned as a recommendation. |
This weighting was calibrated against observed patterns in how users engage with AI responses — specifically, how quickly attention drops after the first recommendation. The weights reflect that the first brand named captures disproportionate editorial framing. These weights can be adjusted per account if a category has unusual response patterns that don't match this model.
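A minimal sketch of how those position weights turn per-response mentions into a 0-100 Citation Density figure. The weights are the published values; the simple averaging is an illustrative assumption:

```python
# Position-weighted mention scoring using the table above.
POSITION_WEIGHTS = {1: 1.0, 2: 0.85, 3: 0.70, 4: 0.50, 5: 0.50,
                    6: 0.30, 7: 0.30, 8: 0.30, 9: 0.30, 10: 0.30}
UNRANKED_MENTION = 0.20

def mention_weight(position):
    """Weight for one response: ranked position, unranked mention, or absent."""
    if position is None:
        return 0.0                      # brand not mentioned at all
    if position == "unranked":
        return UNRANKED_MENTION         # mentioned, no clear rank
    return POSITION_WEIGHTS.get(position, 0.30)

def citation_density(positions):
    """Average position-weighted mention score across responses, as 0-100."""
    if not positions:
        return 0.0
    return round(sum(mention_weight(p) for p in positions) / len(positions) * 100, 1)

# Example: named first in 30% of responses scores 30.0;
# named third in 80% of responses scores 56.0.
print(citation_density([1] * 3 + [None] * 7))
print(citation_density([3] * 8 + [None] * 2))
```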
The same prompt can return different results across sweeps. Platforms rate-limit. Models update. These aren't problems we hide — they're things we surface explicitly so the data stays trustworthy.
Gemini's free tier rate-limits aggressively during peak hours. When prompts are skipped because of quota exhaustion, the dashboard surfaces this honestly: "43 prompts skipped — Gemini quota exhausted." Scores are computed only from completed responses, not extrapolated from partial data.
We don't fill in missing data with estimates. A skipped prompt is a gap, logged as a gap.
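A minimal sketch of that rule, with illustrative field names: scores are computed only over completed responses, and skipped prompts are reported rather than imputed:

```python
# Gaps stay gaps: downstream scoring only ever sees completed responses.
def summarize_sweep(results):
    completed = [r for r in results if r["status"] == "completed"]
    skipped = [r for r in results if r["status"] == "skipped"]
    if skipped:
        # surfaced on the dashboard, e.g. "43 prompts skipped (Gemini quota exhausted)"
        print(f"{len(skipped)} prompts skipped ({skipped[0]['reason']})")
    return completed
```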
AI responses change run-to-run. The same prompt can return different brand rankings on different days. We mitigate this three ways:
AI platforms personalize responses based on user context, history, and location. Our sweeps use neutral, non-logged-in API endpoints to minimize personalization effects. This means our measurements reflect the "default" response — what a new, anonymous user would see — not what a specific user would get based on their history. It's a baseline, not a universal truth.
AI platforms update their models regularly. When a platform updates — say, ChatGPT ships a new model version — the sweep automatically captures the new behavior. The insight engine flags model-change-driven movements explicitly: "ChatGPT updated its model this week — movements may reflect model behavior changes, not content changes on your end." We don't want you optimizing against a ghost.
Any measurement system has edges. These are ours. We'd rather tell you upfront than have you discover them mid-campaign. Knowing where the model breaks is part of using it correctly.
No AI platform exposes CTR or downstream conversion data through their APIs. We measure position and citation — not whether a user clicked a link in the AI response, visited your site, or converted. What you can reasonably infer: higher position = higher intent capture. What you can't infer: exact revenue impact from a position change.
We know the AI recommended you. We can't tell you whether the user acted on it. Intent fulfillment would require tracking the full user journey from AI response to site behavior — data that lives in your analytics stack, not ours. Connecting the two is possible with GA4/Plausible; we can guide you on setting up AI referral tracking, but we don't capture it directly.
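If you want to start segmenting that traffic yourself, here's a rough sketch of classifying AI referrals in your own analytics pipeline. The referrer hostnames are illustrative; verify them against what actually appears in your GA4 or Plausible reports:

```python
# Rough classifier for sessions that arrive from AI assistants.
AI_REFERRER_HINTS = (
    "chat.openai.com", "chatgpt.com", "perplexity.ai",
    "gemini.google.com", "copilot.microsoft.com", "claude.ai",
)

def is_ai_referral(referrer):
    """True if a session's referrer looks like it came from an AI assistant."""
    host = referrer.lower()
    return any(hint in host for hint in AI_REFERRER_HINTS)
```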
Sweeps run at 7am CT. If a competitor publishes content at 3pm Monday, we catch it in Tuesday morning's sweep — not instantly. Real-time monitoring would require continuous API polling, which is both expensive and would exhaust platform rate limits in hours. Daily cadence is the right tradeoff for the data quality it produces.
We track 100–1,000+ prompts per brand depending on tier (Lite: up to 100, Pro: up to 500, Hero: 1,000+). Real users ask thousands of variations. Our prompt library is constructed to cover the highest-value intent clusters in your category — commercial, comparison, informational, and brand-specific — but there are always long-tail queries outside our sample. Coverage expands as you add custom prompts and as the engine discovers new query patterns from competitor citations.
Claude and DeepSeek have less stable grounding and web-search behaviors compared to ChatGPT or Perplexity. Their responses can be more variable run-to-run, and their platform architectures change faster. We weight them equally in the Surface Coverage component, but the dashboard flags instability when it's detected. Treat their data as directional until the platforms mature.
Raw sweep data doesn't tell you what to do. The insight engine reads the data and generates plain-English narratives and ranked actions. Here's exactly how it works and what guardrails we run on it.
Each action is drawn from a library of 15+ canonical move templates. Current examples: "publish a comparison page for [query X]," "respond on cited Reddit thread," "add sameAs schema links to homepage," "update FAQ schema on [page]," "publish vendor-perspective article on [topic]."
The engine scores each template against the current sweep data — which queries are losing, which components are weakest, which competitors are gaining ground — and ranks by expected Index lift. The highest-leverage moves surface first. Moves that wouldn't affect the current weakest signal are deprioritized automatically.
Reads data → selects relevant templates from library → scores against current state → ranks by impact → generates narrative. Every insight references actual sweep numbers. The voice guide (internal document) requires all narratives to cite specific metrics.
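A simplified sketch of that selection-and-ranking step, with illustrative data structures (this is not the engine's internal code):

```python
# Score each action template against current sweep data and rank by
# expected Index lift; moves that don't touch the weakest signal drop down.
def rank_actions(templates, sweep_state):
    weakest = min(sweep_state["components"], key=sweep_state["components"].get)
    ranked = []
    for t in templates:
        if not t["applies_to"](sweep_state):
            continue                              # template not relevant right now
        lift = t["expected_lift"](sweep_state)    # estimated Index impact
        if weakest not in t["targets"]:
            lift *= 0.5                           # deprioritize off-target moves
        ranked.append((lift, t["name"]))
    return [name for lift, name in sorted(ranked, reverse=True)]
```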
The engine never fabricates causes. If a metric moved and the engine can't identify why, it says so: "We saw a shift in Citation Density this week — no clear causal pattern identified yet. Watching next sweep." We don't invent explanations to fill the gap.
Every generated insight passes three checks before reaching the dashboard: (1) a schema check confirming required fields are populated, (2) an action-library reference check confirming the recommended action maps to a valid template, and (3) a numeric-consistency heuristic confirming cited numbers match the sweep data it claims to reference.
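A minimal sketch of those three checks, with illustrative field names and tolerance:

```python
# Pre-publication validation: all three checks must pass before an insight
# reaches the dashboard.
REQUIRED_FIELDS = ("narrative", "action_id", "metrics_cited")

def validate_insight(insight, action_library, sweep_data, tol=0.01):
    # (1) schema check: required fields populated
    if any(not insight.get(f) for f in REQUIRED_FIELDS):
        return False
    # (2) action-library reference check: action maps to a valid template
    if insight["action_id"] not in action_library:
        return False
    # (3) numeric-consistency heuristic: cited numbers match the sweep data
    for metric, value in insight["metrics_cited"].items():
        if metric not in sweep_data or abs(sweep_data[metric] - value) > tol:
            return False
    return True
```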
AI visibility measurement is a new field. We don't have every answer. These are the measurement problems we're actively working on. When the methodology evolves, we'll publish the changes here.
ChatGPT has significantly more users than Perplexity. A citation on ChatGPT may have 10x the real-world impact of one on Perplexity. Right now we weight all platforms equally in the Surface Coverage component. We're exploring whether usage-weighted platform scoring would produce more accurate impact estimates — but user counts are hard to verify, change frequently, and vary by category and geography.
Sometimes AI doesn't cite your domain directly — it cites a review article, a Reddit thread, or an industry directory that mentions you. We currently count this as a "mentioned, no clear rank" (0.20 weight), but that may undervalue the indirect presence. We're working on tracking second-order citations more precisely: you appear in a cited source, therefore you have partial citation credit.
Ahrefs tracks cumulative lifetime backlink counts because links accrete over time. Should AI citations work the same way — do yesterday's citations still matter today? Or is AI visibility purely about current-state responses? Our current model is per-sweep snapshots with rolling averages. We haven't settled whether cumulative citation history should factor into the Index.
If you publish a comparison page and your Citation Density improves three weeks later, did the page cause the improvement? Probably. But AI platforms don't expose ranking signals the way Google does, and the lag between content publication and AI model behavior changes is irregular. We flag correlations, but we don't claim to have solved causal attribution. This is a hard problem in AI visibility specifically because models re-train or update on opaque schedules.
When the methodology changes — new components, revised weights, new platforms — we update this page and log the change. If you're making decisions based on this data, you should know what changed and when.
Every daily sweep captures hundreds of raw AI responses across 6 platforms. Over weeks and months, that becomes a longitudinal dataset of AI recommendation behavior that doesn't exist anywhere else. We use it to test our own assumptions — and to push the frontier of what's actually understood about AI visibility.
Every sweep captures the full response from each AI platform for each tracked prompt. Not just "mentioned: yes/no" — the actual words, the citation sources, the competitor positions, the sentiment. Over time, this lets us ask questions nobody has been able to answer before:
Most AI visibility work focuses on the citation layer — getting AI to cite your page when it searches the web. But there's a deeper layer: the training data layer. When AI "knows" your brand from its training data, it recommends you without needing to search at all. We see this in our data — 27 responses where AI mentioned our test brand by name without citing the website. That's entity recognition from training, not from a search result.
The question we're actively researching: can you systematically influence training-data-level recognition? If your brand appears consistently across authoritative sources (Wikipedia, industry publications, Reddit, LinkedIn thought leadership), does that compound into stronger AI entity recognition over time? Our longitudinal data is beginning to show patterns, but it's too early to publish conclusions.
Schema.org markup is machine-readable by design. AI models that crawl the web can parse it. But does deploying Organization schema with sameAs links actually increase your citation rate? We weight Structured Data at 10% in the Index — based on mechanistic reasoning, not empirical proof. As we accumulate before/after deployment data across multiple brands, we'll publish the actual correlation (or lack of one).
When ChatGPT starts citing your page, how long does that citation persist? Does it decay like a backlink, or does it persist indefinitely until something better displaces it? Our daily sweeps are building the first longitudinal dataset of citation persistence. Early observations suggest citations are less stable than backlinks — model updates can shuffle positions overnight — but we need more data before publishing decay curves.
Our sweep data shows platforms behave very differently. Gemini gives explicit ranked positions; Perplexity almost never does. ChatGPT mentions brands by name; Google AIO tends to cite domains without name-dropping. We're building per-platform signal models to understand whether optimizing for ChatGPT is different from optimizing for Perplexity — and whether brands should allocate effort differently by platform.
We will not claim to have solved problems we haven't solved. When we publish findings, we'll show the data, the methodology, the sample size, and the confidence level. If something is a hypothesis, we'll say so. If something is proven, we'll show the proof.
The AI visibility field needs less marketing and more measurement. We intend to be on the measurement side.