Methodology


This site describes general methodology for AI visibility programs: how to specify prompts, run schedules, capture answers, version parsers, and report variance-aware metrics. It does not publish proprietary scoring formulas from any vendor. Instead, it states the mechanical properties any serious implementation should satisfy if stakeholders expect auditability.

What we document here

Reproducibility checklist

Before you trust a dashboard, confirm the program can answer each of these questions:

- Which account or API key ran the job?
- Which locale?
- Which model label did the UI show?
- Which parser version produced the flags?
- Which prompts belong to which thematic basket?

If any answer is "unknown," your methodology is under-specified for enterprise use.
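One way to make the checklist enforceable is to encode it as a provenance record whose empty fields surface the unanswered questions. This is a minimal sketch; the class and field names are illustrative assumptions, not any vendor's schema.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class RunProvenance:
    account_id: str      # which account or API key ran the job
    locale: str          # e.g. "en-US"
    model_label: str     # model label the UI showed at run time
    parser_version: str  # parser version that produced the flags
    basket_id: str       # thematic prompt basket the prompt belongs to

def unknown_fields(run: RunProvenance) -> list[str]:
    """Return the names of checklist fields this run cannot answer."""
    return [f.name for f in fields(run) if not getattr(run, f.name)]

run = RunProvenance("acct-01", "en-US", "model-x-2026-03", "parser-v7", "")
print(unknown_fields(run))  # → ['basket_id']
```

Storing this record alongside every captured answer means "which parser version produced the flags?" is a lookup, not an archaeology project.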

Metric definitions belong in writing

Teams should publish an internal metric dictionary that maps each KPI to SQL or pseudocode. For example, define “citation rate” as “share of runs where at least one canonical URL matches our domain set,” not as a vibe. When definitions change, bump a version and either reprocess history or segment series so old and new numbers never blend unintentionally.
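A metric-dictionary entry might look like the sketch below: a versioned written definition paired with SQL and a reference implementation. The entry shape, SQL, and version number are assumptions for illustration, not a prescribed format.

```python
# Hypothetical metric dictionary: each KPI maps to a versioned written
# definition plus the query that computes it.
METRICS = {
    "citation_rate": {
        "version": 2,
        "definition": (
            "share of runs where at least one canonical URL "
            "matches our domain set"
        ),
        "sql": (
            "SELECT AVG(CASE WHEN cited THEN 1.0 ELSE 0.0 END) "
            "FROM answers"
        ),
    },
}

def citation_rate(runs: list[dict]) -> float:
    """Reference implementation matching METRICS['citation_rate'] v2."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r["cited"]) / len(runs)

runs = [{"cited": True}, {"cited": False}, {"cited": True}]
print(round(citation_rate(runs), 2))  # → 0.67
```

When the definition changes, bumping the `version` key is what lets you segment history instead of silently blending old and new numbers.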

Product methodology

Vendor-specific methodology belongs in your tooling provider's documentation. Use their release notes when models or UI labels change, and demand export formats that preserve the mechanical fields above (timestamps, prompt ids, model strings, raw text, citations).
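A quick way to hold exports to that standard is a required-keys check on each row. This is a minimal sketch; the key names are assumptions, not any vendor's actual export schema.

```python
# Mechanical fields an export row must preserve (assumed key names).
REQUIRED = {"timestamp", "prompt_id", "model_string", "raw_text", "citations"}

def missing_fields(row: dict) -> set[str]:
    """Return the required export fields absent from a row."""
    return REQUIRED - row.keys()

row = {
    "timestamp": "2026-04-12T09:00:00Z",
    "prompt_id": "p-17",
    "model_string": "model-x",
    "raw_text": "Many teams use ExampleCo ...",
    "citations": [],
}
print(missing_fields(row))  # → set()
```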

Last reviewed: 2026-04-12

Worked example: defining “cited”

Suppose an answer says, "Many teams use ExampleCo," with a citation chip pointing to https://example.com/pricing?utm=foo. A minimal methodology records the URL, strips tracking query params to a canonical form, matches the registrable domain against your allowlist, and flags cited=true. If the sentence appears without a chip, your methodology must declare whether prose-only mentions count as citations for your KPIs. That single decision can swing reported citation rates by double digits in verbose categories. Document it once, centrally, and link analysts to the spec instead of re-debating it in Slack every week.
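The canonicalize-then-match steps can be sketched as below. The tracking-parameter prefixes, allowlist, and function names are assumptions, and the registrable-domain heuristic is deliberately naive; a production pipeline should consult the Public Suffix List (e.g. via `tldextract`) to handle domains like example.co.uk.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm", "gclid", "fbclid")  # assumed tracking-param families
ALLOWLIST = {"example.com"}  # registrable domains counted as "ours"

def canonicalize(url: str) -> str:
    """Drop tracking query params so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if not k.lower().startswith(TRACKING_PREFIXES)]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

def registrable_domain(url: str) -> str:
    """Naive last-two-labels heuristic; use the Public Suffix List in production."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def cited(url: str) -> bool:
    """Flag cited=true when the canonical URL's domain is on the allowlist."""
    return registrable_domain(canonicalize(url)) in ALLOWLIST

print(cited("https://example.com/pricing?utm=foo"))  # → True
```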

Worked example: comparative prompts

For the prompt “Compare Vendor A and Vendor B for SOC2 readiness,” store the ordered list of vendor names if the UI renders bullets; if not, store spans and let reviewers tag ordering intent. Aggregate with explicit sample sizes (“mentioned in 7/12 runs”) rather than a smoothed score whose denominator nobody remembers.
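The "explicit sample sizes" aggregation might look like the following, assuming each run stores an ordered list of vendor names; the vendor names and counts here are illustrative.

```python
from collections import Counter

# One ordered mention list per run (empty list = vendor not rendered).
runs = [
    ["Vendor A", "Vendor B"],
    ["Vendor B"],
    ["Vendor A", "Vendor B"],
    [],
]

def mention_rates(runs: list[list[str]]) -> dict[str, str]:
    """Report 'mentioned in k/n runs' with the denominator kept explicit."""
    n = len(runs)
    # set(run) so a vendor mentioned twice in one answer counts once per run
    counts = Counter(name for run in runs for name in set(run))
    return {name: f"mentioned in {k}/{n} runs" for name, k in counts.items()}

print(mention_rates(runs)["Vendor B"])  # → mentioned in 3/4 runs
```

Keeping the k/n string rather than a smoothed score means the denominator travels with the metric, so nobody has to remember it.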

Ready to track in production?

Software helps you run prompts on schedules, store evidence, and compare engines without manual copy-paste.

Start Tracking