Methodology
The honest answer to how this actually works — standards profiles, scan modes, confidence scoring, evidence, reproducibility, the trust boundary, and the claims we will not make.
Verassa evidence protocol
01 Evidence: screenshot, DOM, replay, and axe baseline captured before decisions.
02 Judgment: reviewer route, rationale, and owner stay attached to lower-confidence work.
03 Verification: re-scan records and disclaimers travel with reportable outputs.
Standards profiles
A profile sets which success criteria a scan evaluates against. You choose the bar; the platform does not pick it for you.
WCAG 2.2 Level A and AA — the benchmark used in litigation and procurement.
The full A and AA success criteria; the default for most audits.
Applicable AAA criteria plus established best practice, for teams going past the baseline.
A profile you define, for a specific policy or contractual requirement.
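One way to picture a profile is as a named, versioned set of success-criterion numbers that candidate findings are filtered against. The minimal sketch below assumes that representation. `Profile`, `extend`, and `filter_findings` are hypothetical names, and the criterion sets are a tiny illustrative subset, not a full WCAG 2.2 profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: a named, versioned set of WCAG success criteria."""
    name: str
    version: str
    criteria: frozenset

    def extend(self, name: str, version: str, add=(), remove=()) -> "Profile":
        # Derive a custom profile, e.g. for a policy or contractual requirement.
        criteria = (self.criteria | frozenset(add)) - frozenset(remove)
        return Profile(name=name, version=version, criteria=criteria)


def filter_findings(profile: Profile, findings: list) -> list:
    """Keep only (criterion, selector) findings whose criterion the profile evaluates."""
    return [f for f in findings if f[0] in profile.criteria]


# Illustrative subset: 1.1.1 is Level A, 1.4.3 and 2.4.7 are AA, 1.4.6 and 2.5.5 are AAA.
base = Profile("aa-subset", "1", frozenset({"1.1.1", "1.4.3", "2.4.7"}))
custom = base.extend("contract-x", "1", add={"1.4.6"})

candidates = [("1.1.1", "#logo"), ("1.4.6", ".hero h1"), ("2.5.5", "#close")]
print(filter_findings(base, candidates))    # [('1.1.1', '#logo')]
print(filter_findings(custom, candidates))  # [('1.1.1', '#logo'), ('1.4.6', '.hero h1')]
```

Here the profile only decides scope. The same candidate list yields different findings depending on which bar the customer chose.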
Scan modes
Scan mode sets depth and scope. Deeper modes evaluate more pages and flows and take longer.
A fast pass over a small page set, for a first read on posture.
A representative sample of pages and flows; the typical audit depth.
A wide crawl including authenticated flows, for a thorough audit.
Deploy-time scans that re-check critical flows as the site changes.
Confidence scoring
Each candidate finding is scored from 0.00 to 1.00 for how strongly the evidence supports it. Confidence is not a guess about severity — it measures how certain the evaluation is that the finding is real.
Thresholds gate what happens next. A high-confidence finding can be auto-confirmed; a lower-confidence finding routes to human review. Customers can move those thresholds to match their own tolerance for false positives versus missed findings.
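A minimal sketch of threshold gating, assuming a single customer-adjustable auto-confirm threshold. The 0.90 default, the `Thresholds` and `route` names, and the step labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical default. Raising it sends more findings to human review;
    # lowering it auto-confirms more of them.
    auto_confirm: float = 0.90


def route(confidence: float, thresholds: Thresholds = Thresholds()) -> str:
    """Decide the next step for a candidate finding (labels are illustrative)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.00 and 1.00")
    if confidence >= thresholds.auto_confirm:
        return "auto_confirm"
    return "human_review"


print(route(0.95))                                 # auto_confirm
print(route(0.72))                                 # human_review
print(route(0.72, Thresholds(auto_confirm=0.70)))  # auto_confirm
```

The same 0.72 finding is routed differently under two threshold settings. The score does not change; only the customer's tolerance does.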
Evidence pipeline
A finding without evidence is an assertion. The evidence pipeline captures, for each finding, the DOM, screenshots, interaction recordings, the accessibility tree, a transcript of the agent's reasoning, and the automated baseline result.
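A sketch of a per-finding evidence record, assuming one field per artifact listed above. The class and field names are hypothetical. It treats `None` as "not captured" and reports which artifacts a finding still lacks.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidenceBundle:
    """Hypothetical per-finding evidence record; None means not yet captured."""
    dom: Optional[str] = None
    screenshots: Optional[list] = None
    interaction_recordings: Optional[list] = None
    accessibility_tree: Optional[dict] = None
    reasoning_transcript: Optional[str] = None
    axe_baseline: Optional[dict] = None

    def missing(self) -> list:
        # Names of artifacts that were never captured, in declaration order.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_complete(self) -> bool:
        return not self.missing()


bundle = EvidenceBundle(dom="<button></button>", screenshots=["shot-1.png"])
print(bundle.missing())
# ['interaction_recordings', 'accessibility_tree', 'reasoning_transcript', 'axe_baseline']
```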
Evidence is redacted for personal information before it is displayed or exported. The redaction step is not optional and not skippable.
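A minimal sketch of a redaction step that cannot be bypassed. The `redact` and `Evidence` names are hypothetical. The two regular expressions are illustrative only, because real personal-information detection needs far broader coverage than an email and a phone pattern. The design point is that every public accessor for displayable or exportable text goes through `redact`, and no parameter turns it off.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace each matched span with a labeled placeholder."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


class Evidence:
    """Raw capture sits in an underscore attribute; public accessors return redacted text only."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def for_display(self) -> str:
        return redact(self._raw)

    def for_export(self) -> str:
        # Same path as display: there is no parameter that skips redaction.
        return redact(self._raw)


e = Evidence("Contact jane.doe@example.com or +1 (555) 123-4567")
print(e.for_display())  # Contact [REDACTED:EMAIL] or [REDACTED:PHONE]
```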
Reproducibility
Every scan pins its configuration: the agent versions, the model versions, the prompts, the axe-core version, the standards-profile version, and the sub-agent code commit hashes. Every finding records a reproducibility hash over that configuration.
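A reproducibility hash can be sketched as SHA-256 over a canonical encoding of the pinned configuration. This is an assumption: canonical JSON plus SHA-256 is one common approach, and the page does not say which hash or encoding the platform uses. The field names and version strings below are placeholders.

```python
import hashlib
import json


def reproducibility_hash(config: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the pinned scan configuration."""
    # sort_keys and fixed separators make the encoding independent of dict order
    # and whitespace, so equal configurations always produce equal hashes.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Field names and values are placeholders, not real version strings.
pinned = {
    "agent_versions": {"crawler": "a.b.c", "evaluator": "d.e.f"},
    "model_versions": ["model-x@2025-01"],
    "prompt_versions": {"focus-order": "p7"},
    "axe_core_version": "x.y.z",
    "standards_profile_version": "profile@3",
    "subagent_commits": {"form-checker": "9f2c1e0"},
}
print(reproducibility_hash(pinned))  # 64 hex characters
```

Changing any pinned value, such as a model version, yields a different hash. That is what lets a finding point back to the exact configuration that produced it.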
A customer can request reproduction. The platform re-runs the locked configuration and verifies that the new findings match the original scan. An audit you cannot reproduce is one you have to take on faith; with this one, you do not have to.
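A sketch of the reproduction check. It assumes a finding is identified by its success criterion and selector, and it reads "match" strictly as exact set equality under an identical configuration hash. Both are assumptions, and `Finding` and `verify_reproduction` are hypothetical names.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    criterion: str  # WCAG success criterion, e.g. "2.4.7"
    selector: str   # element the finding applies to


def verify_reproduction(original_hash: str, original: set,
                        rerun_hash: str, rerun: set) -> dict:
    """Compare a re-run against the original scan under the same locked configuration."""
    if original_hash != rerun_hash:
        # A different configuration means this is not a reproduction at all.
        return {"match": False, "reason": "configuration hash differs"}
    missing = sorted(original - rerun)  # found originally, absent on re-run
    extra = sorted(rerun - original)    # new on re-run
    return {"match": not missing and not extra, "missing": missing, "extra": extra}


a = {Finding("1.1.1", "#logo"), Finding("2.4.7", "#search")}
b = {Finding("1.1.1", "#logo")}
print(verify_reproduction("abc", a, "abc", b))
# {'match': False, 'missing': [Finding(criterion='2.4.7', selector='#search')], 'extra': []}
```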
Trust boundary
Customer source code, proprietary DOM snapshots, authenticated-flow evidence, credentials, and regulated content do not silently leave the trust boundary. The routing decision is enforced at the model-routing layer, not left to convention.
Every scan report includes a provenance section stating what data went where. Enterprise customers can route sensitive data classes in-tenant or to a self-hosted model.
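A sketch of routing enforced in one place. It assumes the strictest reading of the policy above: sensitive data classes always go to an in-boundary destination, and every decision is logged for the report's provenance section. `ModelRouter`, the destination names, and the non-sensitive `PUBLIC_PAGE_TEXT` class are hypothetical.

```python
from enum import Enum


class DataClass(Enum):
    SOURCE_CODE = "source_code"
    DOM_SNAPSHOT = "dom_snapshot"
    AUTH_FLOW_EVIDENCE = "auth_flow_evidence"
    CREDENTIALS = "credentials"
    REGULATED_CONTENT = "regulated_content"
    PUBLIC_PAGE_TEXT = "public_page_text"  # hypothetical non-sensitive class


SENSITIVE = {
    DataClass.SOURCE_CODE, DataClass.DOM_SNAPSHOT, DataClass.AUTH_FLOW_EVIDENCE,
    DataClass.CREDENTIALS, DataClass.REGULATED_CONTENT,
}
IN_BOUNDARY = {"in_tenant", "self_hosted"}


class ModelRouter:
    """Destination is decided here from the data class; route() accepts no override."""

    def __init__(self, sensitive_destination: str = "in_tenant") -> None:
        if sensitive_destination not in IN_BOUNDARY:
            raise ValueError("sensitive data must stay inside the trust boundary")
        self.sensitive_destination = sensitive_destination
        self.provenance = []  # (data class, destination) pairs for the report

    def route(self, data_class: DataClass) -> str:
        if data_class in SENSITIVE:
            destination = self.sensitive_destination
        else:
            destination = "external_model"  # hypothetical hosted-model destination
        self.provenance.append((data_class.value, destination))
        return destination


router = ModelRouter("self_hosted")
print(router.route(DataClass.CREDENTIALS))       # self_hosted
print(router.route(DataClass.PUBLIC_PAGE_TEXT))  # external_model
print(router.provenance)
# [('credentials', 'self_hosted'), ('public_page_text', 'external_model')]
```

Because the provenance log is written by the same call that makes the routing decision, the report's "what data went where" section cannot drift from what actually happened.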
Human-review gates
The platform is explicit about which outputs a model may produce alone and which require a qualified reviewer. Internal working documents can be generated from agent output. Any output a third party will rely on — an external report, a VPAT or ACR draft — requires a named reviewer.
A report does not silently cross from draft to external-reliance status. The gate is a step, with a person in it.
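The gate can be sketched as a single transition function that refuses to promote a third-party-facing output without a named reviewer. The report kinds, statuses, and `promote` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Output kinds a third party may rely on; names are hypothetical.
REQUIRES_NAMED_REVIEWER = {"external_report", "vpat_acr_draft"}


@dataclass
class Report:
    kind: str  # e.g. "internal_notes" or "external_report"
    status: str = "draft"
    reviewer: Optional[str] = None


def promote(report: Report, reviewer: Optional[str] = None) -> Report:
    """Move a draft to its release status; external-reliance outputs need a named person."""
    if report.kind in REQUIRES_NAMED_REVIEWER:
        if reviewer is None or not reviewer.strip():
            raise PermissionError(f"{report.kind} requires a named reviewer")
        report.reviewer = reviewer.strip()
        report.status = "external_reliance"
    else:
        report.status = "internal"
    return report


print(promote(Report("internal_notes")).status)                 # internal
print(promote(Report("external_report"), "A. Reviewer").status)  # external_reliance
```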
Open evaluation
The platform is measured against a gold-standard test set. We publish that test set, the scoring methodology, and the results — including the agent, model, and ruleset versions behind each run. Anyone can score the product themselves.
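The published scoring methodology is not reproduced here. As an assumption, precision, recall, and F1 over matched findings are one conventional way to score a scanner against a gold set. The findings below are made up for illustration.

```python
def score(predicted: set, gold: set) -> dict:
    """Precision, recall, and F1 of predicted findings against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Findings as (success criterion, selector) pairs; selectors are invented.
gold = {("1.1.1", "#logo"), ("2.1.2", "#modal"), ("1.4.3", ".footer a")}
predicted = {("1.1.1", "#logo"), ("2.1.2", "#modal"), ("4.1.2", "#menu")}
print(score(predicted, gold))  # each metric is 2/3 here, up to floating-point rounding
```

Precision penalizes false positives (the `4.1.2` finding). Recall penalizes missed findings (the `1.4.3` one). Publishing both keeps either failure mode from hiding behind the other.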
This is the only credible answer to how we differ from an overlay vendor. We will not market the product as comparable to a human auditor unless and until the public evaluation supports that claim.
Claim boundaries
We do not call a site compliant, accessible, or conformant. A passing scan means the evaluation finished with no findings in that scope — not that a site meets a legal standard.
We do not describe a finding as resolved until verification confirms it. We do not promise protection from litigation. We do not present the diagnostic engine as validated before it has cleared its evaluation gate. The boundary on what this product claims is part of the product.
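Two of these boundaries can be enforced in code rather than left to copywriting. The sketch below assumes a hypothetical three-state finding lifecycle in which only a verification re-scan can set `RESOLVED`. It also assumes a summary function whose wording never claims compliance. All names and statuses are illustrative.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    FIX_SUBMITTED = "fix_submitted"
    RESOLVED = "resolved"


def after_verification(status: Status, finding: tuple, rescan: set) -> Status:
    """Only a verification re-scan can move a finding to RESOLVED."""
    if status is not Status.FIX_SUBMITTED:
        return status
    # Resolved only if the re-scan no longer reports the same finding.
    return Status.OPEN if finding in rescan else Status.RESOLVED


def summarize(scope: str, findings: set) -> str:
    # Wording deliberately avoids "compliant", "accessible", and "conformant".
    if not findings:
        return f"Evaluation finished with no findings in scope: {scope}"
    return f"{len(findings)} finding(s) in scope: {scope}"


f = ("2.4.7", "#search")
print(after_verification(Status.FIX_SUBMITTED, f, rescan=set()))  # Status.RESOLVED
print(summarize("checkout flow", set()))
```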