Methodology

How the platform evaluates accessibility

The honest answer to how this actually works — standards profiles, scan modes, confidence scoring, evidence, reproducibility, the trust boundary, and the claims we will not make.


Verassa evidence protocol

01 Evidence
Screenshot, DOM, replay, and axe baseline captured before decisions.
02 Judgment
Reviewer route, rationale, and owner stay attached to lower-confidence work.
03 Verification
Re-scan records and disclaimers travel with reportable outputs.

Standards profiles

Four standards profiles

A profile sets which success criteria a scan evaluates against. You choose the bar; the platform does not pick it for you.

Legal Baseline
WCAG 2.2 Level A and AA — the benchmark used in litigation and procurement.
Core A/AA
The full A and AA success criteria; the default for most audits.
Accessibility Excellence
Applicable AAA criteria plus established best practice, for teams going past the baseline.
Customer-Selected
A profile you define, for a specific policy or contractual requirement.
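One way to picture a profile is as a filter over success criteria. The sketch below is illustrative only: the profile keys, the `criteria_for` function, and the four-entry criteria table are assumptions made for this example, not the platform's actual data model. The criterion numbers and levels are taken from WCAG 2.2. The "established best practice" part of Accessibility Excellence is not modeled here.

```python
# A tiny, illustrative subset of WCAG 2.2 success criteria and their levels.
CRITERIA = {
    "1.1.1": "A",    # Non-text Content
    "1.4.3": "AA",   # Contrast (Minimum)
    "2.4.7": "AA",   # Focus Visible
    "1.4.6": "AAA",  # Contrast (Enhanced)
}

# Hypothetical mapping from named profiles to conformance levels.
PROFILE_LEVELS = {
    "legal_baseline": {"A", "AA"},
    "core_a_aa": {"A", "AA"},
    "accessibility_excellence": {"A", "AA", "AAA"},
}


def criteria_for(profile, custom=None):
    """Return the set of success criteria a scan would evaluate against."""
    if profile == "customer_selected":
        # The customer supplies the list; nothing is picked on their behalf.
        if not custom:
            raise ValueError("a customer-selected profile needs an explicit list")
        unknown = set(custom) - set(CRITERIA)
        if unknown:
            raise ValueError(f"unknown criteria: {sorted(unknown)}")
        return set(custom)
    levels = PROFILE_LEVELS[profile]
    return {sc for sc, level in CRITERIA.items() if level in levels}
```

For example, `criteria_for("customer_selected", ["1.1.1", "1.4.6"])` returns exactly the two criteria named, while `criteria_for("core_a_aa")` returns every A and AA entry in the table.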

Scan modes

Four scan modes

Scan mode sets depth and scope. Deeper modes evaluate more pages and flows and take longer.

Quick
A fast pass over a small page set, for a first read on posture.
Standard
A representative sample of pages and flows; the typical audit depth.
Deep
A wide crawl including authenticated flows, for a thorough audit.
Continuous
Deploy-time scans that re-check critical flows as the site changes.

Confidence scoring

Every finding carries a confidence score

Each candidate finding is scored from 0.00 to 1.00 for how strongly the evidence supports it. Confidence is not a guess about severity — it measures how certain the evaluation is that the finding is real.

Thresholds gate what happens next. A high-confidence finding can be auto-confirmed; a lower-confidence finding routes to human review. Customers can move those thresholds to match their own tolerance for false positives versus missed findings.
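The gating described above can be sketched as a single comparison. This is a minimal illustration, not the platform's implementation: the `Thresholds` class, the `route_finding` function, and the 0.90 default are all assumptions, since the actual threshold values are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical default; customers can move this value.
    auto_confirm: float = 0.90


def route_finding(confidence: float, thresholds: Thresholds = Thresholds()) -> str:
    """Decide the next step for a candidate finding from its confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.00 and 1.00")
    if confidence >= thresholds.auto_confirm:
        return "auto_confirm"
    # Anything below the threshold goes to a person.
    return "human_review"
```

A customer with a low tolerance for false positives might raise the bar: `route_finding(0.95, Thresholds(auto_confirm=0.97))` returns `"human_review"`, where the default would have auto-confirmed.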

Evidence pipeline

Every finding is captured with its evidence

A finding without evidence is an assertion. The evidence pipeline captures, for each finding, the DOM, screenshots, interaction recordings, the accessibility tree, a transcript of the agent's reasoning, and the automated baseline result.

Evidence is redacted for personal information before it is displayed or exported. The redaction step is not optional and not skippable.
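A redaction step that cannot be skipped can be sketched by giving the export path no bypass parameter. The example below is an assumption-heavy illustration: the function names are hypothetical, and a single email pattern stands in for what would in practice be a much broader set of personal-information detectors.

```python
import re

# Illustrative only: one pattern for email addresses.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace anything that looks like an email address."""
    return EMAIL.sub("[REDACTED]", text)


def export_evidence(dom_snapshot: str) -> str:
    """Export always passes through redact(); there is no flag to turn it off."""
    return redact(dom_snapshot)
```

Because `export_evidence` takes no option to disable redaction, a caller cannot reach the raw snapshot through this path.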

Reproducibility

Reproducible by design

Every scan pins its configuration: the agent versions, the model versions, the prompts, the axe-core version, the standards-profile version, and the sub-agent code commit hashes. Every finding records a reproducibility hash over that configuration.

A customer can request reproduction. The platform re-runs the locked configuration and verifies the findings match. An audit you cannot reproduce is one you have to take on faith; this one you do not.
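A reproducibility hash over a pinned configuration could look like the following. This is a sketch under stated assumptions: the field names, the example values, and the choice of SHA-256 over canonical JSON are illustrative, not a description of the platform's actual hashing scheme.

```python
import hashlib
import json


def reproducibility_hash(config: dict) -> str:
    """SHA-256 over a canonical (key-sorted, compact) JSON encoding."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_configuration(recorded_hash: str, rerun_config: dict) -> bool:
    """Check that a re-run uses exactly the configuration that was pinned."""
    return reproducibility_hash(rerun_config) == recorded_hash


# Hypothetical pinned configuration; every value here is a placeholder.
pinned_config = {
    "agent_versions": {"navigator": "1.4.0"},
    "model_versions": {"reviewer": "2025-01"},
    "prompts_digest": "abc123",
    "axe_core_version": "4.10.0",
    "standards_profile_version": "core-a-aa@3",
    "sub_agent_commits": ["0f3c2ab"],
}
recorded = reproducibility_hash(pinned_config)
```

Sorting keys before hashing means the hash depends on the configuration's content, not on the order in which fields were written. Changing any pinned value, such as the axe-core version, produces a different hash.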

Trust boundary

Sensitive evidence stays inside the trust boundary

Customer source code, proprietary DOM snapshots, authenticated-flow evidence, credentials, and regulated content do not silently leave the trust boundary. The routing decision is enforced at the model-routing layer, not left to convention.

Every scan report includes a provenance section stating what data went where. Enterprise customers can route sensitive data classes in-tenant or to a self-hosted model.
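Enforcement at the routing layer, with provenance recorded as a side effect, can be sketched as below. All names are hypothetical, including the `PUBLIC_PAGE_TEXT` class and the `EXTERNAL` destination. The sketch also assumes the simplest possible policy, which is to refuse outright when a sensitive class is sent outside the boundary; the actual policy may differ.

```python
from enum import Enum


class DataClass(Enum):
    SOURCE_CODE = "source_code"
    DOM_SNAPSHOT = "dom_snapshot"
    AUTH_FLOW_EVIDENCE = "auth_flow_evidence"
    CREDENTIALS = "credentials"
    REGULATED_CONTENT = "regulated_content"
    PUBLIC_PAGE_TEXT = "public_page_text"  # hypothetical non-sensitive class


class Destination(Enum):
    IN_TENANT = "in_tenant"
    SELF_HOSTED = "self_hosted"
    EXTERNAL = "external"  # hypothetical: any model outside the boundary


SENSITIVE = frozenset(DataClass) - {DataClass.PUBLIC_PAGE_TEXT}


class RoutingDenied(Exception):
    pass


def route(data_class: DataClass, requested: Destination, provenance: list) -> Destination:
    """Refuse sensitive data leaving the boundary; log every allowed route."""
    if data_class in SENSITIVE and requested is Destination.EXTERNAL:
        raise RoutingDenied(f"{data_class.value} may not be routed externally")
    # The provenance list is what a report's provenance section would draw on.
    provenance.append((data_class.value, requested.value))
    return requested
```

Because the check lives inside `route`, a caller cannot send credentials to an external destination by forgetting a convention; the call fails instead.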

Human-review gates

Where human review is required

The platform is explicit about which outputs a model may produce alone and which require a qualified reviewer. Internal working documents can be generated from agent output. Any output a third party will rely on — an external report, a VPAT or ACR draft — requires a named reviewer.

A report does not silently cross from draft to external-reliance status. The gate is a step, with a person in it.
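The gate can be pictured as the only function allowed to change a report's status. This is a minimal sketch; the `Report` fields, the status strings, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    title: str
    status: str = "draft"
    reviewer: Optional[str] = None


def release_for_external_reliance(report: Report, reviewer: Optional[str]) -> Report:
    """Move a draft to external-reliance status only with a named reviewer."""
    if not reviewer or not reviewer.strip():
        raise PermissionError("a named reviewer is required for external reliance")
    report.reviewer = reviewer.strip()
    report.status = "external_reliance"
    return report
```

A report built from agent output starts as `"draft"` and stays there until someone calls `release_for_external_reliance` with a reviewer's name.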

Read the human review policy

Open evaluation

We publish how we are evaluated

The platform is measured against a gold-standard test set. We publish that test set, the scoring methodology, and the results — including the agent, model, and ruleset versions behind each run. Anyone can score the product themselves.
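To make "score the product themselves" concrete, here is one way a reader might compare a run against a gold-standard set. The published scoring methodology is not reproduced here; precision and recall are used as a stand-in assumption, and the finding identifiers are invented for the example.

```python
def score_run(reported: set, gold: set) -> dict:
    """Compare reported findings with a gold-standard set of known issues."""
    true_positives = len(reported & gold)
    # Precision: share of reported findings that are real.
    precision = true_positives / len(reported) if reported else 0.0
    # Recall: share of real issues that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical finding identifiers.
gold = {"img-alt-12", "contrast-3", "focus-7", "label-9"}
reported = {"contrast-3", "focus-7", "heading-1"}
result = score_run(reported, gold)
```

In this example two of the three reported findings are in the gold set, and two of the four gold issues were found, so recall is 0.5.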

This is the only credible answer to how the product differs from an overlay vendor. We will not market the product as comparable to a human auditor unless and until the public evaluation supports that claim.

View the open evaluation methodology

Claim boundaries

What we do not claim

We do not call a site compliant, accessible, or conformant. A passing scan means the evaluation finished with no findings in that scope — not that a site meets a legal standard.

We do not describe a finding as resolved until verification confirms it. We do not promise protection from litigation. We do not present the diagnostic engine as validated before it has cleared its evaluation gate. The boundary on what this product claims is part of the product.

Read the claim boundaries

Evaluate the methodology yourself

Download the full methodology, read the open evaluation, or apply for design-partner access.

