Open evaluation methodology

The product is measured against a public test set, with a public scoring methodology. The point is simple: you do not have to take our word for how good it is.

Verassa evidence protocol

01 Evidence: Screenshot, DOM, replay, and axe baseline captured before decisions.
02 Judgment: Reviewer route, rationale, and owner stay attached to lower-confidence work.
03 Verification: Re-scan records and disclaimers travel with reportable outputs.

The test set

A published gold-standard test set

Detection quality is measured against a gold-standard test set: real sites, evaluated by qualified human auditors, used as the ground truth a scan is scored against.

The test set and the methodology are published. An accessibility researcher, or a competitor, can run the product against the same set and check the numbers.

Scoring

How a result is scored

Each evaluation reports precision and recall against the ground truth. Precision is the share of the product's reported findings that are real; recall is the share of real issues that the product found, so it also shows what was missed. The scoring rubric defines what counts as a finding match and what counts as a miss, so the numbers mean the same thing every time.
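As a rough illustration of the arithmetic, the sketch below scores a scan against a ground-truth list. It is not the published rubric: the `Finding` type, the `score` function, and the rule that two findings match when page and rule identifier are equal are all assumptions made for this example. The real matching rule is whatever the scoring rubric defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Finding:
    # Hypothetical shape of a finding; the real rubric may match on more fields.
    page: str  # page path the issue was found on
    rule: str  # identifier of the violated rule


def score(reported, ground_truth):
    """Return precision, recall, and the missed findings for one scan.

    Assumption: a reported finding matches a ground-truth finding only
    when page and rule are identical.
    """
    reported_set = set(reported)
    truth_set = set(ground_truth)
    matched = reported_set & truth_set

    # Convention chosen for this sketch: an empty denominator scores 0.0.
    precision = len(matched) / len(reported_set) if reported_set else 0.0
    recall = len(matched) / len(truth_set) if truth_set else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "missed": sorted(truth_set - reported_set),
    }


# Illustrative data only, not taken from the published test set.
truth = [
    Finding("/", "image-alt"),
    Finding("/", "color-contrast"),
    Finding("/", "label"),
    Finding("/pricing", "link-name"),
    Finding("/pricing", "image-alt"),
]
reported = [
    Finding("/", "image-alt"),
    Finding("/", "color-contrast"),
    Finding("/pricing", "link-name"),
    Finding("/pricing", "button-name"),  # not in the ground truth
]

result = score(reported, truth)
print(result["precision"], result["recall"])
```

In this made-up case three of the four reported findings are real and three of the five real issues were found. The two entries under `missed` are the issues the auditors recorded that the scan did not report.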

Results are versioned. Each published result records the agent versions, the model versions, the axe-core version, and a reproducibility hash, so any number can be traced to the exact configuration that produced it.
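One way such a record could be built is sketched below. The field names, the version strings, and the choice to hash a canonical JSON form of the configuration with SHA-256 are assumptions for illustration. The page does not state how the reproducibility hash is actually computed or what it covers.

```python
import hashlib
import json


def reproducibility_hash(config: dict) -> str:
    """Hash a configuration so that identical configurations give identical hashes.

    Assumption: the hash covers only the configuration, serialized as
    JSON with sorted keys so that key order does not change the result.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values; real agent and model names are not given in the source.
config = {
    "agent_versions": {"crawler": "1.4.0", "reviewer": "0.9.2"},
    "model_versions": {"classifier": "2024-06"},
    "axe_core_version": "4.9.1",
}

published_result = {
    "precision": 0.75,  # illustrative numbers, not real results
    "recall": 0.60,
    "config": config,
    "reproducibility_hash": reproducibility_hash(config),
}

print(published_result["reproducibility_hash"])
```

Under this scheme, anyone holding the published configuration can recompute the hash and confirm that it matches the one attached to the result. Changing any version string produces a different hash.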

The editorial standard

What we will not say until the evaluation supports it

The platform will not be marketed as comparable to a human auditor unless and until the public evaluation supports that claim. Until then, the product is described as what it is: an AI-assisted audit workflow with qualified human review.

This is the single most important commitment on this page. It is the difference between a product that earns trust and one that borrows it.

Read the full methodology

The methodology page covers scan modes, confidence scoring, evidence, and reproducibility in depth.
