Trust Center
The product is measured against a public test set, with a public scoring methodology. The point is simple: you do not have to take our word for how good it is.
Verassa evidence protocol
01 Evidence: screenshot, DOM, replay, and axe baseline are captured before any decision is made.
02 Judgment: reviewer route, rationale, and owner stay attached to lower-confidence work.
03 Verification: re-scan records and disclaimers travel with reportable outputs.
The test set
Detection quality is measured against a gold-standard test set: real sites, evaluated by qualified human auditors, used as the ground truth a scan is scored against.
The test set and the methodology are published. An accessibility researcher, or a competitor, can run the product against the same set and check the numbers.
Scoring
Each evaluation reports precision and recall against the ground truth. Precision is the share of reported findings that are real issues. Recall is the share of real issues the product found, so every miss lowers recall. The scoring rubric defines what counts as a finding match and what counts as a miss, so the numbers mean the same thing every time.
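As an illustration of how the two numbers relate, here is a minimal sketch. It assumes findings have already been reduced to comparable keys and that a "finding match" is exact equality on a hypothetical (page, rule, selector) key. The published rubric defines the real matching criteria, which may be looser than this.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    # Hypothetical match key; the real rubric defines what counts as a match.
    page: str
    rule: str
    selector: str


def score(reported: set[Finding], ground_truth: set[Finding]) -> tuple[float, float]:
    """Return (precision, recall), treating an exact key match as a finding match."""
    true_positives = len(reported & ground_truth)  # reported AND real
    # Precision: share of reported findings that are real (0.0 if nothing reported).
    precision = true_positives / len(reported) if reported else 0.0
    # Recall: share of real issues that were reported; the remainder are misses.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Illustrative ground truth: five issues a human auditor confirmed.
truth = {
    Finding("/home", "image-alt", "img.hero"),
    Finding("/home", "color-contrast", "a.nav"),
    Finding("/checkout", "label", "input#email"),
    Finding("/checkout", "button-name", "button.pay"),
    Finding("/checkout", "heading-order", "h4.summary"),
}
# Illustrative scan output: three real findings plus one false positive.
reported = {
    Finding("/home", "image-alt", "img.hero"),
    Finding("/home", "color-contrast", "a.nav"),
    Finding("/checkout", "label", "input#email"),
    Finding("/home", "link-name", "a.logo"),
}
print(score(reported, truth))  # (0.75, 0.6)
```

Three of four reported findings are real (precision 0.75), and three of five real issues were found (recall 0.6). The two numbers move independently, which is why both are reported.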
Results are versioned. Each published result records the agent versions, the model versions, the axe-core version, and a reproducibility hash, so any number can be traced to the exact configuration that produced it.
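One way a reproducibility hash could work is sketched below, assuming the configuration is serialized canonically and hashed with SHA-256. The field names and version strings are hypothetical, and the actual hashing scheme is not described here.

```python
import hashlib
import json


def reproducibility_hash(config: dict) -> str:
    """Hash a run configuration so identical configs always give the same digest."""
    # sort_keys (applied recursively) and fixed separators make the JSON canonical,
    # so the order of keys in the input dict does not affect the hash.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configuration; field names and versions are illustrative only.
run_config = {
    "agent_versions": {"scanner": "1.4.0", "triage": "0.9.2"},
    "model_versions": {"classifier": "example-model-2024-06"},
    "axe_core_version": "4.9.1",
}

print(reproducibility_hash(run_config))
```

Changing any recorded version, such as the axe-core version, yields a different digest, so a published number can be tied to exactly one configuration.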
The editorial standard
The platform will not be marketed as comparable to a human auditor unless and until the public evaluation supports that claim. Until then, the product is described as what it is: an AI-assisted audit workflow with qualified human review.
This is the single most important commitment on this page. It is the difference between a product that earns trust and one that borrows it.
The methodology page covers scan modes, confidence scoring, evidence, and reproducibility in depth.