# evidence-collector

> A Claude Code agent persona that runs UI/QA review against screenshot evidence — defaults to FAIL, expects 3-5 issues on every first pass, and rejects 'A+' scores or 'zero issues found' as fantasy reporting.

**Use case**: Catch broken UI claims before they ship — with screenshots, not vibes

**Canonical URL**: https://agentcookbooks.com/skills/evidence-collector/

**Topics**: claude-code, skills, code-quality, code-review

**Trigger phrases**: "QA this build", "evidence-based review", "did this actually ship"

**Source**: [Michael Sitarzewski](https://github.com/msitarzewski/agency-agents/blob/main/testing/testing-evidence-collector.md)

**License**: MIT

---

## What it does

`evidence-collector` is the QA persona in the agency-agents collection — alias `EvidenceQA`. It runs a UI review pass against actual screenshots (Playwright-captured by default), compares what's on screen to the original specification by quoting both verbatim, and writes a report where every claim is anchored to a screenshot filename. It treats "zero issues found" as a red flag, "A+" as fantasy, and "production ready" without comprehensive evidence as an automatic fail.

The persona has three operating beliefs: screenshots don't lie, default to finding 3-5 issues on first implementation, and prove everything. It refuses to accept "luxury" or "premium" claims that aren't visible in the screenshot, refuses to add requirements that weren't in the spec, and writes its own quality grades (Basic / Good / Excellent) instead of letting agents self-rate.

## When to use it

- Reviewing a UI build before merge or before a stakeholder demo
- Catching the gap between "agent says shipped" and "actual rendered page"
- Mobile/responsive QA where the failure mode is "looks fine on desktop, broken on mobile"
- Form, accordion, modal, and theme-toggle interaction testing where automation needs visual proof
- Pairing with `verification-before-completion` from obra/superpowers — that skill enforces "run the verification"; this persona enforces "the verification produced visual evidence"

When *not* to reach for it:

- Pure backend / non-UI work where there's no screenshot to anchor to
- Internal-only tools where polish doesn't matter and the brutal-honesty grading is overkill
- First-draft exploratory work before a spec exists — the persona compares output to spec, so without one it has nothing to grade against

## Install

From [msitarzewski/agency-agents](https://github.com/msitarzewski/agency-agents) at `testing/testing-evidence-collector.md`. Copy to `~/.claude/agents/` or use the repo's installer. The persona's process expects a Playwright capture script (the upstream skill assumes `qa-playwright-capture.sh`) and a directory of screenshots to review — without those, the "screenshots don't lie" rule has nothing to enforce.

## What a session looks like

1. **Reality-check commands first.** Run the screenshot capture, list what's actually built (`ls resources/views/` or equivalent), grep for any "luxury / premium / glass / morphism" claims that need visual backing.
2. **Visual evidence analysis.** Open each screenshot, describe what's *visible* (not what should be there), quote the spec verbatim. Any spec-vs-screenshot mismatch becomes a finding.
3. **Interactive testing.** Accordions: header click expands content? Forms: submit / validate / show errors? Navigation: smooth scroll lands at the right section? Mobile: hamburger menu opens? Theme toggle: light/dark/system actually switches?
4. **Per-issue report block.** Each finding gets: evidence (screenshot filename), description, priority (Critical / Medium / Low). The minimum is 3-5 issues — fewer than that flags as "look harder."
5. **Honest quality grade.** Basic / Good / Excellent for design level. FAILED / NEEDS WORK / READY for production readiness, defaulting to FAILED unless overwhelming evidence proves otherwise. No "A+" — the persona explicitly refuses perfect scores on first implementations.
6. **Re-test required.** The default close is FAILED with a list of fixes and a "re-test required: YES" line. The bar to clear isn't "looks fine" — it's "screenshot proves the spec is met."

The discipline that makes it work: rule 2 of the personality, "default to finding 3-5+ issues minimum." Without that floor, the persona collapses into rubber-stamping; with it, the QA pass actually catches what the implementing agent missed.

## Receipts

### 2026-05-05 — UI evidence pass on `/skills/seo/` and `/skills/topic/seo/`

Dispatched against the live SEO skill detail page and SEO topic index. Explicit scope downgrade up front: no screenshot tooling in this workspace (Playwright lives in a sister workspace), so HTML-only evidence via WebFetch.

Both pages returned 200 OK with full security-header set. Schema findings: `/skills/seo/` carries TechArticle + BreadcrumbList + FAQPage in one ld+json array; `/skills/topic/seo/` has CollectionPage + ItemList(`numberOfItems: 28`) but **no BreadcrumbList**. Schema asymmetry between skill detail and topic index is worth tracking. The 8 sub-groups (Audits & diagnostics, Technical & schema, AI search & GEO, Content & clustering, Programmatic & at-scale, Local & e-commerce, Competitive & backlinks, Tooling & strategy) are present and ordered as the source markdown describes; the hand-written lede is verbatim.

The receipt-worthy finding came from comparing schema to render: the canonical `seo` slug appears at **position 4** in the alphabetical ItemList (after `ai-seo`, `programmatic-seo`, `schema-markup`) and is not name-checked in the lede paragraph either. Among 25+ `seo-*` siblings, the canonical entry is buried — a discoverability gap an audit-from-source-code wouldn't catch because it's an ordering artifact, not a content one.

Honest scope: persona was emulated, not the actual installed file. The "screenshots don't lie" rule was downgraded to "HTML doesn't lie" — strong on schema and link presence, weak on visual hierarchy. The 3-5-issues-minimum rule held: 4 distinct findings produced, none rubber-stamped.

## Source and attribution

From [Michael Sitarzewski's agency-agents repository](https://github.com/msitarzewski/agency-agents/blob/main/testing/testing-evidence-collector.md), an MIT-licensed collection of 144+ AI agent personas across engineering, marketing, design, testing, and specialized roles.

License: MIT.

Quote from the persona body, verbatim: *"Screenshots don't lie. If you can't see it working in a screenshot, it doesn't work."* The 3-5-issues-minimum rule, the FAILED-by-default close, and the refusal to award A+ are the operational expression of that stance — receipts as the wedge against fantasy reporting.