verification-before-completion

A Claude Code skill that blocks the agent from claiming work is done until it has run a verification command, read the output, and confirmed the result — replacing 'should work' and 'looks good' with evidence.

Stop claiming things work without running them first

Source Jesse Vincent

License MIT

First documented 2026-04-29

Updated 2026-05-05

Receipts firsthand ✓

Code Quality Code Review

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

is this done
did the tests pass
ready to commit

What it does

verification-before-completion enforces a single rule: never claim a task is complete without fresh evidence from running the verification. The skill’s protocol is four steps — identify what proves the claim, run it, read the full output, then confirm or fail. It explicitly rejects language like “should work”, “looks good”, “this ought to pass” as dishonest substitutes for actual verification.

The skill targets the most common silent failure in Claude Code sessions: agent reports “done”, user merges, CI breaks, or worse — production breaks. The fix isn’t trust; it’s making verification a hard gate before any completion claim.

When to use it

Before announcing any task is complete
Before committing code
Before opening a PR or marking a TODO as done
When pattern-matching makes a fix “look right” — verify it actually runs
After applying a fix the agent feels confident about (overconfidence is the failure mode)

When not to reach for it:

During exploratory phases where you’re investigating, not finalizing
For pure refactors where tests already passed and behavior didn’t change (still verify, but the gate is lighter)

Install

From the obra/superpowers repo at skills/verification-before-completion/. Pairs naturally with systematic-debugging (which catches symptom-patching) and the code-review skills.

What a session looks like

Identify the verification. What command, test, or output would prove the claim? “The function works” → npm test src/foo.test.ts. “The build is clean” → npm run build. “The bug is fixed” → reproduce the original failure and observe it doesn’t fail.
Run it completely. Not part of it. Not until the first failure. The full run.
Read the full output. Errors hide in the middle of long stack traces. Warnings matter. The skill resists the urge to grep for “PASS” and call it done.
Confirm or fail honestly. If it passed, say so with the evidence. If it didn’t, say so with the evidence. Never paper over a partial pass.

The discipline that makes it work: the agent says nothing about completion until step 4 produces verifiable evidence. “Should work” is treated as a failure of the protocol.

Receipts

2026-05-05 — Verifying a 160-skill receipts audit

Used to fact-check four numeric claims a receipts audit had just produced (TODO=18 / firsthand=5 / generic=137 / total=160). Verification commands ran:

ls src/content/skills/*.md | wc -l → 160 ✓
grep -l "TODO — to be filled in" src/content/skills/*.md → 18 files ✓ (matches the named import waves: 4 Osmani + 8 personas + 5 obra + 1 ppt-master)
grep -l "**Pattern that works:**" src/content/skills/*.md → 142 files, not 5

The grep delta surfaced the real epistemological asymmetry: the agent had classified 5 skills as “firsthand” because they used “Pattern that works:” phrasing — but that phrase appears in 142 of 160 files. So firsthand-by-phrasing isn’t a verifiable signal at all. The 5-firsthand classification was a content-context judgment that the skill demoted from “verified” to “agent’s read, accepted on trust.” Two of four claims survived as greppable; two demoted to subjective.

That’s the wedge: the skill catches its own meta-failure mode. On a routine claim (“18 files match X”), grep is sufficient. On a semantic claim (“these 5 read as firsthand”), evidence-before-assertion exposes that the assertion isn’t verifiable by the rubric the asserter implied.

Source and attribution

From Jesse Vincent’s superpowers repository.

License: MIT.

The skill’s framing is explicit: claiming completion without verification is dishonesty, especially before commits, PRs, or task closure. That framing is what gives the skill teeth — it’s not “be careful”, it’s “you are lying if you skip this”.