What does the diagnose skill do?

Walks a six-phase diagnosis loop: build a feedback loop first (failing test, curl script, CLI diff, headless browser script, replayed trace, or throwaway harness), then reproduce, minimise the case, generate hypotheses, instrument the code path, fix, and lock in a regression test. Frames feedback-loop construction as the skill — the rest is mechanical once you have it.

When should I use the diagnose skill?

Use on hard bugs and performance regressions where line-by-line reading isn't producing a hypothesis — when the bug is intermittent, when reproduction is flaky, when staring at logs isn't narrowing the cause. Complements verification-before-completion downstream and systematic-debugging at the hypothesis-test layer.

diagnose

A Claude Code skill by Matt Pocock that runs a disciplined diagnosis loop for hard bugs and performance regressions (build a feedback loop → reproduce → minimise → hypothesise → instrument → fix and regression-test), with explicit guidance that constructing a fast, deterministic pass/fail signal is the load-bearing phase and everything else is mechanical.

Diagnose hard bugs through a disciplined feedback-loop-first process

Source: Matt Pocock

License: MIT

First documented: 2026-05-13

Receipts: TODO

Debugging

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

diagnose this
debug this
this is broken
performance regression

What it does

diagnose is a process discipline for hard bugs. The six phases:

1. Build a feedback loop. This is the skill. A fast, deterministic, agent-runnable pass/fail signal for the bug. Everything else is mechanical once you have it.
2. Reproduce. Trigger the bug deterministically.
3. Minimise. Strip the case down to the smallest input/state that still reproduces.
4. Hypothesise. Generate candidate causes against the minimised case.
5. Instrument. Add logging/assertions/breakpoints at the candidate sites.
6. Fix → regression-test. Lock in the fix with a test that fails without it.
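
The minimise phase can be partly automated once a pass/fail signal exists. The following is a minimal sketch of a greedy reducer, not something taken from the skill itself: `minimise` and the `still_fails` callback are hypothetical names, and the callback is assumed to wrap whatever feedback loop was built in phase 1.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def minimise(case: Sequence[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop chunks of the case while the bug still reproduces."""
    current = list(case)
    if not still_fails(current):
        raise ValueError("the starting case does not reproduce the bug")

    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(current):
            # Try the case with one chunk removed.
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                current = candidate  # the chunk was irrelevant; keep the smaller case
            else:
                i += chunk  # the chunk is needed; move past it
        chunk //= 2  # retry with finer-grained removals
    return current


# Hypothetical bug: the failure needs both 3 and 7 to be present in the input.
reduced = minimise(list(range(10)), lambda xs: 3 in xs and 7 in xs)
print(reduced)  # [3, 7]
```

The reducer only works if `still_fails` is deterministic, which is one reason the feedback loop comes first.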

The skill is explicit that phase 1 deserves disproportionate effort. It recommends six ways to construct a feedback loop, in rough order:

1. Failing test at whatever seam reaches the bug (unit, integration, e2e).
2. Curl / HTTP script against a running dev server.
3. CLI invocation with a fixture input, diffing stdout against a known-good snapshot.
4. Headless browser script (Playwright / Puppeteer) driving the UI and asserting on DOM/console/network.
5. Replay a captured trace. Save a real network request, payload, or event log to disk; replay it through the code path in isolation.
6. Throwaway harness. Minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
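
As an illustration of the CLI-diff technique, here is a minimal sketch that runs a command and diffs its stdout against a known-good snapshot. The function name `diff_against_snapshot` is hypothetical, and the command shown is a stand-in for the real CLI under investigation; an empty diff means pass.

```python
import difflib
import subprocess
import sys
from typing import List


def diff_against_snapshot(command: List[str], snapshot: str) -> str:
    """Run command and return a unified diff of its stdout against snapshot.

    An empty string means the output matches the known-good snapshot.
    """
    result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    diff = difflib.unified_diff(
        snapshot.splitlines(keepends=True),
        result.stdout.splitlines(keepends=True),
        fromfile="known-good",
        tofile="actual",
    )
    return "".join(diff)


# Stand-in for the real CLI: a hypothetical command that prints one line.
command = [sys.executable, "-c", "print('total: 3')"]
diff = diff_against_snapshot(command, "total: 3\n")
print(diff or "PASS: output matches snapshot")
```

In practice the snapshot would probably be read from a fixture file checked in next to the reproduction, so the agent can rerun the same comparison after every change.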

For codebase exploration, the skill advises using the project’s domain glossary to build a clear mental model of the relevant modules, and checking the ADRs (architecture decision records) in the area you’re touching.

When to use it

Reach for it when:

A bug is hard to reproduce or intermittent — the line-by-line read isn’t generating hypotheses
You’ve been staring at logs for 20 minutes without narrowing the cause
A performance regression has surfaced and you don’t have a reliable benchmark to bisect against
You’re handing off a bug investigation to another agent and want a structured starting point rather than “good luck”
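
For the performance-regression case, the missing benchmark can be approximated by turning a timing into a pass/fail signal. This is a rough sketch under assumptions: `median_seconds`, `within_budget`, and the budget value are hypothetical, and the workload passed in stands for whatever code path regressed.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], runs: int = 7) -> float:
    """Time fn several times and return the median wall-clock duration."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def within_budget(fn: Callable[[], object], budget_seconds: float, runs: int = 7) -> bool:
    """Pass/fail signal: True when the median run fits inside the budget."""
    return median_seconds(fn, runs) <= budget_seconds


# Hypothetical workload and budget; replace with the regressed code path.
print(within_budget(lambda: sum(range(1000)), budget_seconds=1.0))
```

Using the median rather than a single run reduces noise from one-off slow samples. A script that exits non-zero when `within_budget` returns False could then be handed to `git bisect run` to locate the offending commit.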

When not to reach for it:

Trivial bugs with an obvious one-line fix — the discipline overhead exceeds the win
Bugs where you already have a fast pass/fail signal — you’re already at phase 2

Install

The skill is distributed through Pocock’s skills repo. Install it using his recommended path (npx skills add, or a manual copy into .claude/skills/diagnose/); see the repo README for canonical install instructions.

What a session looks like

A typical session starts with the agent asking the feedback-loop question first — can we construct a pass/fail signal for this bug? — before any hypothesis-generation happens. The first 60–80% of the session is usually spent in phase 1: trying a failing test, then a curl script, then a replayed trace, until something deterministic locks the bug down.
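
For intermittent bugs, one way to get to "something deterministic" is to search for a seed that triggers the failure and then pin it. The sketch below assumes the flaky behaviour can be driven by a seed; `first_failing_seed` and the `check` callback are hypothetical names, not part of the skill.

```python
from typing import Callable, Optional


def first_failing_seed(check: Callable[[int], bool], attempts: int = 200) -> Optional[int]:
    """Return the first seed for which check(seed) fails, or None if all pass."""
    for seed in range(attempts):
        if not check(seed):
            return seed
    return None


# Hypothetical flaky check: passes for most seeds, fails for a few.
def check(seed: int) -> bool:
    return seed % 7 != 5


seed = first_failing_seed(check)
print(seed)  # 5
```

Once a failing seed is known, rerunning `check` with that seed gives the same result every time, which is the property the later phases depend on.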

Once the loop exists, the remaining phases run fast. Hypothesise → instrument → fix → regression-test usually takes a fraction of the time that phase 1 did. That’s the skill’s design point.

The skill is complementary to systematic-debugging (which is stronger at the hypothesis-test layer once you’re in phase 4) and verification-before-completion (which guards the regression test at phase 6).
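
The phase 6 requirement, a test that fails without the fix, can itself be checked mechanically. The following sketch uses a made-up pagination bug for illustration; `page_count_before_fix`, `page_count_after_fix`, `regression_test`, and `locks_in_fix` are all hypothetical names.

```python
from typing import Callable

PageCount = Callable[[int, int], int]


def page_count_before_fix(total_items: int, page_size: int) -> int:
    # Hypothetical bug: floor division drops the final partial page.
    return total_items // page_size


def page_count_after_fix(total_items: int, page_size: int) -> int:
    # Ceiling division counts the final partial page.
    return -(-total_items // page_size)


def regression_test(page_count: PageCount) -> bool:
    # 10 items at 3 per page need 4 pages.
    return page_count(10, 3) == 4


def locks_in_fix(test: Callable[[PageCount], bool], before: PageCount, after: PageCount) -> bool:
    """A useful regression test fails on the old code and passes on the new."""
    return (not test(before)) and test(after)


print(locks_in_fix(regression_test, page_count_before_fix, page_count_after_fix))  # True
```

In a real repository the "before" side would more likely be checked by temporarily reverting the fix and rerunning the test, rather than by keeping both implementations around.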

Receipts

TODO — to be filled in from a real session. When the skill is triggered in production use, capture: time spent in phase 1 vs phases 2–6 (the skill claims phase 1 should dominate), which feedback-loop construction technique worked, and whether the regression test added at phase 6 actually fails without the fix.

Source and attribution

Originally written by Matt Pocock. The canonical SKILL.md plus sub-docs live in the engineering/diagnose folder of his public skills repository.

License: MIT. You can install, adapt, and redistribute the skill, with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.