tdd

A Claude Code skill by Matt Pocock that runs test-driven development as a red-green-refactor loop on vertical slices. It gives explicit guidance to test behaviour through public interfaces (not implementation), to refuse horizontal slicing (writing all the tests, then all the code), and to avoid mock-coupled tests that break under refactor when behaviour has not changed. It bundles tests.md and mocking.md sub-docs.

Build features or fix bugs through red-green-refactor on vertical slices

Source: Matt Pocock
License: MIT
First documented:
Receipts: TODO

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • use TDD
  • red green refactor
  • test-driven
  • write tests first

What it does

tdd runs a red-green-refactor loop with two non-negotiable rules carried in the SKILL.md and its sub-docs:

Rule 1 — test behaviour, not implementation. Tests should verify behaviour through public interfaces. Code can change entirely; tests shouldn’t. Good tests are integration-style and exercise real code paths through public APIs. They describe what the system does, not how it does it. A good test reads like a specification — “user can checkout with valid cart” tells you exactly what capability exists. These tests survive refactors because they don’t care about internal structure.

Bad tests are coupled to implementation: they mock internal collaborators, test private methods, or verify through external means (querying a database directly instead of using the interface). The warning sign — your test breaks when you refactor, but behaviour hasn’t changed — means the test was testing implementation, not behaviour.
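
To make the contrast concrete, here is a minimal sketch in TypeScript. The Cart class, its method names, and the test names are hypothetical and chosen for illustration; they are not taken from the skill's tests.md. The first two tests go through the public API only, while the third reaches into private state and would break if the internal array became a Map, even though behaviour is unchanged.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical cart used only for illustration.
type Item = { sku: string; priceCents: number };
type Receipt = { totalCents: number; itemCount: number };

class Cart {
  // Internal storage: free to change (an array today, a Map tomorrow).
  private items: Item[] = [];

  add(item: Item): void {
    this.items.push(item);
  }

  checkout(): Receipt {
    if (this.items.length === 0) {
      throw new Error("cannot checkout an empty cart");
    }
    const totalCents = this.items.reduce((sum, i) => sum + i.priceCents, 0);
    return { totalCents, itemCount: this.items.length };
  }
}

// Behaviour test: reads like a specification, touches only the public API.
function userCanCheckoutWithValidCart(): void {
  const cart = new Cart();
  cart.add({ sku: "book", priceCents: 1200 });
  cart.add({ sku: "pen", priceCents: 300 });
  assert.deepEqual(cart.checkout(), { totalCents: 1500, itemCount: 2 });
}

// Behaviour test for the failure path, still through the public API.
function emptyCartCannotCheckout(): void {
  assert.throws(() => new Cart().checkout(), /empty cart/);
}

// Implementation-coupled test (the pattern to avoid): it inspects private
// state, so a purely structural refactor of Cart would make it fail.
function cartStoresItemsInAnArray(): void {
  const cart = new Cart();
  cart.add({ sku: "book", priceCents: 1200 });
  assert.ok(Array.isArray((cart as any).items));
}

userCanCheckoutWithValidCart();
emptyCartCannotCheckout();
cartStoresItemsInAnArray(); // passes today, but only because of how Cart is built
```

The third test passes today, which is what makes it easy to miss: it fails only when someone improves the internals.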

Rule 2 — vertical slicing only, no horizontal slicing. The skill calls out an explicit anti-pattern: DO NOT write all tests first, then all implementation. Horizontal slicing treats RED as “write all tests” and GREEN as “write all code.” It produces crap tests: tests written in bulk test imagined behaviour, not actual behaviour. You end up testing the shape of things (data structures, function signatures) rather than user-facing capability.

The fix is vertical slicing: one slice at a time, one red test, one green implementation, one refactor pass.
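
One way to picture the ordering constraint is as a small harness that refuses to start a slice until the previous one is green. This is a hypothetical sketch: the skill drives the loop through the agent, not through code like this, and Slice, runVertically, and greet are names invented for the example.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical model of a vertical slice: one test plus the minimum code for it.
type Slice = {
  name: string;
  test: () => void; // throws while the behaviour is missing
  implement: () => void; // installs the minimum code that makes `test` pass
};

function passes(test: () => void): boolean {
  try {
    test();
    return true;
  } catch {
    return false;
  }
}

// Runs slices strictly one at a time: red, then green, then the next slice.
function runVertically(slices: Slice[]): string[] {
  const steps: string[] = [];
  const done: Slice[] = [];
  for (const slice of slices) {
    if (passes(slice.test)) {
      throw new Error(`${slice.name}: passed before implementation (no RED)`);
    }
    steps.push(`red: ${slice.name}`);
    slice.implement();
    if (!passes(slice.test)) {
      throw new Error(`${slice.name}: still failing after implementation`);
    }
    for (const earlier of done) {
      if (!passes(earlier.test)) {
        throw new Error(`${slice.name} broke ${earlier.name}`);
      }
    }
    steps.push(`green: ${slice.name}`);
    done.push(slice);
  }
  return steps;
}

// The public function under test; it grows one slice at a time.
let greet: (name?: string) => string = () => {
  throw new Error("not implemented yet");
};

const sessionLog = runVertically([
  {
    name: "greets a named user",
    test: () => assert.equal(greet("Ada"), "Hello, Ada!"),
    implement: () => {
      greet = (name) => `Hello, ${name}!`;
    },
  },
  {
    name: "greets an anonymous user",
    test: () => assert.equal(greet(), "Hello, stranger!"),
    implement: () => {
      greet = (name = "stranger") => `Hello, ${name}!`;
    },
  },
]);

console.log(sessionLog.join("\n"));
```

The second test is written only after the first slice is green, so it is written against code that actually exists rather than against an imagined shape.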

Bundled sub-docs:

  • tests.md — examples of behaviour-testing vs implementation-testing
  • mocking.md — guidelines on what to mock (rarely) and what to leave real (usually)
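
As a rough illustration of "mock rarely, leave real usually", the sketch below fakes only an external boundary and keeps the internal collaborator real. It assumes the boundary is a payment gateway; PaymentGateway, FakeGateway, PriceCalculator, and Checkout are hypothetical names, and the contents of mocking.md are not reproduced here.

```typescript
import { strict as assert } from "node:assert";

// Assumed external boundary: the one place where a test double is substituted.
interface PaymentGateway {
  charge(amountCents: number): { ok: boolean };
}

// Internal collaborator: stays real in tests, so refactoring it is safe.
class PriceCalculator {
  total(pricesCents: number[]): number {
    return pricesCents.reduce((sum, p) => sum + p, 0);
  }
}

class Checkout {
  private readonly calculator = new PriceCalculator();
  private readonly gateway: PaymentGateway;

  constructor(gateway: PaymentGateway) {
    this.gateway = gateway;
  }

  pay(pricesCents: number[]): "paid" | "declined" {
    const total = this.calculator.total(pricesCents);
    return this.gateway.charge(total).ok ? "paid" : "declined";
  }
}

// Hand-written fake for the boundary: approves charges up to a card limit.
class FakeGateway implements PaymentGateway {
  private readonly limitCents: number;

  constructor(limitCents: number) {
    this.limitCents = limitCents;
  }

  charge(amountCents: number): { ok: boolean } {
    return { ok: amountCents <= this.limitCents };
  }
}

// Both tests assert on the outcome a user sees, not on which calls were made.
function userCanPayWithinCardLimit(): void {
  const checkout = new Checkout(new FakeGateway(5000));
  assert.equal(checkout.pay([1200, 300]), "paid");
}

function paymentIsDeclinedOverCardLimit(): void {
  const checkout = new Checkout(new FakeGateway(1000));
  assert.equal(checkout.pay([1200, 300]), "declined");
}

userCanPayWithinCardLimit();
paymentIsDeclinedOverCardLimit();
```

Neither test mentions PriceCalculator, so it can be inlined, renamed, or replaced without touching the tests.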

When to use it

Reach for it when:

  • You’re building a new feature or fixing a bug where the behavioural contract is clearer than the implementation
  • The system has integration-style test coverage already and one more vertical slice rides the same harness
  • A previous attempt at the same feature broke under refactor because the tests were coupled to implementation
  • You’re handing the slice to an agent and want a deterministic gate at slice-close (red test → green test)

When not to reach for it:

  • Prototype code — reach for prototype instead, which is explicitly throwaway
  • Exploratory spikes where you’re figuring out the shape, not the behaviour
  • Bug fixes where the regression test is the only test you’ll ever write. That’s not TDD; it’s a regression-test discipline that belongs at the close of phase 6 of the diagnose skill
  • Codebases dominated by mock-heavy unit tests — the skill’s behaviour-testing rule will surface a deeper architecture problem first

Install

The skill is distributed via Pocock’s skills repo. Install it by his recommended path: either npx skills add, or a manual copy of SKILL.md plus the tests.md and mocking.md sub-docs into .claude/skills/tdd/. See the repo README for canonical install instructions.

What a session looks like

A typical session runs the red-green-refactor loop once per vertical slice:

  1. Red. Pick the next slice (smallest end-to-end capability). Write the test through the public interface. Run it. Confirm it fails for the right reason — not a typo or a missing import.
  2. Green. Write the minimum implementation that passes the test. No more.
  3. Refactor. Clean up the implementation while the test stays green. The behaviour-test guarantee earns its keep here: structural changes happen freely because the test doesn’t care about structure.

Repeat for the next slice. The discipline that makes it work: one slice at a time, behaviour through public interfaces, no mock-coupling.
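
The three steps for a single slice might look like the sketch below. The slugify capability and every name in it are hypothetical, and the three stages are shown side by side in one file only so the example is self-contained. In a real session they are successive edits to the same function.

```typescript
import { strict as assert } from "node:assert";

type Slugify = (title: string) => string;

// The slice's single behaviour test, written against the public function only.
function titleBecomesUrlSlug(slugify: Slugify): void {
  assert.equal(slugify("Hello World"), "hello-world");
}

// 1. Red: with no real implementation the test must fail, and it must fail on
//    the assertion itself rather than on a typo or a missing import.
const notYet: Slugify = () => "";
assert.throws(() => titleBecomesUrlSlug(notYet), { name: "AssertionError" });

// 2. Green: the minimum implementation that passes this one test. No more.
const green: Slugify = (title) => title.toLowerCase().replace(" ", "-");
titleBecomesUrlSlug(green);

// 3. Refactor: restructure without changing behaviour; the test stays green.
const SEPARATOR = "-";
const normalizeCase = (text: string): string => text.toLowerCase();
const joinWords = (text: string): string => text.replace(" ", SEPARATOR);
const refactored: Slugify = (title) => joinWords(normalizeCase(title));
titleBecomesUrlSlug(refactored);
```

Titles with several spaces are deliberately left unhandled. That would be the next slice, with its own red test.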

Receipts

TODO — to be filled in from a real session. When the skill is triggered in production use, capture: number of vertical slices completed in one session, whether any test broke during refactor (indicating implementation coupling), and whether the agent attempted horizontal slicing (writing multiple tests first) at any point.

Source and attribution

Originally written by Matt Pocock. The canonical SKILL.md plus the tests.md and mocking.md sub-docs live in the engineering/tdd folder of his public skills repository.

License: MIT. You can install, adapt, and redistribute the skill, with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.