# ai-regression-testing

> A Claude Code skill from Affaan M's everything-claude-code repo that ships sandbox-mode API testing patterns for AI-assisted development — Vitest + Next.js App Router, DB-free test setup, contract assertions, and a /bug-check workflow that runs npm test + npm build before any AI code review. Targets the systematic blind spots where the same model writes and reviews code.

**Use case**: Catch AI-introduced regressions mechanically before the same model reviews its own work

**Canonical URL**: https://agentcookbooks.com/skills/ai-regression-testing/

**Topics**: claude-code, skills, evals

**Trigger phrases**: "set up regression tests for AI-modified code", "sandbox mode API test pattern", "bug check workflow that runs tests first"

**Source**: [Affaan M](https://github.com/affaan-m/everything-claude-code/tree/main/skills/ai-regression-testing)

**License**: MIT

---

## What it does

`ai-regression-testing` is the AI-blind-spot test skill in [Affaan M's everything-claude-code](https://github.com/affaan-m/everything-claude-code) — see [skills/ai-regression-testing](https://github.com/affaan-m/everything-claude-code/tree/main/skills/ai-regression-testing). It targets a specific failure mode of AI-assisted dev: when the same model writes a fix and then reviews its own work, it carries the same assumptions into both steps and says "looks correct" while the bug persists. The skill ships a four-pattern catalog of AI regressions plus a Vitest + Next.js App Router test setup that runs DB-free.

The four canonical patterns are sandbox/production path mismatch (#1 — observed in 3 of 4 regressions in the upstream case study), SELECT clause omission (common with Supabase / Prisma when adding new columns to a response without updating the query), error state leakage (setting an error message but not clearing the stale data behind it), and optimistic update without rollback (UI optimistically updates, API fails, UI never restores). Each pattern ships a FAIL and PASS code snippet plus a regression test that catches it.

The sandbox-mode setup is the operational core: `process.env.SANDBOX_MODE = "true"` in `__tests__/setup.ts`, a `createTestRequest` helper that builds `NextRequest` objects with optional sandbox user IDs, a `parseResponse` helper, and contract-style assertions on `REQUIRED_FIELDS` arrays. The `/bug-check` workflow is a custom slash command that runs `npm run test` + `npm run build` *first*, before any AI code review — failures get reported as highest-priority bugs without depending on AI judgment to catch them. The "test where bugs were found" rule keeps coverage organic rather than chasing percentages.

## When to use it

- An AI agent has modified API routes or backend logic and you want regression coverage before it modifies them again
- A bug was found and fixed — write the test before or alongside the fix so the regression can't recur
- Project has a sandbox or mock mode that can be leveraged for fast, DB-free API testing
- Multiple code paths exist (sandbox vs production, feature flags, A/B variants) — the parity tests are exactly the gap AI fills incorrectly
- Setting up a `/bug-check` workflow that runs tests mechanically before code review

When *not* to reach for it:

- Greenfield code that has never had a bug — the skill is explicit that 100% coverage is the wrong target
- Pure unit-test setup for libraries — the patterns are specifically about API-route contracts
- Comparing coding agents — that's `agent-eval`
- Performance regressions — that's `benchmark`

## Install

From [affaan-m/everything-claude-code](https://github.com/affaan-m/everything-claude-code) at `skills/ai-regression-testing/`. Drop the folder into `~/.claude/skills/ai-regression-testing/`. The skill is patterns and code templates — runtime dependencies are project-side: Vitest, a Next.js App Router (or equivalent) project, and a sandbox/mock mode that the test setup can force on via env var. The bundled `/bug-check` command definition goes in `.claude/commands/bug-check.md`.

## What a session looks like

1. **Bug surfaces.** Field `notification_settings` missing from the `/api/user/profile` response. AI fixed it once already and the fix didn't stick.
2. **Skill picks the pattern.** Sandbox/production path mismatch is the most likely culprit — fix landed in one path, not the other.
3. **Set up the test harness.** `vitest.config.ts` + `__tests__/setup.ts` force sandbox mode + clear the DB env vars. `createTestRequest` + `parseResponse` helpers go in `__tests__/helpers.ts`.
4. **Write the regression test.** Asserts `REQUIRED_FIELDS` (including `notification_settings`) are all present on the response, plus a named test that explicitly references the bug ID ("BUG-R1 regression"). Test should fail on the current code.
5. **Fix the bug for real.** Apply to both sandbox and production paths.
6. **Wire `/bug-check`.** Custom command runs `npm run test` + `npm run build` first, only proceeds to AI review if both pass. Failures get reported mechanically.
7. **Iterate.** Next bug-check catches if the fix regresses. New bugs add new regression tests. Coverage grows organically.

The discipline that makes it work: tests run *before* AI review. The skill is explicit that AI self-review carries the same blind spots as the AI write step — if the test suite isn't a mandatory mechanical gate, the same bug gets re-introduced and re-approved.

## Receipts

_TODO — to be filled in from a real session. Once the patterns have been applied to a real project, this section will capture: which of the four canonical regression patterns showed up most often in the operator's codebase (the upstream call-out is sandbox/production mismatch at 3-of-4), whether the `/bug-check` workflow's mechanical-first rule actually caught a regression that AI review missed, the per-run test suite cost in seconds at sandbox-mode-DB-free vs. a full integration test, and any pattern that didn't fit the upstream four (a fifth category to add to the catalog)._

## Source and attribution

From [Affaan M's everything-claude-code](https://github.com/affaan-m/everything-claude-code/tree/main/skills/ai-regression-testing) — an MIT-licensed skill collection covering harness construction, agent ops, video, payments, and platform-specific patterns.

License: MIT.

Quoting the AI-blind-spot rule verbatim: *"When an AI writes code and then reviews its own work, it carries the same assumptions into both steps."* That's the wedge — automated tests aren't general good hygiene here, they're the specific mechanical gate that breaks the write-review-approve loop the model would otherwise close on itself.