ai-regression-testing

A Claude Code skill from Affaan M's everything-claude-code repo that ships sandbox-mode API testing patterns for AI-assisted development — Vitest + Next.js App Router, DB-free test setup, contract assertions, and a /bug-check workflow that runs npm test + npm build before any AI code review. Targets the systematic blind spots where the same model writes and reviews code.

Catch AI-introduced regressions mechanically before the same model reviews its own work

Source Affaan M

License MIT

First documented 2026-05-12

Receipts TODO

Evals

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

set up regression tests for AI-modified code
sandbox mode API test pattern
bug check workflow that runs tests first

What it does

ai-regression-testing is the AI-blind-spot test skill in Affaan M’s everything-claude-code — see skills/ai-regression-testing. It targets a specific failure mode of AI-assisted dev: when the same model writes a fix and then reviews its own work, it carries the same assumptions into both steps and says “looks correct” while the bug persists. The skill ships a four-pattern catalog of AI regressions plus a Vitest + Next.js App Router test setup that runs DB-free.

The four canonical patterns are sandbox/production path mismatch (#1 — observed in 3 of 4 regressions in the upstream case study), SELECT clause omission (common with Supabase / Prisma when adding new columns to a response without updating the query), error state leakage (setting an error message but not clearing the stale data behind it), and optimistic update without rollback (UI optimistically updates, API fails, UI never restores). Each pattern ships a FAIL and PASS code snippet plus a regression test that catches it.

The sandbox-mode setup is the operational core: process.env.SANDBOX_MODE = "true" in __tests__/setup.ts, a createTestRequest helper that builds NextRequest objects with optional sandbox user IDs, a parseResponse helper, and contract-style assertions on REQUIRED_FIELDS arrays. The /bug-check workflow is a custom slash command that runs npm run test + npm run build first, before any AI code review — failures get reported as highest-priority bugs without depending on AI judgment to catch them. The “test where bugs were found” rule keeps coverage organic rather than chasing percentages.

When to use it

An AI agent has modified API routes or backend logic and you want regression coverage before it modifies them again
A bug was found and fixed — write the test before or alongside the fix so the regression can’t recur
Project has a sandbox or mock mode that can be leveraged for fast, DB-free API testing
Multiple code paths exist (sandbox vs production, feature flags, A/B variants) — the parity tests are exactly the gap AI fills incorrectly
Setting up a /bug-check workflow that runs tests mechanically before code review

When not to reach for it:

Greenfield code that has never had a bug — the skill is explicit that 100% coverage is the wrong target
Pure unit-test setup for libraries — the patterns are specifically about API-route contracts
Comparing coding agents — that’s agent-eval
Performance regressions — that’s benchmark

Install

From affaan-m/everything-claude-code at skills/ai-regression-testing/. Drop the folder into ~/.claude/skills/ai-regression-testing/. The skill is patterns and code templates — runtime dependencies are project-side: Vitest, a Next.js App Router (or equivalent) project, and a sandbox/mock mode that the test setup can force on via env var. The bundled /bug-check command definition goes in .claude/commands/bug-check.md.

What a session looks like

Bug surfaces. Field notification_settings missing from the /api/user/profile response. AI fixed it once already and the fix didn’t stick.
Skill picks the pattern. Sandbox/production path mismatch is the most likely culprit — fix landed in one path, not the other.
Set up the test harness. vitest.config.ts + __tests__/setup.ts force sandbox mode + clear the DB env vars. createTestRequest + parseResponse helpers go in __tests__/helpers.ts.
Write the regression test. Asserts REQUIRED_FIELDS (including notification_settings) are all present on the response, plus a named test that explicitly references the bug ID (“BUG-R1 regression”). Test should fail on the current code.
Fix the bug for real. Apply to both sandbox and production paths.
Wire /bug-check. Custom command runs npm run test + npm run build first, only proceeds to AI review if both pass. Failures get reported mechanically.
Iterate. Next bug-check catches if the fix regresses. New bugs add new regression tests. Coverage grows organically.

The discipline that makes it work: tests run before AI review. The skill is explicit that AI self-review carries the same blind spots as the AI write step — if the test suite isn’t a mandatory mechanical gate, the same bug gets re-introduced and re-approved.

Receipts

TODO — to be filled in from a real session. Once the patterns have been applied to a real project, this section will capture: which of the four canonical regression patterns showed up most often in the operator’s codebase (the upstream call-out is sandbox/production mismatch at 3-of-4), whether the /bug-check workflow’s mechanical-first rule actually caught a regression that AI review missed, the per-run test suite cost in seconds at sandbox-mode-DB-free vs. a full integration test, and any pattern that didn’t fit the upstream four (a fifth category to add to the catalog).

Source and attribution

From Affaan M’s everything-claude-code — an MIT-licensed skill collection covering harness construction, agent ops, video, payments, and platform-specific patterns.

License: MIT.

Quoting the AI-blind-spot rule verbatim: “When an AI writes code and then reviews its own work, it carries the same assumptions into both steps.” That’s the wedge — automated tests aren’t general good hygiene here, they’re the specific mechanical gate that breaks the write-review-approve loop the model would otherwise close on itself.