# ab-test-setup

> A Claude Code skill that helps design statistically valid A/B tests and build a continuous experimentation program — hypotheses, sample size, ICE prioritization, and a winning-pattern playbook.

**Use case**: Design A/B tests that produce trustworthy results

**Canonical URL**: https://agentcookbooks.com/skills/ab-test-setup/

**Topics**: claude-code, skills, marketing, ab-testing

**Trigger phrases**: "set up an A/B test", "design an experiment", "should I test this", "how long should I run this test", "build an experimentation program"

**Source**: [Corey Haines](https://github.com/coreyhaines31/marketingskills/tree/main/skills/ab-test-setup)

**License**: MIT

---

## What it does

`ab-test-setup` is a Claude Code skill from Corey Haines's [marketing-skills repo](https://github.com/coreyhaines31/marketingskills). It turns Claude into an experimentation lead who insists on a hypothesis, a pre-registered sample size, and a primary metric before you ship a single variant. The skill activates when you mention "A/B test", "split test", "should I test this", or "experiment velocity", and walks through hypothesis generation, sample-size math, variant design, and analysis.

The output of a session is a documented test plan: a hypothesis in "Because X, we believe Y will cause Z" form, primary and guardrail metrics, sample size per variant, traffic allocation, a pre-launch checklist, and a slot in the experiment playbook for the result.
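
For a sense of the shape, a filled-in hypothesis (an invented example, not one from the skill's docs) might read: "Because mobile visitors abandon the pricing page at twice the desktop rate, we believe collapsing three plans into two will raise the mobile trial-start rate by 15%."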

## When to use it

Reach for it when:

- You're tempted to "just ship it and see what happens" and want a forcing function for rigor
- You're spinning up a growth program and need a hypothesis backlog with ICE scores (a scoring sketch closes this section)
- You have enough traffic that statistical significance is in reach (not 200 visits/month)

When *not* to reach for it:

- Your site has so little traffic that a meaningful test would take six months to call
- The change is reversible and trivial — overhead exceeds value
- You've already locked the design and just need help shipping it
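
On the ICE point: the skill elicits the scores conversationally, but the ranking itself is simple arithmetic. A minimal sketch, assuming the common convention of averaging three 1-10 scores (some teams multiply instead; the backlog entries here are invented):

```python
# Hypothetical backlog entries; the skill elicits these scores interactively.
backlog = [
    {"idea": "Two-tier pricing page", "impact": 8, "confidence": 6, "ease": 7},
    {"idea": "Social proof above the fold", "impact": 6, "confidence": 8, "ease": 9},
    {"idea": "Rewrite onboarding emails", "impact": 7, "confidence": 5, "ease": 4},
]

def ice(item):
    # Average of three 1-10 scores; multiplying them is a common alternative.
    return (item["impact"] + item["confidence"] + item["ease"]) / 3

for item in sorted(backlog, key=ice, reverse=True):
    print(f'{ice(item):.1f}  {item["idea"]}')
```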

## Install

The skill is distributed via Corey Haines's [marketing-skills repo](https://github.com/coreyhaines31/marketingskills). Install via the repo's recommended path — copy the [`ab-test-setup` SKILL.md](https://github.com/coreyhaines31/marketingskills/tree/main/skills/ab-test-setup) into your project's `.claude/skills/ab-test-setup/` directory, or use the repo's plugin install if you've set it up.

Once installed, the skill activates on the trigger phrases above. The first time it runs, it will check for `.agents/product-marketing-context.md` (or `.claude/product-marketing-context.md`) — populating that file with your product context first dramatically improves output quality across all of Haines's marketing skills.

## What a session looks like

A typical session has three phases:

1. **Hypothesis pressure-test.** Claude refuses vague predictions. You'll be asked for the observation, the change, the predicted outcome, and the audience — until the hypothesis is sharp enough to call.
2. **Sample size + design.** It calculates how many users per variant you need given your baseline rate and minimum detectable effect (a back-of-envelope version of that math appears after this list), then helps you pick a single load-bearing variable to vary.
3. **Pre-launch checklist + playbook slot.** Before the test launches you get a QA checklist; after it ends, the result lands in a structured "Experiment Playbook" entry — pattern, segment deltas, where else to apply.
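
The skill runs this calculation for you, but here is a sketch of the standard two-proportion approximation it is in the neighborhood of (the skill's exact method may differ):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided test of two proportions.
    `baseline` is the current conversion rate; `mde` is the minimum
    detectable effect as an absolute lift (e.g. 0.01 = one point)."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 4% baseline conversion, detecting an absolute lift to 5%:
print(sample_size_per_variant(0.04, 0.01))  # ~6,700 users per variant
```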

The discipline that makes it work: pre-commit to sample size, don't peek, and document the *pattern* that won — not just the variant. Over time the playbook becomes a library of growth patterns that compound.
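
The "don't peek" rule is not superstition. A quick A/A simulation (my sketch, not part of the skill) shows how checking results repeatedly and stopping at the first significant reading inflates the false-positive rate well past the nominal 5%:

```python
import random
from statistics import NormalDist

def z_test_p(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = abs(successes_a / n_a - successes_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(z))

random.seed(0)
RATE, N, LOOKS, TRIALS = 0.05, 10_000, 10, 500
false_positives = 0
for _ in range(TRIALS):
    # A/A test: both variants have the same true conversion rate.
    a = [random.random() < RATE for _ in range(N)]
    b = [random.random() < RATE for _ in range(N)]
    # Peek after every tenth of traffic; stop at the first "significant" look.
    for k in range(1, LOOKS + 1):
        m = N * k // LOOKS
        if z_test_p(sum(a[:m]), m, sum(b[:m]), m) < 0.05:
            false_positives += 1
            break
print(f"A/A false-positive rate with peeking: {false_positives / TRIALS:.0%}")
# Typically lands around 15-20%, versus 5% for a single pre-registered look.
```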

## Receipts

Honest reporting on what `ab-test-setup` produces and where it has limits:

**Where it works well:**
- Catches the "we'll know it when we see it" problem by forcing a primary metric before launch
- Prevents peeking-induced false positives — the script holds the line on pre-registered sample size
- The ICE scoring loop turns a chaotic ideas list into a ranked backlog

**Where it backfires:**
- On low-traffic sites the math is brutal — the skill correctly tells you that you can't run a meaningful test, which can feel obstructive
- It can over-formalize tiny changes; not every copy tweak needs a hypothesis doc

**Pattern that works:** trigger on the first test of a new program — the rigor pays off because every later test reuses the doc structure. By test five the overhead is near-zero.

## Source and attribution

Originally written by [Corey Haines](https://corey.co). The canonical SKILL.md and any supporting files live in the [`ab-test-setup` folder](https://github.com/coreyhaines31/marketingskills/tree/main/skills/ab-test-setup) of his [marketing-skills repository](https://github.com/coreyhaines31/marketingskills).

License: MIT. You can install, adapt, and redistribute the skill, provided the copyright and license notice stay intact.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.