ab-test-setup
A Claude Code skill that helps design statistically valid A/B tests and build a continuous experimentation program — hypotheses, sample size, ICE prioritization, and a winning-pattern playbook.
Design A/B tests that produce trustworthy results
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- set up an A/B test
- design an experiment
- should I test this
- how long should I run this test
- build an experimentation program
What it does
ab-test-setup is a Claude Code skill from Corey Haines’s marketing-skills repo. It turns Claude into an experimentation lead who insists on a hypothesis, a pre-registered sample size, and a primary metric before you ship a single variant. The skill activates when you mention “A/B test”, “split test”, “should I test this”, or “experiment velocity”, and walks through hypothesis generation, sample-size math, variant design, and analysis.
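The analysis step can be sketched as follows. This sketch assumes a conversion-rate primary metric and a two-sided two-proportion z-test, read once after both arms reach the pre-registered sample size. That test is a common default and not necessarily what the skill prescribes. `two_proportion_z_test` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical result, read once after each arm hit its pre-registered sample size.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=575, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Reading the result only once, at the planned sample size, is what keeps the p-value meaningful.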
The output of a session is a documented test plan: hypothesis in “Because X, we believe Y will cause Z” form, primary and guardrail metrics, sample size per variant, traffic allocation, a pre-launch checklist, and a slot in the experiment playbook for the result.
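The plan's fields map naturally onto a small record. The following is a hypothetical sketch: `ExperimentPlan` and its field names are not part of the skill. It renders the hypothesis template and flags the gaps the skill insists on closing before launch.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentPlan:
    """Hypothetical container mirroring the fields of a documented test plan."""
    observation: str       # the X in "Because X ..."
    change: str            # the Y in "... we believe Y ..."
    expected_outcome: str  # the Z in "... will cause Z"
    primary_metric: str
    guardrail_metrics: list[str] = field(default_factory=list)
    sample_size_per_variant: int = 0
    traffic_allocation: dict[str, float] = field(
        default_factory=lambda: {"control": 0.5, "variant": 0.5})

    def hypothesis(self) -> str:
        return (f"Because {self.observation}, we believe {self.change} "
                f"will cause {self.expected_outcome}.")

    def validate(self) -> list[str]:
        """Return the problems that should block launch (empty list = ready)."""
        problems = []
        if not self.primary_metric:
            problems.append("missing primary metric")
        if self.sample_size_per_variant <= 0:
            problems.append("sample size not pre-registered")
        if abs(sum(self.traffic_allocation.values()) - 1.0) > 1e-9:
            problems.append("traffic allocation does not sum to 100%")
        return problems


plan = ExperimentPlan(
    observation="visitors rarely scroll the pricing page",
    change="moving the plan table up",
    expected_outcome="more trial sign-ups",
    primary_metric="trial sign-up rate",
    guardrail_metrics=["refund rate"],
    sample_size_per_variant=8_000,  # hypothetical pre-registered value
)
print(plan.hypothesis())
print(plan.validate())
```

An empty list from `validate()` means the plan has a primary metric, a pre-registered sample size, and an allocation that sums to 100%.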
When to use it
Reach for it when:
- You’re tempted to “just ship it and see what happens” and want a forcing function for rigor
- You’re spinning up a growth program and need a hypothesis backlog with ICE scores
- You have enough traffic that statistical significance is in reach (not 200 visits/month)
When not to reach for it:
- Site has so little traffic that a meaningful test would take six months to call
- The change is reversible and trivial — overhead exceeds value
- You’ve already locked the design and just need help shipping it
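Before committing, you can check whether a test is within reach by dividing the sample you need by the traffic you actually get. This is a minimal sketch. `estimate_duration_days` is a hypothetical helper, and it assumes an even split across variants. Rounding up to whole weeks is a common convention for covering weekly traffic cycles, not something this page attributes to the skill.

```python
from math import ceil


def estimate_duration_days(n_per_variant: int, variants: int, daily_visitors: float,
                           share_in_test: float = 1.0, round_to_weeks: bool = True) -> int:
    """Days until every variant reaches n_per_variant, assuming an even traffic split."""
    total_needed = n_per_variant * variants
    daily_in_test = daily_visitors * share_in_test  # visitors actually enrolled per day
    days = ceil(total_needed / daily_in_test)
    if round_to_weeks:
        # Round up to whole weeks so every weekday is represented equally.
        days = ceil(days / 7) * 7
    return days


# Hypothetical: 3,000 visitors needed per variant, two variants.
print(estimate_duration_days(3_000, 2, daily_visitors=7))      # ~200 visits a month
print(estimate_duration_days(3_000, 2, daily_visitors=2_000))  # a high-traffic page
```

At roughly 200 visits a month (about 7 a day), this example needs 861 days, far past the six-month mark. At 2,000 visits a day, the same test is done in a week.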
Install
The skill is distributed via Corey Haines’s marketing-skills repo. Install via the repo’s recommended path — copy the ab-test-setup SKILL.md into your project’s .claude/skills/ab-test-setup/ directory, or use the repo’s plugin install if you’ve set it up.
Once installed, the skill activates on the trigger phrases above. The first time it runs, it will check for .agents/product-marketing-context.md (or .claude/product-marketing-context.md) — populating that file with your product context first dramatically improves output quality across all of Haines’s marketing skills.
What a session looks like
A typical session has three phases:
- Hypothesis pressure-test. Claude refuses vague predictions. You’ll be asked for the observation, the change, the predicted outcome, and the audience, and pushed until the hypothesis is specific enough for the result to clearly confirm or refute it.
- Sample size + design. It calculates how many users per variant you need given your baseline rate and minimum detectable effect, then helps you pick a single load-bearing variable to vary.
- Pre-launch checklist + playbook slot. Before the test launches you get a QA checklist; after it ends, the result lands in a structured “Experiment Playbook” entry — pattern, segment deltas, where else to apply.
The discipline that makes it work: pre-commit to sample size, don’t peek, and document the pattern that won — not just the variant. Over time the playbook becomes a library of growth patterns that compound.
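To see why peeking is a problem, here is a minimal simulation. It assumes A/A tests (no true difference), a normally distributed test statistic, and 20 equally spaced interim looks. It compares stopping at the first look where |z| exceeds 1.96 against reading only the final, pre-registered look. `false_positive_rates` is a hypothetical helper, not part of the skill.

```python
import random
from math import sqrt


def false_positive_rates(n_sims: int = 20_000, looks: int = 20,
                         z_crit: float = 1.96, seed: int = 7) -> tuple[float, float]:
    """Return (peeking rate, fixed-horizon rate) of false positives in A/A tests.

    Each look adds one batch whose standardized difference is N(0, 1) under the
    null, so the running z-statistic after k looks is (sum of batches) / sqrt(k).
    """
    rng = random.Random(seed)
    peek_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        crossed_any = False
        for k in range(1, looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / sqrt(k)
            if abs(z) > z_crit:
                crossed_any = True  # a peeker would stop and declare a winner here
        peek_hits += crossed_any
        fixed_hits += abs(z) > z_crit  # only the final, pre-registered look counts
    return peek_hits / n_sims, fixed_hits / n_sims


peek_rate, fixed_rate = false_positive_rates()
print(f"peeking: {peek_rate:.3f}  fixed horizon: {fixed_rate:.3f}")
```

The fixed-horizon rate stays near the nominal 5%. Stopping at the first significant look produces false positives several times more often, even though neither variant is actually better.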
Receipts
Honest reporting on what ab-test-setup produces and where it has limits:
Where it works well:
- Catches the “we’ll know it when we see it” problem by forcing a primary metric before launch
- Prevents peeking-induced false positives: the skill holds the line on the pre-registered sample size
- The ICE scoring loop turns a chaotic ideas list into a ranked backlog
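This page doesn't specify how the skill combines the three ICE scores. The following hypothetical sketch (`Idea` and `ranked_backlog` are illustrative names) assumes each score is rated 1 to 10 and the three are averaged. Averaging is one common convention; some teams multiply the scores instead.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int      # 1-10: how much the primary metric could move if it wins
    confidence: int  # 1-10: how likely it is to win, given the evidence
    ease: int        # 1-10: how cheap it is to build and run

    @property
    def ice(self) -> float:
        # Assumption: average of the three scores; some teams multiply instead.
        return (self.impact + self.confidence + self.ease) / 3


def ranked_backlog(ideas: list[Idea]) -> list[Idea]:
    """Highest ICE score first; sorted() is stable, so ties keep their input order."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


backlog = ranked_backlog([
    Idea("New hero headline", impact=6, confidence=5, ease=9),
    Idea("Annual-plan discount", impact=8, confidence=6, ease=4),
    Idea("Shorter signup form", impact=7, confidence=7, ease=8),
])
for idea in backlog:
    print(f"{idea.ice:.1f}  {idea.name}")
```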
Where it backfires:
- On low-traffic sites the math is brutal: the skill correctly tells you that you can’t run a valid test, which feels obstructive
- It can over-formalize tiny changes; not every copy tweak needs a hypothesis doc
Pattern that works: trigger on the first test of a new program — the rigor pays off because every later test reuses the doc structure. By test five the overhead is near-zero.
Source and attribution
Originally written by Corey Haines. The canonical SKILL.md and any supporting files live in the ab-test-setup folder of his marketing-skills repository.
License: MIT. You can install, adapt, and redistribute the skill, with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.