ab-test-setup

A Claude Code skill that helps design statistically valid A/B tests and build a continuous experimentation program — hypotheses, sample size, ICE prioritization, and a winning-pattern playbook.

Design A/B tests that produce trustworthy results

Source: Corey Haines
License: MIT
First documented

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • set up an A/B test
  • design an experiment
  • should I test this
  • how long should I run this test
  • build an experimentation program

What it does

ab-test-setup is a Claude Code skill from Corey Haines’s marketing-skills repo. It turns Claude into an experimentation lead who insists on a hypothesis, a pre-registered sample size, and a primary metric before you ship a single variant. The skill activates when you mention “A/B test”, “split test”, “should I test this”, or “experiment velocity”, and walks through hypothesis generation, sample-size math, variant design, and analysis.

The output of a session is a documented test plan: a hypothesis in “Because X, we believe Y will cause Z” form, primary and guardrail metrics, sample size per variant, traffic allocation, a pre-launch checklist, and a slot in the experiment playbook for the result.
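To make the shape of that plan concrete, here is a minimal sketch of how it could be captured as a data structure. The class name, field names, and validation rule are assumptions made for illustration; the skill itself produces a written document, and its actual layout is defined in the source repo.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentPlan:
    """Hypothetical container for the items a test plan lists."""
    observation: str                      # the "X" in the hypothesis
    change: str                           # the "Y"
    predicted_outcome: str                # the "Z"
    primary_metric: str
    guardrail_metrics: list[str]
    sample_size_per_variant: int
    traffic_allocation: dict[str, float]  # variant name -> share of traffic
    prelaunch_checklist: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed sanity check: the shares should cover all test traffic.
        total = sum(self.traffic_allocation.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"traffic allocation sums to {total}, expected 1.0")

    @property
    def hypothesis(self) -> str:
        # Renders the "Because X, we believe Y will cause Z" form.
        return (f"Because {self.observation}, we believe {self.change} "
                f"will cause {self.predicted_outcome}.")


plan = ExperimentPlan(
    observation="most visitors leave at the pricing table",
    change="showing annual pricing first",
    predicted_outcome="more trial starts",
    primary_metric="trial_start_rate",
    guardrail_metrics=["refund_rate"],
    sample_size_per_variant=8000,
    traffic_allocation={"control": 0.5, "variant": 0.5},
)
print(plan.hypothesis)
```

The example values (metric names, the 8000 figure, the 50/50 split) are invented placeholders, not output from the skill.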

When to use it

Reach for it when:

  • You’re tempted to “just ship it and see what happens” and want a forcing function for rigor
  • You’re spinning up a growth program and need a hypothesis backlog with ICE scores
  • You have enough traffic that statistical significance is in reach (not 200 visits/month)

When not to reach for it:

  • Your site has so little traffic that a meaningful test would take six months to call
  • The change is reversible and trivial — overhead exceeds value
  • You’ve already locked the design and just need ship help

Install

The skill is distributed via Corey Haines’s marketing-skills repo. To install it, copy the ab-test-setup SKILL.md into your project’s .claude/skills/ab-test-setup/ directory, or use the repo’s plugin install if you’ve set it up.

Once installed, the skill activates on the trigger phrases above. The first time it runs, it will check for .agents/product-marketing-context.md (or .claude/product-marketing-context.md) — populating that file with your product context first dramatically improves output quality across all of Haines’s marketing skills.

What a session looks like

A typical session has three phases:

  1. Hypothesis pressure-test. Claude refuses vague predictions. You’ll be asked for the observation, the change, the predicted outcome, and the audience — until the hypothesis is sharp enough to call.
  2. Sample size + design. It calculates how many users per variant you need given your baseline rate and minimum detectable effect, then helps you pick a single load-bearing variable to vary.
  3. Pre-launch checklist + playbook slot. Before the test launches you get a QA checklist; after it ends, the result lands in a structured “Experiment Playbook” entry — pattern, segment deltas, where else to apply.
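The sample-size step in phase 2 can be sketched with the standard normal-approximation formula for comparing two conversion rates. This is an illustration under stated assumptions (two-sided test, equal-sized variants, a relative minimum detectable effect, 5% significance, 80% power), not necessarily the calculation the skill performs; the function names are made up for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed in EACH variant for a two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)      # rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_run(n_per_variant, variants, daily_visitors):
    """Days needed if all daily visitors are split across the variants."""
    return ceil(n_per_variant * variants / daily_visitors)


n = sample_size_per_variant(baseline_rate=0.05, relative_mde=0.20)
print(n, "users per variant;", days_to_run(n, 2, daily_visitors=1000), "days")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why low-traffic sites struggle to reach a conclusion.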

The discipline that makes it work: pre-commit to sample size, don’t peek, and document the pattern that won — not just the variant. Over time the playbook becomes a library of growth patterns that compound.
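Why “don’t peek” matters can be seen in a small simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so any “significant” result is a false positive) and compares stopping at the first significant interim look against checking only once at the pre-committed sample size. The parameters and function names are arbitrary choices for illustration, and the pooled z-test is a common textbook choice rather than anything specified by the skill.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled two-proportion z-test with n users in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate(rate=0.10, n_per_arm=1000, looks=10, runs=400, seed=0):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        ever_significant = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = is_significant(conv_a, conv_b, look * step)
            ever_significant = ever_significant or significant
        peeking_hits += ever_significant  # would have stopped at some look
        fixed_hits += significant         # result at the final look only
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = simulate()
print(f"peeking: {peeking_rate:.1%}  fixed sample: {fixed_rate:.1%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the fixed-sample rate; in practice it tends to be several times higher.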

Receipts

Honest reporting on what ab-test-setup produces and where it has limits:

Where it works well:

  • Catches the “we’ll know it when we see it” problem by forcing a primary metric before launch
  • Prevents peeking-induced false positives: the skill holds the line on the pre-registered sample size
  • The ICE scoring loop turns a chaotic ideas list into a ranked backlog
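As a rough illustration of the ICE ranking, the sketch below scores each idea on Impact, Confidence, and Ease (1 to 10) and sorts by the average. Averaging is one common convention; some teams multiply the three scores instead, and this page does not say which the skill uses. The class, function, and idea names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int       # 1-10: how much the metric could move
    confidence: int   # 1-10: how sure we are it will move
    ease: int         # 1-10: how cheap it is to build and run

    @property
    def ice(self) -> float:
        # Assumed convention: simple average of the three scores.
        return (self.impact + self.confidence + self.ease) / 3


def rank_backlog(ideas):
    """Highest ICE score first."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


backlog = rank_backlog([
    Idea("Annual plan as default", impact=9, confidence=4, ease=3),
    Idea("Shorter signup form", impact=8, confidence=6, ease=7),
    Idea("New pricing page headline", impact=6, confidence=5, ease=9),
])
for idea in backlog:
    print(f"{idea.ice:.2f}  {idea.name}")
```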

Where it backfires:

  • On low-traffic sites the math is brutal: the skill correctly tells you that you can’t run a valid test, which feels obstructive
  • It can over-formalize tiny changes; not every copy tweak needs a hypothesis doc

Pattern that works: trigger on the first test of a new program — the rigor pays off because every later test reuses the doc structure. By test five the overhead is near-zero.

Source and attribution

Originally written by Corey Haines. The canonical SKILL.md and any supporting files live in the ab-test-setup folder of his marketing-skills repository.

License: MIT. You can install, adapt, and redistribute the skill, with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.