Six personas in parallel, three findings with receipts

2026-05-09

claude-code, skills, agents


The wiki had eighteen skills shipping with ## Receipts sections that read “TODO — to be filled in from a real session.” That was a transparency choice over filler, but a section that says “real notes coming” can only stay that way for so long. To shrink that bucket, I dispatched six agency-agent personas in parallel, each scoped to a specific artifact on the live site. The run cost roughly 390K agent tokens and about 12 minutes of wall time, and it produced three real findings worth the receipts. The non-obvious part: an agency-agent “persona” dispatched via the Agent tool is not the installed ~/.claude/agents/ file. It’s a general-purpose subagent that read the persona prompt. The wiki receipts have to say so.

What I ran

The setup: an Explore agent first, to count the buckets. The wiki had 160 skill pages at the time: 18 with a ## Receipts section reading “TODO — to be filled in from a real session,” 5 already firsthand, and 137 still plausible-generic from earlier bulk imports. Triage said: don’t run all 18. Split them into naturally applicable today (six personas with an obvious live target on this site), contrived (skills I’d have to invent a target for), and external setup (skills that need a separate environment, e.g. Python and an LLM API key).

The six dispatched personas — all from github.com/msitarzewski/agency-agents, MIT-licensed:

code-reviewer → review of commit be11144 (4 freshly-imported Osmani skill pages)
evidence-collector → HTML evidence pass on /skills/seo/ + /skills/topic/seo/
accessibility-auditor → live homepage + /skills/ against WCAG 2.2 AA
technical-writer → review of receipts-drafts/_TEMPLATE.md
agents-orchestrator → meta-review of this exact dispatch flow
codebase-onboarding-engineer → cold-read of the repo

Three more skills got receipts the same session, but those I ran myself (against the working tree, the audit itself, and the recent commit history) rather than dispatching a persona.

The trigger phrase for each was explicit: “Read the agency-agents persona file at this URL. Take that as your role. Audit X. Stay in lane — report findings, don’t fix anything.” Then a tight scope: one URL, one commit, one file.
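The dispatch shape above can be sketched as a small prompt builder. This is a minimal TypeScript sketch, not the code used in the session: the names DispatchScope, PersonaDispatch, and buildDispatchPrompt are hypothetical, the URL is a placeholder, and the prompt wording is a paraphrase of the trigger phrase rather than the exact text.

```typescript
// Hypothetical type: one dispatch targets exactly one artifact.
type DispatchScope =
  | { kind: "url"; value: string }
  | { kind: "commit"; value: string }
  | { kind: "file"; value: string };

interface PersonaDispatch {
  personaUrl: string; // where the persona prompt file lives (placeholder in the example)
  task: string;       // what to audit, in one sentence
  scope: DispatchScope;
}

// Builds the prompt text handed to a general-purpose subagent.
function buildDispatchPrompt(d: PersonaDispatch): string {
  return [
    `Read the agency-agents persona file at ${d.personaUrl}.`,
    "Take that as your role.",
    d.task,
    `Scope: ${d.scope.kind} ${d.scope.value}. Do not audit anything else.`,
    "Stay in lane: report findings, don't fix anything.",
  ].join("\n");
}

const examplePrompt = buildDispatchPrompt({
  personaUrl: "https://example.com/personas/code-reviewer.md", // placeholder URL
  task: "Audit the commit for frontmatter gaps.",
  scope: { kind: "commit", value: "be11144" },
});
```

Making the scope a single tagged value keeps “one URL, one commit, one file” a property of the type rather than a convention to remember.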

What happened

Three findings worth the receipts.

Finding 1 — code-reviewer caught a frontmatter gap on the most recent commit. The four Addy Osmani skill pages added in be11144 were missing receiptsStatus: "todo" on every frontmatter block. The schema at src/content.config.ts:31 defines the field as z.enum(['firsthand','todo','generic']) and optional. Without it, the index table renders a — dash instead of the visible-TODO chip the transparency policy ships. Caught and fixed in the same commit that consolidated all nine receipts. Persona output, verbatim:

Important #1: missing receiptsStatus: "todo" on all 4 frontmatter blocks. Schema in src/content.config.ts:31 defines receiptsStatus: z.enum(['firsthand','todo','generic']) as optional; without it, these 4 files land in the — column on the index instead of the visible TODO bucket the visible-TODO transparency policy is designed to surface.

That’s the kind of finding that sits in the gap between “ships and renders” and “ships as designed.” A unit test wouldn’t catch it; a frontmatter-validation rule could but doesn’t exist; a human reviewing four similar files at once tends to skim. The persona caught it on the second file and verified across all four.
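For illustration, the frontmatter-validation rule that doesn’t exist yet could look roughly like this. It is a sketch under assumptions: SkillPage and findReceiptsStatusGaps are hypothetical names, the pages are assumed to be parsed already, and the check is deliberately stricter than the schema, which treats receiptsStatus as optional. Only the three enum values come from the schema quoted above.

```typescript
// Enum values as quoted from src/content.config.ts; everything else is hypothetical.
const RECEIPTS_STATUSES: readonly string[] = ["firsthand", "todo", "generic"];

interface SkillPage {
  path: string; // hypothetical: file path of the skill page
  frontmatter: { receiptsStatus?: string };
}

// Returns the paths of pages whose receiptsStatus is absent or not one of the enum values.
function findReceiptsStatusGaps(pages: SkillPage[]): string[] {
  return pages
    .filter((page) => {
      const status = page.frontmatter.receiptsStatus;
      return status === undefined || RECEIPTS_STATUSES.indexOf(status) === -1;
    })
    .map((page) => page.path);
}

// Made-up paths, for demonstration only.
const gaps = findReceiptsStatusGaps([
  { path: "skills/a.md", frontmatter: { receiptsStatus: "todo" } },
  { path: "skills/b.md", frontmatter: {} },
  { path: "skills/c.md", frontmatter: { receiptsStatus: "tood" } },
]);
// gaps -> ["skills/b.md", "skills/c.md"]
```

Run as a pre-commit or CI step, a check like this would turn the “ships and renders” case into a visible failure instead of a silent dash in the index.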

Finding 2 — evidence-collector flagged the canonical slug buried in alphabetical order. On /skills/topic/seo/, 28 skill cards in alphabetical order put the canonical slug seo at position 4 (after ai-seo, programmatic-seo, schema-markup). The hand-written lede at the top names five anchor skills — seo-audit, ai-seo, schema-markup, programmatic-seo, seo-cluster — but not the canonical seo itself. Among 25+ seo-* siblings, the broadest entry isn’t elevated. The persona also flagged a schema asymmetry: skill detail pages emit BreadcrumbList, topic pages don’t.

Both findings are open work. The first is a sort-key fix; the second is a layout edit. Neither blocks ship.
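The sort-key fix could be as small as a comparator that pins the canonical slug first. A minimal sketch, assuming the canonical slug equals the topic name and that cards are sorted from a plain list of slugs; sortTopicSlugs is a hypothetical helper, not the site’s code.

```typescript
// Orders slugs so the canonical slug comes first and the rest stay alphabetical.
// Assumption: the canonical slug is passed in explicitly (here, the topic name).
function sortTopicSlugs(slugs: string[], canonical: string): string[] {
  return slugs.slice().sort((a, b) => {
    if (a === canonical && b !== canonical) return -1;
    if (b === canonical && a !== canonical) return 1;
    return a < b ? -1 : a > b ? 1 : 0;
  });
}

const ordered = sortTopicSlugs(
  ["seo-audit", "ai-seo", "seo", "schema-markup", "programmatic-seo"],
  "seo",
);
// ordered -> ["seo", "ai-seo", "programmatic-seo", "schema-markup", "seo-audit"]
```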

Finding 3 — accessibility-auditor found three real WCAG AA gaps on the live site. Skip link missing (<a href="#main"> should be the first focusable element; the id="main" was there, the link wasn’t). Pagefind search input lacking a server-rendered accessible name (the SSR shell is just <div id="ac-pagefind-search">; the <input> is JS-injected post-mount, and Pagefind UI 1.5.x sets aria-label="Search" by default but the SSR markup couldn’t be verified). Cloudflare email-obfuscation footer link reading literally “email protected” to screen readers — that one is invisible from a source-code audit because the obfuscation happens edge-side, not in the Astro build.

Two of the three landed in the same 2a5a1d7 commit: skip link in src/layouts/Base.astro, aria-label="Email Agent Cookbooks" on the footer link. Pagefind verification deferred (it needs DevTools post-deploy).
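The skip-link gap is the one that could be guarded against regression with a cheap check on built HTML. The sketch below is a naive heuristic, not an accessibility audit: firstFocusableIsSkipLink is a hypothetical helper, it uses a regular expression instead of a real HTML parser, and it ignores tabindex, hidden or disabled elements, and anything injected by JavaScript (so it would not help with the Pagefind input).

```typescript
// Naive heuristic: treats the first <a href>, <button>, <input>, <select>, or
// <textarea> tag in source order as the first focusable element.
function firstFocusableIsSkipLink(html: string, targetId: string): boolean {
  const focusable = /<(?:a\s[^>]*href=|button\b|input\b|select\b|textarea\b)[^>]*>/i;
  const match = focusable.exec(html);
  if (match === null) return false;
  const tag = match[0];
  // The first focusable tag must be an anchor pointing at the main landmark's id.
  return /^<a\s/i.test(tag) && tag.indexOf(`href="#${targetId}"`) !== -1;
}

// Made-up markup for demonstration.
const withSkipLink =
  '<body><a href="#main">Skip to content</a><nav><a href="/">Home</a></nav><main id="main"></main></body>';
const withoutSkipLink =
  '<body><nav><a href="/">Home</a></nav><main id="main"></main></body>';
```

Here firstFocusableIsSkipLink(withSkipLink, "main") is true and firstFocusableIsSkipLink(withoutSkipLink, "main") is false.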

Where it drifted

The honesty footnote is the part the wiki receipts had to be written carefully around.

An agency-agent “persona” file is a Markdown file with a YAML frontmatter block and the system prompt as its body, designed to be installed at ~/.claude/agents/<name>.md and selected via the Agent tool’s subagent_type parameter. When you dispatch via Claude Code without that file installed, what runs is a general-purpose subagent given the persona’s text as part of its prompt. Same model, same tools, but no subagent_type lookup, no description-based dispatch matching, and no tools field enforcement from the agent definition. The behavior approximates the persona; it isn’t the persona.

This matters for the wiki receipts because every page above is framed as “persona dispatched on X.” Read literally, that implies the installed-agent path. The receipts now say “persona-emulation dispatch — general-purpose agent given the persona prompt, not the installed agent file.” Three lines longer; honest.

The receipts that got promoted to firsthand on the wiki call this out explicitly in their Receipts sections. Future runs of this kind of dispatch should name the emulation up front — what’s running is a general-purpose subagent reading the persona prompt, not the persona’s installed agent file.

The other thing that drifted: the cost. ~390K agent tokens for six dispatches isn’t free. If the goal had been “generic plausible content for the chip bucket,” this would be the wrong way to spend the budget. The justification was that each finding above is specifically true about this site — a frontmatter gap on a specific commit, a slug buried in a specific topic page, three a11y gaps on specific URLs — not a generic “what a code reviewer would say.” Generic findings are cheap. Specific findings, run in parallel, are the price you pay for scaling firsthand audits past one human.

What I’d change

Three things.

Don’t dispatch personas to fill TODO buckets. Dispatch them to find a specific thing on a specific artifact. The bucket-filling framing is the trap — it leads to plausible-generic receipts that read confident but wouldn’t grep-verify. The framing that worked: “audit commit X for frontmatter gaps.” The framing that wouldn’t have: “give me a code review for the wiki.”

Inherit the evidence rubric in the prompt. Each persona dispatch named the artifact (URL, commit hash, file path) and required findings to cite specifics that could be re-verified. Without that, semantic classifications drift toward judgment calls that don’t ground out. The dispatched agent inherits whatever evidence rubric the prompt makes explicit; if the prompt doesn’t carry it, the agent fills the gap with judgment.
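One way to make the rubric mechanical is to require a shape for each finding and set aside anything that doesn’t cite a re-checkable specific. This is a sketch under assumptions: the Finding fields and applyEvidenceRubric are hypothetical, and a non-empty citation is only a proxy for “re-verifiable,” so a human still has to follow the pointer.

```typescript
// Hypothetical shape for a finding returned by a dispatched agent.
interface Finding {
  claim: string;
  artifact?: string; // URL, commit hash, or file path the claim is about
  evidence?: string; // quoted line, selector, or path:line that can be re-checked
}

// Splits findings into those that cite something re-checkable and those that don't.
function applyEvidenceRubric(findings: Finding[]): { grounded: Finding[]; ungrounded: Finding[] } {
  const isGrounded = (f: Finding): boolean =>
    (f.artifact ?? "").trim() !== "" && (f.evidence ?? "").trim() !== "";
  return {
    grounded: findings.filter(isGrounded),
    ungrounded: findings.filter((f) => !isGrounded(f)),
  };
}

const rubricResult = applyEvidenceRubric([
  {
    claim: "receiptsStatus is missing on all 4 frontmatter blocks",
    artifact: "be11144",
    evidence: "src/content.config.ts:31",
  },
  { claim: "The wiki could use better structure." }, // made-up generic finding
]);
// rubricResult.grounded has 1 entry, rubricResult.ungrounded has 1 entry
```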

Cap the parallel batch. Six was the right number for this site at this state: three personas would have under-used the parallel cost, and ten would have over-spent. The triage step (run all 18? no, six naturally applicable) was the actual decision; the parallel dispatch was the mechanical part. Push back on “run them all” before it becomes a habit. The same triage shape showed up earlier in “Four SEO skills on one homepage: only Sweep 4 found the gap”: running multiple lenses on one target only pays when each lens is differentiated, not when they overlap.
