Claude Code hook: Haiku 4.5 judges every content edit
Recipe #4 in the cookbook, after the CWD guard, the skill-activation logger, and the frontmatter guard. The first three are deterministic — regex-shaped checks that pass or fail on plain text. This one is probabilistic: every content edit gets sent to Claude Haiku 4.5 with an editorial-bar rubric, and the verdict (pass / soft-fail / block) is logged to receipts-drafts/judge-log.jsonl alongside the full token-usage payload. The hook catches what the deterministic frontmatter guard can't see: a paragraph that's grammatically correct but is pure paraphrase of upstream documentation, with no firsthand observation. The post ships the .claude/settings.json block, the Node script with prompt caching wired in, the model-specific minimum-cacheable-prefix floor table that tutorials never mention, and four designed-in failure modes. Receipts pending — the bill section will be filled in from a real session, and stays transparently marked as pending on the live page until then.
The hook: send every content edit to Haiku 4.5 for a verdict
The deterministic frontmatter guard catches structural failure (missing field, malformed date). What it can't catch: a paragraph that's grammatically correct but is pure paraphrase of upstream documentation with no firsthand observation, or a "Receipts" section that contains plausible-generic prose instead of real numbers. That's where this site's editorial bar lives, and that's where a regex tops out.
An LLM judge can read those two paragraphs and tell the difference. Haiku 4.5 is the right model for the job — fast, cheap, and accurate enough on a constrained classification task with a strict rubric.
.claude/settings.json — adds the new matcher to the existing PostToolUse block. timeout: 30 because this is a network call, not a regex sweep:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "node .claude/hooks/judge-content.mjs",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
.claude/hooks/judge-content.mjs:
#!/usr/bin/env node
import { readFileSync, appendFileSync, mkdirSync, existsSync } from 'node:fs';
import { resolve, sep, posix, join } from 'node:path';
import Anthropic from '@anthropic-ai/sdk';

const MODEL = 'claude-haiku-4-5';

const RUBRIC = `You are an editorial-bar judge for a Claude Code blog and skill wiki at agentcookbooks.com.
The site's wedge: every page must contain at least one real artifact generated firsthand. Real artifacts are:
- a config that ran (settings.json, hook script, command line),
- a number that was measured (latency, token count, cost, false-positive rate),
- a failure mode that was hit and how it was resolved (with the actual error, the fix, what would surprise a future reader).
Quote-only paraphrases of upstream documentation fail the bar. Generic listicle prose fails the bar.
You will receive a markdown file. Return a JSON object with these fields:
- verdict: "pass" | "soft-fail" | "block"
- artifact_present: boolean
- artifact_type: "config" | "number" | "failure-mode" | "none"
- reason: string (1-2 sentences, surfaced to the editor as feedback)
Calibration:
- "pass": at least one real artifact and the surrounding prose adds firsthand observation
- "soft-fail": artifact present but prose is thin, OR prose is rich but the artifact is borderline (a generic config with no firsthand context)
- "block": no real artifact AND prose is paraphrase-only
Block sparingly. The editorial bar is "don't ship without a receipt", not "every paragraph must contain a number". Section headings, transitions, and short conclusions are fine.
Respond with the JSON object only. No prose before or after.`;

let raw = '';
process.stdin.on('data', (c) => (raw += c));
process.stdin.on('end', async () => {
  let payload;
  try { payload = JSON.parse(raw); } catch { process.exit(0); }
  if (!['Edit', 'Write', 'MultiEdit'].includes(payload.tool_name)) process.exit(0);

  const filePath = payload.tool_input?.file_path;
  if (!filePath) process.exit(0);

  const norm = filePath.split(sep).join(posix.sep);
  const isContent = /\/src\/content\/(blog|skills)\/[^/]+\.mdx?$/.test(norm);
  if (!isContent) process.exit(0);
  if (!process.env.ANTHROPIC_API_KEY) process.exit(0);

  let body;
  try { body = readFileSync(resolve(filePath), 'utf8'); } catch { process.exit(0); }
  const excerpt = body.slice(0, 8000);

  const client = new Anthropic();
  try {
    const resp = await client.messages.create({
      model: MODEL,
      max_tokens: 200,
      system: [
        { type: 'text', text: RUBRIC, cache_control: { type: 'ephemeral' } },
      ],
      messages: [{ role: 'user', content: `File: ${norm}\n\n${excerpt}` }],
    });
    const text = resp.content?.[0]?.type === 'text' ? resp.content[0].text : '';
    let verdict;
    try { verdict = JSON.parse(text); }
    catch {
      logVerdict(payload.cwd, { ts: now(), file: norm, parse_error: text, usage: resp.usage });
      process.exit(0);
    }
    logVerdict(payload.cwd, {
      ts: now(),
      file: norm,
      verdict: verdict.verdict,
      artifact_present: verdict.artifact_present,
      artifact_type: verdict.artifact_type,
      reason: verdict.reason,
      usage: resp.usage,
    });
    if (verdict.verdict === 'block') {
      process.stderr.write(JSON.stringify({
        decision: 'block',
        reason: `[content-judge] ${verdict.reason}`,
      }) + '\n');
      process.exit(2);
    }
  } catch (err) {
    process.stderr.write(`[content-judge] ${err.message}\n`);
    process.exit(0);
  }
});

function now() { return new Date().toISOString(); }

function logVerdict(cwd, entry) {
  const dir = join(cwd || process.cwd(), 'receipts-drafts');
  if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
  appendFileSync(join(dir, 'judge-log.jsonl'), JSON.stringify(entry) + '\n');
}
The script reads the hook payload and exits early on tool calls outside the two content collections. For qualifying edits it sends the post-edit file body to Haiku 4.5 with the rubric as a cached system block, parses the JSON verdict, appends an entry to receipts-drafts/judge-log.jsonl, and on a "block" verdict exits 2 with structured stderr — the same shape the frontmatter guard uses, so Claude reads the reason and self-corrects on the next turn.
Install: npm install @anthropic-ai/sdk. The hook needs ANTHROPIC_API_KEY in the shell that launched Claude Code; if it’s missing the script exits silently (designed-in failure mode #1 below).
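One hardening step worth considering: validate the verdict object against the rubric's schema instead of trusting any JSON that parses. A hypothetical `isValidVerdict` helper, not part of the script above, could look like:

```javascript
// Checks a parsed reply against the shape the rubric asks for.
// Anything off-schema can be logged as a parse_error instead of acted on.
const VERDICTS = new Set(['pass', 'soft-fail', 'block']);
const ARTIFACT_TYPES = new Set(['config', 'number', 'failure-mode', 'none']);

function isValidVerdict(v) {
  return v !== null &&
    typeof v === 'object' &&
    VERDICTS.has(v.verdict) &&
    typeof v.artifact_present === 'boolean' &&
    ARTIFACT_TYPES.has(v.artifact_type) &&
    typeof v.reason === 'string';
}

console.log(isValidVerdict({
  verdict: 'soft-fail',
  artifact_present: true,
  artifact_type: 'config',
  reason: 'Config present but no firsthand context.',
})); // true
console.log(isValidVerdict({ verdict: 'maybe' })); // false
```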
The prompt-caching floor that nobody mentions
The cache_control: { type: 'ephemeral' } marker on the system block is the standard pattern for a frozen rubric — write once, read on every subsequent call at ~10% of input cost. Tutorials show the marker; the part they leave out is the minimum-cacheable-prefix floor, which is model-specific:
| Model | Minimum |
|---|---|
| Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5 | 4096 tokens |
| Sonnet 4.6 | 2048 tokens |
| Sonnet 4.5 and older | 1024 tokens |
Anything shorter than the floor silently won’t cache — no error, just cache_creation_input_tokens: 0 and cache_read_input_tokens: 0 on every response. The rubric in the script above is ~220 tokens. So on Haiku 4.5, caching does not fire at this rubric size. The marker is a forward-looking hint for the version of this rubric that has fifteen worked examples and crosses the floor.
This is worth knowing before installing the hook: if you read the response usage and see zeros across both cache fields, the marker isn’t broken — the prefix is just under 4096 tokens. Either grow the rubric (worked examples are the natural way) or accept that every call is full-price input on Haiku 4.5.
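A quick way to tell whether the floor was crossed is to read the two cache fields off `usage` after a call. A small helper, with invented usage payloads for illustration:

```javascript
// True if prompt caching did anything on this call: either the prefix
// was written to cache (first call in a window) or read back (later calls).
function cacheFired(usage) {
  return (usage.cache_creation_input_tokens ?? 0) > 0 ||
         (usage.cache_read_input_tokens ?? 0) > 0;
}

// Below-floor rubric on Haiku 4.5: both fields come back zero.
console.log(cacheFired({
  input_tokens: 2200, output_tokens: 80,
  cache_creation_input_tokens: 0, cache_read_input_tokens: 0,
})); // false

// Rubric grown past 4096 tokens: subsequent calls read the cached prefix.
console.log(cacheFired({
  input_tokens: 300, output_tokens: 80,
  cache_creation_input_tokens: 0, cache_read_input_tokens: 4200,
})); // true
```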
Receipts (TODO — will be filled in from a real session)
The hook is wired but not yet running for receipts. Notes will replace this section as the hook fires on real edits in a future session. The format the receipts will land in:
Verdict log. receipts-drafts/judge-log.jsonl accumulates one entry per qualified edit. Each entry has ts, file, verdict, artifact_present, artifact_type, reason, and the full usage payload (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens). After the first 5–10 verdicts, an excerpt of the JSONL goes here — including at least one pass, one soft-fail, and ideally one block from a deliberately-broken test file.
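Once entries accumulate, a per-verdict tally is a few lines of JSONL plumbing. A sketch over invented log lines, in the entry shape the hook writes:

```javascript
// Tally verdicts from judge-log.jsonl content. The three lines here are
// made-up placeholders, not real log entries.
const sampleLog = [
  '{"ts":"2025-01-01T00:00:00.000Z","file":"src/content/blog/a.mdx","verdict":"pass"}',
  '{"ts":"2025-01-01T00:05:00.000Z","file":"src/content/blog/b.mdx","verdict":"soft-fail"}',
  '{"ts":"2025-01-01T00:09:00.000Z","file":"src/content/skills/c.md","verdict":"pass"}',
].join('\n');

const counts = {};
for (const line of sampleLog.split('\n')) {
  const { verdict } = JSON.parse(line);
  counts[verdict] = (counts[verdict] ?? 0) + 1;
}
console.log(counts); // { pass: 2, 'soft-fail': 1 }
```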
The bill. Per-edit cost on Haiku 4.5 at $1/M input + $5/M output. Back-of-envelope for a typical content edit: ~2K input tokens (rubric + file excerpt) plus ~80 output tokens of JSON ≈ $0.0024 per edit. The receipt belongs here once usage.input_tokens × pricing has been totalled across a real session, not estimated.
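The arithmetic generalizes to a helper that can be pointed at real `usage` payloads later. A sketch at the pricing quoted above; the 10% cache-read rate is the figure from the caching section, and the example numbers are the same back-of-envelope estimates, not measurements:

```javascript
// Per-edit cost in USD from an Anthropic usage payload, at Haiku 4.5
// pricing ($1/M input, $5/M output), with cache reads assumed billed
// at 10% of input price. input_tokens excludes cached tokens.
function editCostUSD(usage, inPerM = 1, outPerM = 5) {
  const fresh = usage.input_tokens ?? 0;
  const cached = usage.cache_read_input_tokens ?? 0;
  const out = usage.output_tokens ?? 0;
  return (fresh * inPerM + cached * inPerM * 0.1 + out * outPerM) / 1e6;
}

console.log(editCostUSD({ input_tokens: 2000, output_tokens: 80 }));
// 0.0024
```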
Cache hit rate. Will read 0/0 on Haiku 4.5 until the rubric grows past 4096 tokens — see the floor table above. Once the rubric crosses the threshold (probably after a few worked-example additions), cache_read_input_tokens should dominate input_tokens after the first call in each 5-minute window. Both numbers go here when they’re real.
Designed-in failure modes
The script hasn’t been observed breaking yet, so this is “what was anticipated and what the script does about it” rather than “what broke” — the “what broke” version will land here once the hook has run.
- No `ANTHROPIC_API_KEY` in the environment → script exits 0 silently. The harness sees a no-op, the edit proceeds, no log entry. This is intentional — the hook is opt-in via the env var, so unset means off.
- Network failure / timeout / rate limit → caught by the outer `try`, error written to stderr, exit 0. The edit is never blocked by infrastructure flakiness.
- Model returns non-JSON (rubric drift, refusal, truncation at `max_tokens: 200`) → JSON parse fails, the raw text is logged under `parse_error`, exit 0. The edit proceeds; the operator sees the parse error in the log and tightens the rubric.
- False-positive block on a legitimate quote-block (the rubric tries to be lenient about this, but it’s the obvious failure pattern) → captured in the verdict log with `verdict: "block"`, surfaced via stderr to Claude, who can either rewrite the paragraph or — if the verdict is wrong — disable the hook for the next edit. The mitigation if false positives become routine: soften the rubric (more explicit “block sparingly” language), or downgrade `block` to `soft-fail` and remove the `exit 2` path so verdicts log but never block.
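Failure mode #3 is the one most worth unit-testing, since it is pure logic. The parse-or-preserve step from the script, extracted as a standalone sketch:

```javascript
// Mirrors the script's fallback: a valid JSON reply becomes the verdict,
// anything else is preserved verbatim under parse_error for the log.
function parseVerdict(text) {
  try { return { verdict: JSON.parse(text) }; }
  catch { return { parse_error: text }; }
}

console.log(parseVerdict(
  '{"verdict":"pass","artifact_present":true,"artifact_type":"number","reason":"ok"}'
));
console.log(parseVerdict('Sure! Here is the JSON you asked for:'));
// refusal-style drift lands in parse_error instead of throwing
```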
When NOT to use this hook
This is the most expensive hook in the cookbook by a wide margin, so think about the cost before installing.
- One-off scripts and short sessions — the deterministic guards earn their keep at zero marginal cost. The judge only earns its keep on long content sessions where a paraphrase-only paragraph would otherwise slip past until publish.
- Refactor sessions that touch many content files in one go — a 50-file refactor turns into 50 API calls. At ~$0.0024 per call that’s ~$0.12, which is fine, but if the refactor is mechanical (frontmatter renames, link updates) the judge isn’t grading anything substantive. Disable the hook by unsetting `ANTHROPIC_API_KEY` for the refactor session.
- When the deterministic frontmatter guard is enough — for structural failures (missing field, bad date) the guard catches it instantly and for free. The judge is for the failure mode the guard can’t see: “the words are valid but they’re paraphrase-only”.
What’s next
Eight more hooks lined up. Each one ships only after a real run on real code — receipts coming for this one as soon as the bill is in. Subscribe to RSS for the rest.