Claude Code hook: Haiku 4.5 judges every content edit

Recipe #4 in the cookbook, after the CWD guard, the skill-activation logger, and the frontmatter guard. The first three are deterministic — regex-shaped checks that pass or fail on plain text. This one is probabilistic: every content edit gets sent to Claude Haiku 4.5 with an editorial-bar rubric, and the verdict (pass / soft-fail / block) is logged to receipts-drafts/judge-log.jsonl alongside the full token-usage payload. The hook catches what the deterministic frontmatter guard can't see: a paragraph that's grammatically correct but merely paraphrases upstream documentation, with no firsthand observation. The post ships the .claude/settings.json block, the Node script with prompt-caching wired in, the model-specific minimum-cacheable-prefix floor table that nobody mentions in tutorials, and four designed-in failure modes. Receipts pending — the bill section will be filled in from a real session, transparent on the live page until then.

The hook: send every content edit to Haiku 4.5 for a verdict

The deterministic frontmatter guard catches structural failure (missing field, malformed date). What it can't catch: a paragraph that's grammatically correct but merely paraphrases upstream documentation with no firsthand observation, or a "Receipts" section that contains plausible-generic prose instead of real numbers. That's where this site's editorial bar lives, and that's where a regex tops out.

An LLM judge can read a paragraph like that and tell paraphrase from firsthand observation. Haiku 4.5 is the right model for the job — fast, cheap, and accurate enough on a constrained classification task with a strict rubric.

.claude/settings.json — adds the new matcher to the existing PostToolUse block. timeout: 30 because this is a network call, not a regex sweep:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "node .claude/hooks/judge-content.mjs",
            "timeout": 30
          }
        ]
      }
    ]
  }
}

.claude/hooks/judge-content.mjs:

#!/usr/bin/env node
import { readFileSync, appendFileSync, mkdirSync, existsSync } from 'node:fs';
import { resolve, sep, posix, join } from 'node:path';
import Anthropic from '@anthropic-ai/sdk';

const MODEL = 'claude-haiku-4-5';

const RUBRIC = `You are an editorial-bar judge for a Claude Code blog and skill wiki at agentcookbooks.com.

The site's wedge: every page must contain at least one real artifact generated firsthand. Real artifacts are:
- a config that ran (settings.json, hook script, command line),
- a number that was measured (latency, token count, cost, false-positive rate),
- a failure mode that was hit and how it was resolved (with the actual error, the fix, what would surprise a future reader).

Quote-only paraphrases of upstream documentation fail the bar. Generic listicle prose fails the bar.

You will receive a markdown file. Return a JSON object with these fields:
- verdict: "pass" | "soft-fail" | "block"
- artifact_present: boolean
- artifact_type: "config" | "number" | "failure-mode" | "none"
- reason: string (1-2 sentences, surfaced to the editor as feedback)

Calibration:
- "pass": at least one real artifact and the surrounding prose adds firsthand observation
- "soft-fail": artifact present but prose is thin, OR prose is rich but the artifact is borderline (a generic config with no firsthand context)
- "block": no real artifact AND prose is paraphrase-only

Block sparingly. The editorial bar is "don't ship without a receipt", not "every paragraph must contain a number". Section headings, transitions, and short conclusions are fine.

Respond with the JSON object only. No prose before or after.`;

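// The hook payload arrives as one JSON object on stdin.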
let raw = '';
process.stdin.on('data', (c) => (raw += c));
process.stdin.on('end', async () => {
  let payload;
  try { payload = JSON.parse(raw); } catch { process.exit(0); }

  if (!['Edit', 'Write', 'MultiEdit'].includes(payload.tool_name)) process.exit(0);

  const filePath = payload.tool_input?.file_path;
  if (!filePath) process.exit(0);

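  // Normalize separators so the content-path regex also matches on Windows.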
  const norm = filePath.split(sep).join(posix.sep);
  const isContent = /\/src\/content\/(blog|skills)\/[^/]+\.mdx?$/.test(norm);
  if (!isContent) process.exit(0);

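  // No API key, no judge: the hook is opt-in via the env var (failure mode #1).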
  if (!process.env.ANTHROPIC_API_KEY) process.exit(0);

  let body;
  try { body = readFileSync(resolve(filePath), 'utf8'); } catch { process.exit(0); }

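  // Judge at most the first 8,000 characters to keep input cost bounded.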
  const excerpt = body.slice(0, 8000);
  const client = new Anthropic();

  try {
    const resp = await client.messages.create({
      model: MODEL,
      max_tokens: 200,
      system: [
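        // Frozen rubric marked cacheable; see the caching-floor section below.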
        { type: 'text', text: RUBRIC, cache_control: { type: 'ephemeral' } },
      ],
      messages: [{ role: 'user', content: `File: ${norm}\n\n${excerpt}` }],
    });

    const text = resp.content?.[0]?.type === 'text' ? resp.content[0].text : '';

    let verdict;
    try { verdict = JSON.parse(text); }
    catch {
      logVerdict(payload.cwd, { ts: now(), file: norm, parse_error: text, usage: resp.usage });
      process.exit(0);
    }

    logVerdict(payload.cwd, {
      ts: now(),
      file: norm,
      verdict: verdict.verdict,
      artifact_present: verdict.artifact_present,
      artifact_type: verdict.artifact_type,
      reason: verdict.reason,
      usage: resp.usage,
    });

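    // "block" is the only verdict that stops the edit: exit 2, reason on stderr.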
    if (verdict.verdict === 'block') {
      process.stderr.write(JSON.stringify({
        decision: 'block',
        reason: `[content-judge] ${verdict.reason}`,
      }) + '\n');
      process.exit(2);
    }
  } catch (err) {
    process.stderr.write(`[content-judge] ${err.message}\n`);
    process.exit(0);
  }
});

function now() { return new Date().toISOString(); }

function logVerdict(cwd, entry) {
  const dir = join(cwd || process.cwd(), 'receipts-drafts');
  if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
  appendFileSync(join(dir, 'judge-log.jsonl'), JSON.stringify(entry) + '\n');
}

The script reads the hook payload, exits early unless the edit touched a file in one of the two content collections, sends the post-edit file body to Haiku 4.5 with the rubric as a cached system prompt, parses the JSON verdict, appends an entry to receipts-drafts/judge-log.jsonl, and on a "block" verdict exits 2 with structured stderr — the same shape the frontmatter guard uses, so Claude reads the reason and self-corrects on the next turn.

Install: npm install @anthropic-ai/sdk. The hook needs ANTHROPIC_API_KEY in the shell that launched Claude Code; if it’s missing the script exits silently (designed-in failure mode #1 below).
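
A quick way to exercise the hook outside Claude Code is to pipe a synthetic payload at it. A minimal sketch, assuming the script path above; the harness filename and the test-post path are hypothetical, but the payload keys match what the script reads:

// smoke-test.mjs: hypothetical harness, not part of the hook itself
import { spawnSync } from 'node:child_process';

const payload = JSON.stringify({
  tool_name: 'Write',
  // hook payloads carry absolute paths; the content regex expects /src/content/...
  tool_input: { file_path: `${process.cwd()}/src/content/blog/test-post.mdx` },
  cwd: process.cwd(),
});

const res = spawnSync('node', ['.claude/hooks/judge-content.mjs'], {
  input: payload,
  encoding: 'utf8',
});

console.log('exit code:', res.status); // 0 = pass/soft-fail/skipped, 2 = block
if (res.stderr) console.log('stderr:', res.stderr);

The target file has to exist on disk: if it doesn't, the readFileSync guard in the hook exits 0 and no verdict fires.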

The prompt-caching floor that nobody mentions

The cache_control: { type: 'ephemeral' } marker on the system block is the standard pattern for a frozen rubric — write once, read on every subsequent call at ~10% of input cost. Tutorials show the marker; the part they leave out is the minimum-cacheable-prefix floor, which is model-specific:

Model                                      Minimum cacheable prefix
Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5    4096 tokens
Sonnet 4.6                                 2048 tokens
Sonnet 4.5 and older                       1024 tokens

Anything shorter than the floor silently won’t cache — no error, just cache_creation_input_tokens: 0 and cache_read_input_tokens: 0 on every response. The rubric in the script above is ~220 tokens. So on Haiku 4.5, caching does not fire at this rubric size. The marker is a forward-looking hint for the version of this rubric that has fifteen worked examples and crosses the floor.

This is worth knowing before installing the hook: if you read the response usage and see zeros across both cache fields, the marker isn’t broken — the prefix is just under 4096 tokens. Either grow the rubric (worked examples are the natural way) or accept that every call is full-price input on Haiku 4.5.
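
A sketch of that check, dropped in right after the messages.create call in the script above (the usage field names are the real ones on the response; the warning text is illustrative):

const usage = resp.usage ?? {};
// Both fields zero (or absent) means the prefix never reached the cache floor.
if (!usage.cache_creation_input_tokens && !usage.cache_read_input_tokens) {
  process.stderr.write('[content-judge] cache cold: rubric is under the minimum-cacheable-prefix floor\n');
}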

Receipts (TODO — will be filled in from a real session)

The hook is wired but not yet running for receipts. Notes will replace this section as the hook fires on real edits in a future session. The format the receipts will land in:

Verdict log. receipts-drafts/judge-log.jsonl accumulates one entry per qualified edit. Each entry has ts, file, verdict, artifact_present, artifact_type, reason, and the full usage payload (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens). After the first 5–10 verdicts, an excerpt of the JSONL goes here — including at least one pass, one soft-fail, and ideally one block from a deliberately-broken test file.
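
The shape one entry will take, with placeholder values only (nothing here is a measured number):

{"ts":"2025-01-01T00:00:00.000Z","file":"src/content/blog/example.mdx","verdict":"soft-fail","artifact_present":true,"artifact_type":"config","reason":"Config present but no firsthand context around it.","usage":{"input_tokens":0,"output_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0}}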

The bill. Per-edit cost on Haiku 4.5 at $1/M input + $5/M output. Back-of-envelope for a typical content edit: ~2K input tokens (rubric + file excerpt) plus ~80 output tokens of JSON ≈ $0.0024 per edit. The receipt belongs here once usage.input_tokens × pricing has been totalled across a real session, not estimated.
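
A sketch of that tally, runnable once the log has real entries. It assumes the $1/M + $5/M pricing above plus the standard prompt-caching multipliers (1.25× base input for cache writes, 0.1× for cache reads); those multipliers are assumptions here, not measurements:

// tally-bill.mjs: hypothetical script, prices each logged verdict
import { readFileSync } from 'node:fs';

const IN = 1 / 1e6;   // $ per input token, Haiku 4.5
const OUT = 5 / 1e6;  // $ per output token, Haiku 4.5

const entries = readFileSync('receipts-drafts/judge-log.jsonl', 'utf8')
  .trim()
  .split('\n')
  .map((line) => JSON.parse(line));

let total = 0;
for (const { usage } of entries) {
  if (!usage) continue;
  total += (usage.input_tokens ?? 0) * IN
    + (usage.output_tokens ?? 0) * OUT
    + (usage.cache_creation_input_tokens ?? 0) * IN * 1.25 // cache-write premium
    + (usage.cache_read_input_tokens ?? 0) * IN * 0.1;     // cache-read discount
}

console.log(`${entries.length} verdicts, $${total.toFixed(4)} total`);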

Cache hit rate. Will read 0/0 on Haiku 4.5 until the rubric grows past 4096 tokens — see the floor table above. Once the rubric crosses the threshold (probably after a few worked-example additions), cache_read_input_tokens should dominate input_tokens after the first call in each 5-minute window. Both numbers go here when they’re real.

Designed-in failure modes

The script hasn’t been observed breaking yet, so this is “what was anticipated and what the script does about it” rather than “what broke” — the latter section will land here once the hook has run.

  1. No ANTHROPIC_API_KEY in the environment → script exits 0 silently. The harness sees a no-op, the edit proceeds, no log entry. This is intentional — the hook is opt-in via the env var, so unset means off.
  2. Network failure / timeout / rate limit → caught by the outer try, error written to stderr, exit 0. The edit is never blocked by infrastructure flakiness.
  3. Model returns non-JSON (rubric drift, refusal, truncation at max_tokens: 200) → JSON parse fails, the raw text is logged under parse_error, exit 0. The edit proceeds; the operator sees the parse error in the log and tightens the rubric.
  4. False-positive block on a legitimate quote-block (the rubric tries to be lenient about this, but it's the obvious failure pattern) → captured in the verdict log with verdict: "block", surfaced via stderr to Claude, who can either rewrite the paragraph or — if the verdict is wrong — disable the hook for the next edit. The mitigation if false positives become routine: make the "block sparingly" language in the rubric more explicit, or downgrade block to soft-fail and remove the exit 2 path so verdicts log but never block (sketched below).
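
The log-only downgrade from #4, as a sketch; it replaces the block branch in the script above:

// log-only mode: verdicts still land in judge-log.jsonl, but nothing blocks
if (verdict.verdict === 'block') {
  // advisory only: print the reason without the exit-2 contract
  process.stderr.write(`[content-judge] would block: ${verdict.reason}\n`);
}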

When NOT to use this hook

This is the most expensive hook in the cookbook by a wide margin, so think about the cost before installing.

  • One-off scripts and short sessions — the deterministic guards earn their keep at zero marginal cost. The judge only earns its keep on long content sessions where a paraphrase-only paragraph would otherwise slip past until publish.
  • Refactor sessions that touch many content files in one go — a 50-file refactor turns into 50 API calls. At ~$0.0024 per call that’s ~$0.12, which is fine, but if the refactor is mechanical (frontmatter renames, link updates) the judge isn’t grading anything substantive. Disable the hook by unsetting ANTHROPIC_API_KEY for the refactor session.
  • When the deterministic frontmatter guard is enough — for structural failures (missing field, bad date) the guard catches it instantly and for free. The judge is for the failure mode the guard can’t see: “the words are valid but they’re paraphrase-only”.

What’s next

Eight more hooks lined up. Each one ships only after a real run on real code — receipts coming for this one as soon as the bill is in. Subscribe to RSS for the rest.