# Claude Code hook: Haiku 4.5 judges every content edit

> A PostToolUse hook that sends every content edit to Haiku 4.5 for an editorial-bar verdict. The config, the rubric, and the prompt-caching floor.

**Canonical URL**: https://agentcookbooks.com/blog/claude-code-content-judge-hook/

**Published**: 2026-04-30

**Tags**: claude-code, hooks, cookbook

---

Recipe #4 in the cookbook, after the [CWD guard](/blog/claude-code-hooks-cookbook/), the [skill-activation logger](/blog/claude-code-skill-activation-hook/), and the [frontmatter guard](/blog/claude-code-frontmatter-guard-hook/). The first three are deterministic — regex-shaped checks that pass or fail on plain text. This one is probabilistic: every content edit gets sent to Claude Haiku 4.5 with an editorial-bar rubric, and the verdict (`pass` / `soft-fail` / `block`) is logged to `receipts-drafts/judge-log.jsonl` alongside the full token-usage payload. The hook catches what the deterministic frontmatter guard can't see: a paragraph that's grammatically correct but merely paraphrases upstream documentation with no firsthand observation. The post ships the `.claude/settings.json` block, the Node script with prompt-caching wired in, the model-specific minimum-cacheable-prefix floor table that nobody mentions in tutorials, and four designed-in failure modes. Receipts pending — the bill section will be filled in from a real session, transparent on the live page until then.

## The hook: send every content edit to Haiku 4.5 for a verdict

The deterministic frontmatter guard catches structural failure (missing field, malformed date). What it can't catch: a paragraph that's grammatically correct but merely paraphrases upstream documentation with no firsthand observation, or a "Receipts" section that contains plausible-generic prose instead of real numbers. That's where this site's editorial bar lives, and that's where a regex tops out.

An LLM judge can read those two paragraphs and tell the difference. Haiku 4.5 is the right model for the job — fast, cheap, and accurate enough on a constrained classification task with a strict rubric.

`.claude/settings.json` — adds the new matcher to the existing PostToolUse block. `timeout: 30` because this is a network call, not a regex sweep:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "node .claude/hooks/judge-content.mjs",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
```

`.claude/hooks/judge-content.mjs`:

```js
#!/usr/bin/env node
import { readFileSync, appendFileSync, mkdirSync, existsSync } from 'node:fs';
import { resolve, sep, posix, join } from 'node:path';
import Anthropic from '@anthropic-ai/sdk';

const MODEL = 'claude-haiku-4-5';

const RUBRIC = `You are an editorial-bar judge for a Claude Code blog and skill wiki at agentcookbooks.com.

The site's wedge: every page must contain at least one real artifact generated firsthand. Real artifacts are:
- a config that ran (settings.json, hook script, command line),
- a number that was measured (latency, token count, cost, false-positive rate),
- a failure mode that was hit and how it was resolved (with the actual error, the fix, what would surprise a future reader).

Quote-only paraphrases of upstream documentation fail the bar. Generic listicle prose fails the bar.

You will receive a markdown file. Return a JSON object with these fields:
- verdict: "pass" | "soft-fail" | "block"
- artifact_present: boolean
- artifact_type: "config" | "number" | "failure-mode" | "none"
- reason: string (1-2 sentences, surfaced to the editor as feedback)

Calibration:
- "pass": at least one real artifact and the surrounding prose adds firsthand observation
- "soft-fail": artifact present but prose is thin, OR prose is rich but the artifact is borderline (a generic config with no firsthand context)
- "block": no real artifact AND prose is paraphrase-only

Block sparingly. The editorial bar is "don't ship without a receipt", not "every paragraph must contain a number". Section headings, transitions, and short conclusions are fine.

Respond with the JSON object only. No prose before or after.`;

let raw = '';
process.stdin.on('data', (c) => (raw += c));
process.stdin.on('end', async () => {
  let payload;
  try { payload = JSON.parse(raw); } catch { process.exit(0); }

  if (!['Edit', 'Write', 'MultiEdit'].includes(payload.tool_name)) process.exit(0);

  const filePath = payload.tool_input?.file_path;
  if (!filePath) process.exit(0);

  const norm = filePath.split(sep).join(posix.sep);
  const isContent = /\/src\/content\/(blog|skills)\/[^/]+\.mdx?$/.test(norm);
  if (!isContent) process.exit(0);

  if (!process.env.ANTHROPIC_API_KEY) process.exit(0);

  let body;
  try { body = readFileSync(resolve(filePath), 'utf8'); } catch { process.exit(0); }

  const excerpt = body.slice(0, 8000);
  const client = new Anthropic();

  try {
    const resp = await client.messages.create({
      model: MODEL,
      max_tokens: 200,
      system: [
        { type: 'text', text: RUBRIC, cache_control: { type: 'ephemeral' } },
      ],
      messages: [{ role: 'user', content: `File: ${norm}\n\n${excerpt}` }],
    });

    const text = resp.content?.[0]?.type === 'text' ? resp.content[0].text : '';

    let verdict;
    try { verdict = JSON.parse(text); }
    catch {
      logVerdict(payload.cwd, { ts: now(), file: norm, parse_error: text, usage: resp.usage });
      process.exit(0);
    }

    logVerdict(payload.cwd, {
      ts: now(),
      file: norm,
      verdict: verdict.verdict,
      artifact_present: verdict.artifact_present,
      artifact_type: verdict.artifact_type,
      reason: verdict.reason,
      usage: resp.usage,
    });

    if (verdict.verdict === 'block') {
      process.stderr.write(JSON.stringify({
        decision: 'block',
        reason: `[content-judge] ${verdict.reason}`,
      }) + '\n');
      process.exit(2);
    }
  } catch (err) {
    process.stderr.write(`[content-judge] ${err.message}\n`);
    process.exit(0);
  }
});

function now() { return new Date().toISOString(); }

function logVerdict(cwd, entry) {
  const dir = join(cwd || process.cwd(), 'receipts-drafts');
  if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
  appendFileSync(join(dir, 'judge-log.jsonl'), JSON.stringify(entry) + '\n');
}
```

The script reads the hook payload, exits early on edits that fall outside the two content collections, sends the post-edit file body to Haiku 4.5 with the rubric as a cached system prompt, parses the JSON verdict, appends an entry to `receipts-drafts/judge-log.jsonl`, and on a `"block"` verdict exits `2` with structured stderr — the same shape the frontmatter guard uses, so Claude reads the reason and self-corrects on the next turn.

Install: `npm install @anthropic-ai/sdk`. The hook needs `ANTHROPIC_API_KEY` in the shell that launched Claude Code; if it's missing the script exits silently (designed-in failure mode #1 below).
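The path filter is worth exercising in isolation before the hook ever makes an API call. A sketch: the regex is copied from the script, but the helper name is mine, and backslash normalization is hardcoded here where the script uses the platform `sep`:

```javascript
// Standalone sketch of the hook's content-path filter.
// Matches files directly inside src/content/blog or src/content/skills;
// [^/]+ deliberately rejects nested subdirectories.
const isContent = (filePath) => {
  const norm = filePath.split('\\').join('/'); // the script splits on platform sep instead
  return /\/src\/content\/(blog|skills)\/[^/]+\.mdx?$/.test(norm);
};

console.log(isContent('/repo/src/content/blog/post.mdx'));    // true
console.log(isContent('/repo/src/content/skills/judge.md'));  // true
console.log(isContent('/repo/src/pages/index.astro'));        // false
console.log(isContent('/repo/src/content/blog/draft/a.mdx')); // false (nested)
```

If a legitimate content file slips past the hook, this is the first thing to check: a nested subdirectory or an `.astro` extension both fail the regex silently.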

## The prompt-caching floor that nobody mentions

The `cache_control: { type: 'ephemeral' }` marker on the system block is the standard pattern for a frozen rubric — write once, read on every subsequent call at ~10% of input cost. Tutorials show the marker; the part they leave out is the **minimum-cacheable-prefix floor**, which is model-specific:

| Model                                                | Minimum |
|------------------------------------------------------|--------:|
| Opus 4.7, Opus 4.6, Opus 4.5, **Haiku 4.5**          | 4096 tokens |
| Sonnet 4.6                                           | 2048 tokens |
| Sonnet 4.5 and older                                 | 1024 tokens |

Anything shorter than the floor silently won't cache — no error, just `cache_creation_input_tokens: 0` and `cache_read_input_tokens: 0` on every response. The rubric in the script above is ~220 tokens. So on Haiku 4.5, **caching does not fire at this rubric size**. The marker is a forward-looking hint for the version of this rubric that has fifteen worked examples and crosses the floor.

This is worth knowing before installing the hook: if you read the response `usage` and see zeros across both cache fields, the marker isn't broken — the prefix is just under 4096 tokens. Either grow the rubric (worked examples are the natural way) or accept that every call is full-price input on Haiku 4.5.
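A tiny helper makes that `usage` check mechanical. A sketch only: `cacheStatus` is an invented name, but the three fields are the real response shape:

```javascript
// Classify a response's cache behavior from its usage payload.
// 'uncached' covers both "no cache_control marker" and "prefix under the
// model's minimum-cacheable floor"; the API reports them identically.
function cacheStatus(usage) {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return 'hit';
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return 'write';
  return 'uncached';
}

console.log(cacheStatus({ cache_read_input_tokens: 0, cache_creation_input_tokens: 0 })); // 'uncached'
```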

## Receipts (TODO — will be filled in from a real session)

The hook is wired but not yet running for receipts. Notes will replace this section as the hook fires on real edits in a future session. The format the receipts will land in:

**Verdict log.** `receipts-drafts/judge-log.jsonl` accumulates one entry per qualified edit. Each entry has `ts`, `file`, `verdict`, `artifact_present`, `artifact_type`, `reason`, and the full `usage` payload (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). After the first 5–10 verdicts, an excerpt of the JSONL goes here — including at least one `pass`, one `soft-fail`, and ideally one `block` from a deliberately broken test file.
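Once entries accumulate, tallying verdicts takes a few lines of Node. A sketch against the `logVerdict` field names above (`tallyVerdicts` is a hypothetical helper, not part of the hook):

```javascript
// Count verdicts in a judge-log.jsonl string; entries without a verdict
// (the parse_error case) are bucketed separately.
function tallyVerdicts(jsonl) {
  const counts = {};
  for (const line of jsonl.split('\n')) {
    if (!line.trim()) continue;
    const key = JSON.parse(line).verdict ?? 'parse_error';
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

const sample = '{"verdict":"pass"}\n{"verdict":"pass"}\n{"verdict":"soft-fail"}\n';
console.log(tallyVerdicts(sample)); // { pass: 2, 'soft-fail': 1 }
```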

**The bill.** Per-edit cost on Haiku 4.5 at $1/M input + $5/M output. Back-of-envelope for a typical content edit: ~2K input tokens (rubric + file excerpt) plus ~80 output tokens of JSON ≈ $0.0024 per edit. The receipt belongs here once `usage.input_tokens` × pricing has been totalled across a real session, not estimated.
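The same back-of-envelope as a function over a real `usage` payload. The pricing constants are the numbers this post assumes ($1/M input, $5/M output, cache reads at 10% of base input); cache-write surcharges are omitted from the sketch:

```javascript
// Per-call dollar cost from a usage payload at the assumed Haiku 4.5 pricing.
const PRICE = { inputPerM: 1.0, outputPerM: 5.0, cacheReadPerM: 0.1 }; // $/M tokens

function editCost(usage) {
  return (
    (usage.input_tokens ?? 0) * PRICE.inputPerM +
    (usage.output_tokens ?? 0) * PRICE.outputPerM +
    (usage.cache_read_input_tokens ?? 0) * PRICE.cacheReadPerM
  ) / 1e6;
}

console.log(editCost({ input_tokens: 2000, output_tokens: 80 })); // 0.0024
```

Summing `editCost` over every `usage` entry in the verdict log is how the real bill above will be totalled.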

**Cache hit rate.** Will read 0/0 on Haiku 4.5 until the rubric grows past 4096 tokens — see the floor table above. Once the rubric crosses the threshold (probably after a few worked-example additions), `cache_read_input_tokens` should dominate `input_tokens` after the first call in each 5-minute window. Both numbers go here when they're real.

## Designed-in failure modes

The script hasn't been observed breaking yet, so this is "what was anticipated and what the script does about it" rather than "what broke" — the latter section will land here once the hook has run.

1. **No `ANTHROPIC_API_KEY` in the environment** → script exits 0 silently. The harness sees a no-op, the edit proceeds, no log entry. This is intentional — the hook is opt-in via the env var, so unset means off.
2. **Network failure / timeout / rate limit** → caught by the outer `try`, error written to stderr, exit 0. The edit is never blocked by infrastructure flakiness.
3. **Model returns non-JSON** (rubric drift, refusal, truncation at `max_tokens: 200`) → JSON parse fails, the raw text is logged under `parse_error`, exit 0. The edit proceeds; the operator sees the parse error in the log and tightens the rubric.
4. **False-positive block on a legitimate quote-block** (the rubric tries to be lenient about this, but it's the obvious failure pattern) → captured in the verdict log with `verdict: "block"`, surfaced via stderr to Claude, who can either rewrite the paragraph or — if the verdict is wrong — disable the hook for the next edit. The mitigation if false positives become routine: soften the rubric's calibration (more explicit "block sparingly" language), or downgrade `block` to `soft-fail` and remove the `exit 2` path so verdicts log but never block.
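The log-only downgrade in #4 is small enough to sketch as a flag (the `logOnly` switch is hypothetical; the script above hardcodes the blocking path):

```javascript
// Map a verdict to the hook's exit code. With logOnly, 'block' still lands
// in the JSONL but never stops the edit (exit 0 instead of 2).
function exitCodeFor(verdict, { logOnly = false } = {}) {
  if (verdict === 'block' && !logOnly) return 2; // exit 2: Claude sees stderr
  return 0; // pass / soft-fail / log-only: edit proceeds
}

console.log(exitCodeFor('block'));                    // 2
console.log(exitCodeFor('block', { logOnly: true })); // 0
```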

## When NOT to use this hook

This is the most expensive hook in the cookbook by a wide margin, so think about the cost before installing.

- **One-off scripts and short sessions** — the deterministic guards earn their keep at zero marginal cost. The judge only earns its keep on long content sessions where a paraphrase-only paragraph would otherwise slip past until publish.
- **Refactor sessions that touch many content files in one go** — a 50-file refactor turns into 50 API calls. At ~$0.0024 per call that's ~$0.12, which is fine, but if the refactor is mechanical (frontmatter renames, link updates) the judge isn't grading anything substantive. Disable the hook by unsetting `ANTHROPIC_API_KEY` for the refactor session.
- **When the deterministic frontmatter guard is enough** — for structural failures (missing field, bad date) the guard catches it instantly and for free. The judge is for the failure mode the guard can't see: "the words are valid but they're paraphrase-only".

## What's next

Eight more hooks lined up. Each one ships only after a real run on real code — receipts coming for this one as soon as the bill is in. Subscribe to RSS for the rest.