# ASR + an LLM corrupted my agent's memory twice in 6 weeks

> Each layer was fine alone. Stacked, ASR and an LLM classifier wrote a grammatically valid false fact the agent stated on every call. The audit that catches it.

**Canonical URL**: https://agentcookbooks.com/blog/asr-llm-double-failure-agent-memory/

**Published**: 2026-05-29

**Tags**: claude-code, agents

---

A voice agent that builds persistent memory from call transcripts stated a confident, wrong fact about the person it serves — twice in six weeks. Each layer of the pipeline was individually fine. Stacked, they manufactured grammatically valid, semantically false facts that the agent then repeated on every subsequent call. Here are both failure modes, why grep can't catch them, and the audit that's the only thing that does.

## The pipeline, and where it fails

```
caller speech → ASR (Whisper) → LLM classifier → persistent store → agent prompt
                     ^                  ^
                     |                  +-- completes the sentence "grammatically"
                     |                      while inverting subject and object
                     +-- mistranscribes a proper noun (Brand-A → "Vrand-A")
```

ASR and the classifier each do a reasonable job in isolation. The corruption is *emergent*: a small transcription error feeds a classifier that confidently "cleans it up" into fluent, wrong prose, which persists and gets injected into the agent's context as fact.

## Mode 1: an ASR proper-noun error becomes the baseline

The caller said "Brand-A." Whisper heard "Vrand-A." The classifier (a small, cheap model) tagged the fact `scope="always_on"` — so it persisted to the vector store **and** synced into the agent's system prompt as baseline context. Every call after that opened with the agent asserting "you run three brands: Vrand-A, Brand-B, Brand-C" — wrong, confident, repeated on every call, with no automated path back.

## Mode 2: an ASR error and a grammar inversion, stacked

A caller drew a distinction between their day job and a side project, naming the side project's company in the same sentence. Two bugs stacked silently:

1. ASR mistranscribed the company name into a similar-sounding one that doesn't exist.
2. The classifier merged the two clauses into one sentence and **inverted subject and object** — making the day-job company the side project (or the reverse).

This fact was scoped `query`, not `always_on`, so it spared the baseline. But vector search would still surface it on any retrieval about the day job or the side project, misrepresenting the person's identity in a way that's much harder to predict than a baseline error.

## Why grep won't save you

Mode 2's output is a grammatically valid sentence. No syntax error, no malformed record, no schema violation — nothing a keyword scan or a validator flags. The only thing that catches it is a reader who already knows the truth doing a sentence-level sanity check. That makes detection operator-effort, not automation — at least until you store ground-truth identity facts separately and diff extracted facts against them, which isn't there by default.

## The detection + correction recipe

Weekly, dump every fact scoped `always_on` or `query` and actually read them — both scopes, because a `query` fact won't poison the baseline but *will* get retrieved on search and shape responses:

```
dump_voice_memory.py                 # list always_on + query facts, with IDs
search_voice_transcripts.py "Vrand"  # did the caller ever actually say it? confirms ASR is the source
fix_voice_facts_<date>.py            # one-off: delete/update the bad fact via the memory API
sync_always_on_to_agent()            # re-sync the baseline — vendor-specific, and easy to forget
dump_voice_memory.py                 # re-read to confirm
# then a live test call: "what do you know about me?"
```

Two traps live in the correction itself:

- **The store and the prompt are two different surfaces.** Updating the vector store is easy; the agent's baseline system prompt is injected separately and usually needs its own sync call. Fix the database, forget the sync, and the store tells the truth while the agent keeps stating the lie.
- **Console edits don't re-trigger the sync.** Editing a fact in the database UI fixes the database but leaves the agent prompt stale until the next `always_on` write fires the sync hook. Always correct through the code path that re-syncs, not the dashboard.

## Where this generalizes

The shape is `noisy input → LLM classifier → persistent store → prompt context`. Voice is just one instance. OCR'd documents, screen-scraped text, low-quality pasted content — anything where a lossy front-end feeds an LLM that "tidies" the noise into confident prose, which then becomes load-bearing memory, carries the same double-failure risk.

Two corruption incidents in six weeks of production. The caller complained on neither, because the agent sounded completely sure of itself — time-to-detect without the audit was effectively infinite. Each took about 30 minutes to correct once found. The weekly read-the-facts pass is cheap; the thing it's guarding against is an agent that confidently misremembers who you are.