# ElevenLabs Conversational AI: four 200 OK gotchas

> Four ElevenLabs Conversational AI gotchas that return 200 OK with wrong behavior: workspace-scoped keys, hidden override toggles, dual webhook auth, test gates.

**Canonical URL**: https://agentcookbooks.com/blog/elevenlabs-conversational-ai-four-200-ok-footguns/

**Published**: 2026-05-26

**Tags**: claude-code, deployment

---

Three days deploying an outbound-calling voice agent on Twilio + ElevenLabs Conversational AI. Four gotchas, and every one of them lands in the worst debugging category: the API returns 200 OK with broken behavior. No HTTP error means no log line to grep. The cause sits behind a dashboard toggle, a workspace selector, or a field that looks production-required but isn't. None show up in the docs.

## What I ran

Not a skill activation. Production deploy of an outbound-calling voice-agent stack. Each gotcha cost 30–90 minutes of "this should work" before the cause clicked. The pattern across all four: the failure mode is silent-ish — the call connects, the API returns a `conversation_id`, the request looks routed — but the agent never speaks, or speaks wrong, or fires the wrong webhook auth. Three days of deploy time, four distinct walls.

## What happened

**1. Workspaces are isolated. API keys work only inside their creation workspace.**

A `404 document_not_found` on a resource ID that *looks* correct — `agent_id`, `phone_number_id`, `conversation_id` — is most likely a workspace-scope mismatch, not a typo. The dashboard has a workspace selector in the top-left sidebar; the API key must be created inside the same workspace as the resource it targets. Applies across post-call webhooks, phone numbers, agents — every workspace-tier resource.

Symptom-wise, an outbound-call POST returns a right-shaped error against a right-looking ID. When that happens, switch workspace context first — don't re-fetch, don't suspect typos.

**2. Client-side overrides must be enabled per-field in the dashboard before the API will honor them.**

The body shape that looks right:

```json
{
  "conversation_initiation_client_data": {
    "conversation_config_override": {
      "agent": { "first_message": "Hi, this is..." }
    }
  }
}
```

Silently fails if the matching `First message` toggle is OFF under **Agent → Settings → Security → Overrides**. The API returns 200 with a `conversation_id`. Twilio rings the caller. They answer. The agent never speaks. Line drops after the awkward silence. No error in your logs because the API call itself succeeded — the override was just dropped server-side.

Same applies to every field under `conversation_config_override` — LLM, Voice, System prompt, Tools, Knowledge base. Each field has its own toggle. Toggles default to OFF on new agents. Before passing any override at call-time, flip the matching toggle in the Overrides panel. Bake this into the agent-provisioning checklist.

**3. Two completely different webhook auth flows. Don't confuse them.**

- **Server tools** (tool calls the agent makes *during* a conversation): static secret in a configurable header. Bearer-style. No body hashing.
- **Post-call webhooks** (workspace-level, fire when a call ends with the full transcript): `HMAC-SHA256(timestamp + "." + body)` in the `ElevenLabs-Signature` header.

Implementing HMAC where bearer was expected — or vice versa — returns auth errors that look like network or routing problems. Verify the *specific* flow's mechanism before writing the verifier function.

**4. The dashboard requires a Test value for every `{{placeholder}}` in the system prompt — but that's a dev-UI gate, not a production requirement.**

When ElevenLabs auto-parses the system prompt and detects `{{variables}}`, the right panel lists them with "Test value is required." These are fallbacks used *only* when you test the agent in the dashboard chat UI. In production (real phone calls), the initiation webhook always provides the values. The test defaults never fire.

Dummy empty-state strings (`"No active tasks."`, `"Source unavailable."`) are correct test values — they mirror what the production webhook would fall back to on source failure, so dev behavior matches a degraded-prod call. Easy to misread the field as "my agent won't ship without these populated"; that misreading wastes an hour of fishing for real values.

## Where it drifted

Three different webhook configuration locations live in the dashboard, easy to look in the wrong one for an hour: tool webhooks in the agent's **Tools** tab, conversation-initiation webhooks in the agent's **Security** tab, post-call webhooks at the workspace level under **Settings**. Three flows, three pages, no cross-linking between them.

The webhook-tool UI forces at least one body property even when the handler takes no input. Workaround: add a dummy `context` string property with `Required=OFF`, Value Type `LLM Prompt`, description noting it's unused. The webhook ignores unknown keys, so it's harmless — but the UI won't let you save without it.

Initiation-webhook response shape: `{"dynamic_variables": {"key": "value", ...}}`. All values must be strings. ElevenLabs does placeholder substitution `{{key}}` in the system prompt, string-only. Pass a number or a boolean and the substitution either fails or stringifies in a way that downstream prompt logic doesn't expect.

The 200-OK-wrong-behavior pattern is what makes all four gotchas slow to diagnose. A 4xx response would point at the request; a 5xx would point at the server. A 200 OK with the wrong outcome points nowhere — the caller has to walk back from the symptom (silence on the line, missing webhook fire, wrong auth header) through the entire request path.

## What I'd change

Build a one-line provisioning checklist that runs after every new agent is created: flip the override toggles you intend to use, set test values for every `{{placeholder}}` (dummies are fine), confirm webhook URLs land in the right tab. The dashboard defaults are off for the overrides and require manual entry for the placeholders — neither is something the API can fix from your side.

When a 404 hits an ID that checks out, switch workspace context first. Don't re-fetch, don't suspect typos. The selector is the first move, not the third.

Document the webhook-flow choice (bearer vs HMAC) at the verifier function itself, not in a README. The verifier is what gets read when something breaks; the README is what gets read when you're onboarding. They serve different audiences and the verifier wins on the failure path.

4 gotchas, ~3 days of deploy, 30–90 min per gotcha after the symptom surfaced. Zero of the four caused HTTP error responses. All were 200 OK with the wrong outcome.