Skip to main content

ElevenLabs Conversational AI: four 200 OK gotchas

Illustrated receipt card summarizing: ElevenLabs Conversational AI: four 200 OK gotchas

Three days deploying an outbound-calling voice agent on Twilio + ElevenLabs Conversational AI. Four gotchas, and every one of them lands in the worst debugging category: the API returns 200 OK with broken behavior. No HTTP error means no log line to grep. The cause sits behind a dashboard toggle, a workspace selector, or a field that looks production-required but isn’t. None show up in the docs.

What I ran

Not a skill activation. Production deploy of an outbound-calling voice-agent stack. Each gotcha cost 30–90 minutes of “this should work” before the cause clicked. The pattern across all four: the failure mode is silent-ish — the call connects, the API returns a conversation_id, the request looks routed — but the agent never speaks, or speaks wrong, or fires the wrong webhook auth. Three days of deploy time, four distinct walls.

What happened

1. Workspaces are isolated. API keys work only inside their creation workspace.

A 404 document_not_found on a resource ID that looks correct — agent_id, phone_number_id, conversation_id — is most likely a workspace-scope mismatch, not a typo. The dashboard has a workspace selector in the top-left sidebar; the API key must be created inside the same workspace as the resource it targets. Applies across post-call webhooks, phone numbers, agents — every workspace-tier resource.

Symptom-wise, an outbound-call POST returns a right-shaped error against a right-looking ID. When that happens, switch workspace context first — don’t re-fetch, don’t suspect typos.

2. Client-side overrides must be enabled per-field in the dashboard before the API will honor them.

The body shape that looks right:

{
  "conversation_initiation_client_data": {
    "conversation_config_override": {
      "agent": { "first_message": "Hi, this is..." }
    }
  }
}

Silently fails if the matching First message toggle is OFF under Agent → Settings → Security → Overrides. The API returns 200 with a conversation_id. Twilio rings the caller. They answer. The agent never speaks. Line drops after the awkward silence. No error in your logs because the API call itself succeeded — the override was just dropped server-side.

Same applies to every field under conversation_config_override — LLM, Voice, System prompt, Tools, Knowledge base. Each field has its own toggle. Toggles default to OFF on new agents. Before passing any override at call-time, flip the matching toggle in the Overrides panel. Bake this into the agent-provisioning checklist.

3. Two completely different webhook auth flows. Don’t confuse them.

  • Server tools (tool calls the agent makes during a conversation): static secret in a configurable header. Bearer-style. No body hashing.
  • Post-call webhooks (workspace-level, fire when a call ends with the full transcript): HMAC-SHA256(timestamp + "." + body) in the ElevenLabs-Signature header.

Implementing HMAC where bearer was expected — or vice versa — returns auth errors that look like network or routing problems. Verify the specific flow’s mechanism before writing the verifier function.

4. The dashboard requires a Test value for every {{placeholder}} in the system prompt — but that’s a dev-UI gate, not a production requirement.

When ElevenLabs auto-parses the system prompt and detects {{variables}}, the right panel lists them with “Test value is required.” These are fallbacks used only when you test the agent in the dashboard chat UI. In production (real phone calls), the initiation webhook always provides the values. The test defaults never fire.

Dummy empty-state strings ("No active tasks.", "Source unavailable.") are correct test values — they mirror what the production webhook would fall back to on source failure, so dev behavior matches a degraded-prod call. Easy to misread the field as “my agent won’t ship without these populated”; that misreading wastes an hour of fishing for real values.

Where it drifted

Three different webhook configuration locations live in the dashboard, easy to look in the wrong one for an hour: tool webhooks in the agent’s Tools tab, conversation-initiation webhooks in the agent’s Security tab, post-call webhooks at the workspace level under Settings. Three flows, three pages, no cross-linking between them.

The webhook-tool UI forces at least one body property even when the handler takes no input. Workaround: add a dummy context string property with Required=OFF, Value Type LLM Prompt, description noting it’s unused. The webhook ignores unknown keys, so it’s harmless — but the UI won’t let you save without it.

Initiation-webhook response shape: {"dynamic_variables": {"key": "value", ...}}. All values must be strings. ElevenLabs does placeholder substitution {{key}} in the system prompt, string-only. Pass a number or a boolean and the substitution either fails or stringifies in a way that downstream prompt logic doesn’t expect.

The 200-OK-wrong-behavior pattern is what makes all four gotchas slow to diagnose. A 4xx response would point at the request; a 5xx would point at the server. A 200 OK with the wrong outcome points nowhere — the caller has to walk back from the symptom (silence on the line, missing webhook fire, wrong auth header) through the entire request path.

What I’d change

Build a one-line provisioning checklist that runs after every new agent is created: flip the override toggles you intend to use, set test values for every {{placeholder}} (dummies are fine), confirm webhook URLs land in the right tab. The dashboard defaults are off for the overrides and require manual entry for the placeholders — neither is something the API can fix from your side.

When a 404 hits an ID that checks out, switch workspace context first. Don’t re-fetch, don’t suspect typos. The selector is the first move, not the third.

Document the webhook-flow choice (bearer vs HMAC) at the verifier function itself, not in a README. The verifier is what gets read when something breaks; the README is what gets read when you’re onboarding. They serve different audiences and the verifier wins on the failure path.

4 gotchas, ~3 days of deploy, 30–90 min per gotcha after the symptom surfaced. Zero of the four caused HTTP error responses. All were 200 OK with the wrong outcome.