# token-budget-advisor

> A Claude Code skill from Affaan M's everything-claude-code repo that intercepts the response flow to offer the user a depth choice (25% / 50% / 75% / 100%) before answering, using heuristic input-token estimation and complexity-based multipliers to project the response window — heuristic-only, no real tokenizer.

**Use case**: Let the user pick response depth before Claude answers, instead of always defaulting to exhaustive

**Canonical URL**: https://agentcookbooks.com/skills/token-budget-advisor/

**Topics**: claude-code, skills, cost-management

**Trigger phrases**: "give me the short version first", "respond at 50% depth", "how many tokens will your answer use"

**Source**: [Affaan M](https://github.com/affaan-m/everything-claude-code/tree/main/skills/token-budget-advisor)

**License**: MIT

---

## What it does

`token-budget-advisor` (TBA) is the response-depth skill in [Affaan M's everything-claude-code](https://github.com/affaan-m/everything-claude-code) — see [skills/token-budget-advisor](https://github.com/affaan-m/everything-claude-code/tree/main/skills/token-budget-advisor). It intercepts the response flow when the user explicitly wants to control answer size or depth: estimates input tokens with simple heuristics (`words × 1.3` for prose, `chars / 4` for code-heavy / mixed content), classifies the prompt complexity, applies a multiplier range to project the response window, then presents four depth levels (25% Essential / 50% Moderate / 75% Detailed / 100% Exhaustive) before answering.

The complexity ladder maps prompt type to multiplier range: Simple (3–8×) for yes/no or single-fact, Medium (8–20×) for "how does X work", Medium-High (10–25×) for code with context, Complex (15–40×) for multi-part analysis or architecture, Creative (10–30×) for narrative. Response window = input tokens × multiplier range, capped at the model's output limit. Each level pulls from the range — 25% is `min + (max - min) × 0.25`, 100% is the max.

Triggers are explicit (the user mentions tokens, budget, depth, response length, "tldr", "short version", "respuesta corta") and so are non-triggers (auth/session/payment tokens, trivially-one-line answers, follow-ups where the user already picked a level this session — maintain silently). Shortcut phrases skip the depth prompt entirely: "1" / "tldr" / "brief" = 25%, "exhaustive" / "full deep dive" = 100%. The skill is honest about its limits: heuristic-only, ~85–90% accuracy, ±15% variance — and requires that disclaimer to render alongside every depth menu.

## When to use it

- User wants depth control before the answer lands, not after
- Long-context conversations where defaulting to exhaustive burns budget on questions that wanted a tldr
- Sessions where the operator wants explicit response-size discipline (cost-sensitive work, batch processing)
- Multi-language conversations — the skill triggers on Spanish ("dame la versión corta") and English equally

When *not* to reach for it:

- Trivial one-line factual answers — overhead exceeds the value
- The user already chose a level earlier in the session — maintain silently, don't re-prompt
- Cost / spend reporting after the fact — that's `cost-tracking`
- Mid-task implementation where depth menus break the flow

## Install

From [affaan-m/everything-claude-code](https://github.com/affaan-m/everything-claude-code) at `skills/token-budget-advisor/`. Drop the folder into `~/.claude/skills/token-budget-advisor/`. The skill is heuristic-only and ships without a tokenizer dependency — the upstream project has a Python estimator script available separately, but this skill is intentionally self-contained.

## What a session looks like

1. **User invokes the skill.** "Give me the short version of why monorepos are a fit for our team, then I'll ask for more."
2. **Skill estimates input tokens.** Mixed content (some prose, some technical claims) — uses `words × 1.3` since the prompt is mostly prose.
3. **Skill classifies complexity.** "Why is X a fit for us" is Medium-High (subjective + needs context).
4. **Compute response window.** Input × 10 to 25× multiplier range, capped at output-token limit. Result: ~250–625 tokens for the response window.
5. **Present the depth menu.** The four levels with their target token counts and short-form descriptions, plus the heuristic-accuracy disclaimer.
6. **User picks "1" or "25%" or "short version."** Skill answers at 25% Essential — 2–4 sentences, direct answer + key conclusion, no preamble, no alternatives.
7. **Maintain silently for follow-ups.** If the user asks a follow-up without specifying a level, the skill keeps 25% until they switch.

The discipline that makes it work: the depth prompt is presented *before* the answer, not after. A "give me a shorter version" follow-up is too late — the model has already committed to the long answer. The skill's wedge is moving the decision to before the generation step.

## Receipts

_TODO — to be filled in from a real session. Once the skill has been used end-to-end, this section will capture: how close the heuristic input-token estimate was to a real tokenizer count on a sample prompt (the skill claims ~85–90% accuracy with ±15% variance — the receipts should test that), whether the complexity classification landed correctly on a borderline prompt, the actual generated-response token count vs. the projected window at the chosen depth level, and whether the "maintain silently" rule caused user friction for a session that needed a depth change mid-flow._

## Source and attribution

From [Affaan M's everything-claude-code](https://github.com/affaan-m/everything-claude-code/tree/main/skills/token-budget-advisor) — an MIT-licensed skill collection covering harness construction, agent ops, video, payments, and platform-specific patterns.

License: MIT.

Quoting the accuracy disclaimer verbatim: *"This skill uses heuristic estimation — no real tokenizer. Accuracy ~85-90%, variance ±15%. Always show the disclaimer."* The honesty is what makes the skill safe to use in cost-sensitive workflows — the user knows the numbers are projections, not measurements.