token-budget-advisor

A Claude Code skill from Affaan M's everything-claude-code repo that intercepts the response flow to offer the user a depth choice (25% / 50% / 75% / 100%) before answering, using heuristic input-token estimation and complexity-based multipliers to project the response window — heuristic-only, no real tokenizer.

Let the user pick response depth before Claude answers, instead of always defaulting to exhaustive

Source Affaan M

License MIT

First documented 2026-05-12

Receipts TODO

Cost Management

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

give me the short version first
respond at 50% depth
how many tokens will your answer use

What it does

token-budget-advisor (TBA) is the response-depth skill in Affaan M’s everything-claude-code — see skills/token-budget-advisor. It intercepts the response flow when the user explicitly wants to control answer size or depth: estimates input tokens with simple heuristics (words × 1.3 for prose, chars / 4 for code-heavy / mixed content), classifies the prompt complexity, applies a multiplier range to project the response window, then presents four depth levels (25% Essential / 50% Moderate / 75% Detailed / 100% Exhaustive) before answering.

The complexity ladder maps prompt type to multiplier range: Simple (3–8×) for yes/no or single-fact, Medium (8–20×) for “how does X work”, Medium-High (10–25×) for code with context, Complex (15–40×) for multi-part analysis or architecture, Creative (10–30×) for narrative. Response window = input tokens × multiplier range, capped at the model’s output limit. Each level pulls from the range — 25% is min + (max - min) × 0.25, 100% is the max.

Triggers are explicit (the user mentions tokens, budget, depth, response length, “tldr”, “short version”, “respuesta corta”) and so are non-triggers (auth/session/payment tokens, trivially-one-line answers, follow-ups where the user already picked a level this session — maintain silently). Shortcut phrases skip the depth prompt entirely: “1” / “tldr” / “brief” = 25%, “exhaustive” / “full deep dive” = 100%. The skill is honest about its limits: heuristic-only, ~85–90% accuracy, ±15% variance — and requires that disclaimer to render alongside every depth menu.

When to use it

User wants depth control before the answer lands, not after
Long-context conversations where defaulting to exhaustive burns budget on questions that wanted a tldr
Sessions where the operator wants explicit response-size discipline (cost-sensitive work, batch processing)
Multi-language conversations — the skill triggers on Spanish (“dame la versión corta”) and English equally

When not to reach for it:

Trivial one-line factual answers — overhead exceeds the value
The user already chose a level earlier in the session — maintain silently, don’t re-prompt
Cost / spend reporting after the fact — that’s cost-tracking
Mid-task implementation where depth menus break the flow

Install

From affaan-m/everything-claude-code at skills/token-budget-advisor/. Drop the folder into ~/.claude/skills/token-budget-advisor/. The skill is heuristic-only and ships without a tokenizer dependency — the upstream project has a Python estimator script available separately, but this skill is intentionally self-contained.

What a session looks like

User invokes the skill. “Give me the short version of why monorepos are a fit for our team, then I’ll ask for more.”
Skill estimates input tokens. Mixed content (some prose, some technical claims) — uses words × 1.3 since the prompt is mostly prose.
Skill classifies complexity. “Why is X a fit for us” is Medium-High (subjective + needs context).
Compute response window. Input × 10 to 25× multiplier range, capped at output-token limit. Result: ~250–625 tokens for the response window.
Present the depth menu. The four levels with their target token counts and short-form descriptions, plus the heuristic-accuracy disclaimer.
User picks “1” or “25%” or “short version.” Skill answers at 25% Essential — 2–4 sentences, direct answer + key conclusion, no preamble, no alternatives.
Maintain silently for follow-ups. If the user asks a follow-up without specifying a level, the skill keeps 25% until they switch.

The discipline that makes it work: the depth prompt is presented before the answer, not after. A “give me a shorter version” follow-up is too late — the model has already committed to the long answer. The skill’s wedge is moving the decision to before the generation step.

Receipts

TODO — to be filled in from a real session. Once the skill has been used end-to-end, this section will capture: how close the heuristic input-token estimate was to a real tokenizer count on a sample prompt (the skill claims ~85–90% accuracy with ±15% variance — the receipts should test that), whether the complexity classification landed correctly on a borderline prompt, the actual generated-response token count vs. the projected window at the chosen depth level, and whether the “maintain silently” rule caused user friction for a session that needed a depth change mid-flow.

Source and attribution

From Affaan M’s everything-claude-code — an MIT-licensed skill collection covering harness construction, agent ops, video, payments, and platform-specific patterns.

License: MIT.

Quoting the accuracy disclaimer verbatim: “This skill uses heuristic estimation — no real tokenizer. Accuracy ~85-90%, variance ±15%. Always show the disclaimer.” The honesty is what makes the skill safe to use in cost-sensitive workflows — the user knows the numbers are projections, not measurements.