llm-trading-agent-security

A Claude Code skill from Affaan M's everything-claude-code repo for autonomous trading agent defense — layered controls treating prompt hygiene, hard spend limits, pre-send simulation, circuit breakers, MEV protection, and wallet isolation as independent layers. No single check is enough when an injection turns directly into asset loss.

Layer prompt-injection, spend-limit, simulation, and wallet-isolation defenses on an agent that signs transactions

Source Affaan M

License MIT

First documented 2026-05-12

Receipts TODO

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

harden my trading agent against prompt injection
spend limits for an autonomous LLM trader
pre-send simulation for agent transactions

What it does

llm-trading-agent-security is the autonomous-trading-agent defense skill in Affaan M’s everything-claude-code — see skills/llm-trading-agent-security. It treats the threat model as harsher than normal LLM apps: an injection or bad tool path turns directly into asset loss, so no single check is sufficient. The skill is review / hardening focused, not exploit construction.

The defense layers are independent by design:

Prompt hygiene: sanitize external data before it enters the execution-capable context, rejecting injection patterns such as ignore previous instructions, send .* to 0x..., and approve .* for.
Hard spend limits: a per-transaction ceiling plus a 24-hour ceiling, enforced outside the model’s output, so the model can’t talk its way past SpendLimitGuard.check_and_record.
Pre-send simulation: w3.eth.call(tx) returns the simulated outcome; min_amount_out is mandatory, and anything below the threshold is rejected.
Circuit breakers: halt on consecutive losses or on hourly drawdown beyond a threshold.
Wallet isolation: a dedicated hot wallet holding session funds only, never the primary treasury.
MEV and deadline protection: private RPC or Flashbots, per-strategy slippage bps, and explicit deadlines.
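As a rough illustration of the prompt-hygiene layer, the sketch below runs external strings through a small set of regexes before they reach the prompt. The function name sanitize_onchain_data and the four patterns come from the skill's description; the InjectionDetected exception name and the case-insensitive matching are assumptions made for this example, not the skill's actual code.

```python
import re

# Patterns quoted in the skill description. A real deployment would likely
# need a broader list; these are deliberately coarse.
INJECTION_PATTERNS = [
    r"ignore .* instructions",
    r"send .* to 0x[0-9a-fA-F]{40}",
    r"transfer .* to",
    r"approve .* for",
]
# Case-insensitive matching is an assumption of this sketch.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INJECTION_PATTERNS]


class InjectionDetected(ValueError):
    """Hypothetical exception name used only in this sketch."""


def sanitize_onchain_data(text: str) -> str:
    """Return text unchanged, or raise if it matches an injection pattern."""
    for pattern in _COMPILED:
        if pattern.search(text):
            # Raise before the prompt is built so the string never reaches
            # the execution-capable context.
            raise InjectionDetected(f"blocked by pattern: {pattern.pattern}")
    return text
```

Patterns this broad will also flag benign text (a pair description that mentions a transfer fee, for instance), which is one reason the sanitizer is treated as one layer rather than the whole defense.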

Every layer is shown with Python code: regex sanitizer, SpendLimitGuard class with explicit SpendLimitError, safe_execute that requires expected_min_out and refuses without it, TradingCircuitBreaker with MAX_CONSECUTIVE_LOSSES and MAX_HOURLY_LOSS_PCT thresholds, wallet loaded from env var (never code or logs), MEV protection via PRIVATE_RPC = "https://rpc.flashbots.net". The pre-deploy checklist is the final gate: every layer present and tested, no fallback to unmetered access when any layer is unreachable.
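To make the spend-limit layer concrete, here is a minimal sketch of a guard with a per-transaction ceiling and a 24-hour rolling window. The names SpendLimitGuard, SpendLimitError, and check_and_record come from the skill's description; the constructor parameters, the injectable clock, and the in-memory storage are assumptions for illustration only.

```python
import time
from collections import deque


class SpendLimitError(Exception):
    """Raised when a spend would exceed a configured limit."""


class SpendLimitGuard:
    def __init__(self, max_per_tx_usd, max_daily_usd,
                 window_seconds=24 * 60 * 60, clock=time.monotonic):
        # All parameter names here are hypothetical.
        self.max_per_tx_usd = max_per_tx_usd
        self.max_daily_usd = max_daily_usd
        self.window_seconds = window_seconds
        self._clock = clock
        self._spends = deque()  # (timestamp, usd_amount), oldest first

    def check_and_record(self, usd_amount):
        # Written as "not (ok)" so NaN and negative amounts fail closed.
        if not (0 < usd_amount <= self.max_per_tx_usd):
            raise SpendLimitError(f"per-tx limit violated: {usd_amount}")

        # Drop entries that have aged out of the rolling window.
        now = self._clock()
        while self._spends and now - self._spends[0][0] >= self.window_seconds:
            self._spends.popleft()

        spent = sum(amount for _, amount in self._spends)
        if not (spent + usd_amount <= self.max_daily_usd):
            raise SpendLimitError(
                f"24h limit violated: {spent} + {usd_amount} > {self.max_daily_usd}"
            )
        self._spends.append((now, usd_amount))
```

The guard is plain code called by the execution path, so nothing the model outputs can change its decision. This version keeps state in memory; a production guard would presumably persist spend records so a restart does not reset the window.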

When to use it

Building an AI agent that signs and sends transactions — the threat model is harsher than non-financial LLM apps
Auditing an existing trading bot or on-chain execution assistant
Designing wallet key management for an agent — env vars, dedicated hot wallet, never the treasury
Giving an LLM access to order placement, swaps, or treasury operations
Hardening against prompt injection from external data sources (news, social, webhook payloads) that the agent reads

When not to reach for it:

Solidity contract review — that’s defi-amm-security
Non-trading agent payments — that’s agent-payment-x402
Pure model-output safety filtering — the skill is execution-layer defense
Exploit construction or offensive security — defensive review only

Install

From affaan-m/everything-claude-code at skills/llm-trading-agent-security/. Drop the folder into ~/.claude/skills/llm-trading-agent-security/. The skill is patterns + Python code; runtime dependencies vary by which layer the operator implements — web3.py for the simulation and chain reads, eth_account for wallet loading, the operator’s preferred LLM SDK for the model integration. Private mempool / Flashbots access requires a separate RPC endpoint.

What a session looks like

Audit the data path. Every external string that flows into the model prompt — token names, pair labels, webhook payloads, social-media inputs — gets a sanitize_onchain_data pass. The regex catches ignore .* instructions, send .* to 0x[0-9a-fA-F]{40}, transfer .* to, approve .* for. Any match raises before the prompt is built.
Wire the spend limit guard. SpendLimitGuard.check_and_record(usd_amount) runs before any signed transaction and enforces a per-transaction ceiling and a 24-hour rolling window. The guard runs outside the model’s output, so the model can’t override it.
Pre-send simulation. safe_execute is mandatory and calls self.w3.eth.call(tx) before anything is sent. expected_min_out is required; the call is refused without it. If the simulated output is below the threshold, safe_execute raises SlippageError and nothing is sent.
Circuit breaker. TradingCircuitBreaker.check(portfolio_value) halts on consecutive losses ≥ 3 or hourly PnL below -5%. Invalid hour-start values also halt. Halt is sticky — manual reset required.
Wallet isolation. The private key comes from an env var (TRADING_WALLET_PRIVATE_KEY), and a missing key fails immediately. The wallet is a dedicated hot wallet holding session funds only, never the primary treasury.
MEV / deadline. Private RPC or Flashbots; per-strategy slippage bps (stable: 10, volatile: 50); explicit deadline 60 seconds out.
Audit log every decision. Not just successful sends — every rejected action, every halt, every sanitization match. Recovery later requires the audit trail.
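The circuit-breaker step in the list above might be sketched as follows. TradingCircuitBreaker, check(portfolio_value), MAX_CONSECUTIVE_LOSSES, MAX_HOURLY_LOSS_PCT, and the thresholds (3 losses, 5%) are taken from the description; CircuitBreakerHalt, record_trade, and reset are hypothetical names added for this example, and the hourly window rollover is left out for brevity.

```python
import math


class CircuitBreakerHalt(Exception):
    """Hypothetical exception name: trading must stop."""


class TradingCircuitBreaker:
    MAX_CONSECUTIVE_LOSSES = 3
    MAX_HOURLY_LOSS_PCT = 0.05  # 5% drawdown from the hour-start value

    def __init__(self, hour_start_value):
        self.hour_start_value = hour_start_value
        self.consecutive_losses = 0
        self.halted = False
        self.halt_reason = None

    def record_trade(self, pnl):
        # A winning or flat trade resets the loss streak.
        self.consecutive_losses = self.consecutive_losses + 1 if pnl < 0 else 0

    def _halt(self, reason):
        self.halted = True
        self.halt_reason = reason
        raise CircuitBreakerHalt(reason)

    def check(self, portfolio_value):
        if self.halted:
            # Sticky: stays halted until an operator calls reset().
            raise CircuitBreakerHalt(self.halt_reason)
        if self.consecutive_losses >= self.MAX_CONSECUTIVE_LOSSES:
            self._halt("consecutive losses")
        start = self.hour_start_value
        if not (isinstance(start, (int, float)) and math.isfinite(start) and start > 0):
            self._halt("invalid hour-start value")
        hourly_pnl = (portfolio_value - start) / start
        # "not (>=)" so a NaN portfolio value also halts.
        if not (hourly_pnl >= -self.MAX_HOURLY_LOSS_PCT):
            self._halt("hourly drawdown")

    def reset(self):
        """Manual reset, intended to be called by a human operator only."""
        self.halted = False
        self.halt_reason = None
        self.consecutive_losses = 0
```

Calling check before every trade and record_trade after every fill keeps the breaker outside the model's decision loop. A later winning trade clears the loss streak but does not lift a halt that has already been triggered.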

The discipline that makes it work: layered independence. The skill is explicit that no single check is enough. If prompt hygiene fails and the model is compromised, the spend limit still holds. If the spend limit is somehow bypassed, simulation rejects the transaction. Each layer is enforced outside model output so the agent can’t talk its way past the next.
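One way to picture layered independence is a send path where every guard runs as ordinary code, any one of them can veto, and every decision is logged. The sketch below is an assumption-heavy illustration: guarded_send, its parameters, and the audit-log record shape are hypothetical, and simulate stands in for a wrapper that calls w3.eth.call(tx) and decodes the returned bytes into an output amount. Only SlippageError and the required expected_min_out come from the skill's description.

```python
class SlippageError(Exception):
    """Simulated output fell below the caller's minimum."""


def guarded_send(tx, expected_min_out, checks, simulate, send, audit_log):
    """Run every layer in order; send only if all of them pass.

    checks:   list of (layer_name, callable) pairs; each callable raises to veto
    simulate: callable returning the simulated output amount for tx
    send:     callable that actually broadcasts tx and returns a tx hash
    """
    def reject(layer, reason):
        # Rejections are logged too, not just successful sends.
        audit_log.append({"event": "rejected", "layer": layer, "reason": reason})

    if expected_min_out is None:
        reject("simulation", "expected_min_out missing")
        raise ValueError("expected_min_out is required")

    for name, check in checks:
        try:
            check(tx)
        except Exception as exc:
            reject(name, str(exc))
            raise  # fail closed: no later layer can undo a veto

    try:
        simulated_out = simulate(tx)
    except Exception as exc:
        reject("simulation", f"simulation failed: {exc}")
        raise
    if simulated_out < expected_min_out:
        reject("simulation", f"{simulated_out} < {expected_min_out}")
        raise SlippageError(f"simulated {simulated_out}, need {expected_min_out}")

    tx_hash = send(tx)
    audit_log.append({"event": "sent", "tx_hash": tx_hash})
    return tx_hash
```

Because the checks are passed in as plain callables, a spend-limit guard and a circuit breaker can be wired in without either knowing about the other, and a failure in one never disables the rest.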

Receipts

TODO — to be filled in from a real session. Once the layered defenses have been applied to a real trading agent, this section will capture: how many injection-pattern matches the regex actually caught in a week of live external-data ingestion, whether the pre-send simulation rejected at least one transaction that would’ve gone through under spot pricing, the circuit-breaker thresholds that turned out to be too tight or too loose for the strategy’s normal volatility, and whether wallet isolation forced an architectural change (separate keystore, separate signing path) or just stayed an env-var swap.

Source and attribution

From Affaan M’s everything-claude-code — an MIT-licensed skill collection covering harness construction, agent ops, video, payments, and platform-specific patterns.

License: MIT.

Quoting the layered-defense rule verbatim: “Layer the defenses. No single check is enough.” That’s the wedge — single-layer defenses against a model with transaction-signing authority all reduce to “the model didn’t decide to do the bad thing today”; the skill enforces independent checks so a single failure doesn’t cascade to asset loss.