How to Stop Claude Code Context Spirals: A Token Budget Playbook (2026)

If Claude Code is burning through tokens faster than expected, the task is almost never the culprit. The real problem loads before you type a single character: always-on MCP server schemas (5,000–15,000 tokens each), a sprawling global CLAUDE.md, and every installed skill description sitting in context from message one. The fix is a five-step playbook: audit your ~/.claude setup, measure what each item costs, prune the heavy always-on items, use progressive disclosure so large content loads on demand, and tier models so routine work runs on cheaper models. Do those in order and context spirals become a context budget you can actually reason about.

This is the practical walkthrough. Every step includes the commands, and every claim includes a real number. The throughline: you cannot manage what you have not measured, and the thing eating your window is almost never what you assumed.

Why do context spirals happen in the first place?

A "context spiral" is when the model runs low on usable window partway through a task and starts forgetting, repeating, or compacting badly. The instinct is to blame the task — "this codebase audit is too big" — but the task is usually a small fraction of the total spend. The bigger culprits are the always-on costs that load every session regardless of what you are doing:

MCP server tool schemas. A single rich MCP server can push 5,000–15,000 tokens of tool definitions into context every session, whether or not you call any of its tools. Stack three of them and 15,000–45,000 tokens are gone before you type.
Global CLAUDE.md. A well-meaning global instructions file can quietly grow to several thousand tokens. All of it loads every session.
Installed skill descriptions. Each installed skill contributes its name and a short description (roughly 15–40 tokens) so the model knows it exists. Trivial individually; across a hundred installed skills, a few thousand tokens of pure standing overhead.
Agent definitions. Registered agents add their own definitions on top.

None of this is your task. All of it is paid up front, every time. That is why the spiral feels like it comes from nowhere: a large share of the window was already spent before the work began.
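
As a rough worked example, the standing overhead described above can be tallied with plain shell arithmetic. The per-item ranges are the ones quoted in this section; the counts (three MCP servers, a 3,000-token global CLAUDE.md, one hundred installed skills) are illustrative assumptions, not measurements from any real setup.

```shell
#!/bin/sh
# Rough tally of always-on context overhead, using the ranges quoted above.
# All counts passed in below are illustrative assumptions; substitute your own.

standing_overhead() {
    # $1 = MCP server count        $2 = tokens per MCP schema
    # $3 = global CLAUDE.md tokens $4 = installed skill count
    # $5 = tokens per skill description
    echo $(( $1 * $2 + $3 + $4 * $5 ))
}

low=$(standing_overhead 3 5000 3000 100 15)    # cheapest case in the quoted ranges
high=$(standing_overhead 3 15000 3000 100 40)  # most expensive case

echo "standing overhead: ${low} to ${high} tokens per session"
```

Swapping in your own counts gives a first estimate of what is spent before the task starts.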

What is the actual token cost of each component?

Understanding the cost distribution matters before you start pruning. We measured token costs across the full ClaudeKit v2 lineup using a tiktoken-compatible counter at pack time. The results by kit:

Kit	Commands	Skills	Agents	Measured Tokens
EngineerKit (`/eng`)	25	4	4	20,413
MarketingKit (`/mkt`)	20	3	2	16,714
SEOKit (`/seo`)	19	4	2	16,004
EcomKit (`/ecom`)	20	3	2	16,464
VideoKit (`/video`)	17	5	3	12,602
All 5 kits	101	19	13	82,197

That 82,197 total sounds large, but most of it loads on demand rather than every session. Only each skill's short description line sits in context from the start; the body tokens load only when the skill actually runs. Compare that to MCP schemas: those load unconditionally, every session, whether or not you use them.

The practical comparison is stark. A 16,000-token SEOKit that loads on demand is cheaper than a single moderately complex MCP server that loads every session for a month. If you use Claude Code daily for 30 days, that idle MCP schema loads 30 times before you ever touch it. The skill body loads zero times if you never ask for SEO work.
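
The month-long comparison can be made concrete with a few lines of arithmetic. This is a sketch under stated assumptions: the 5,000-token schema is the low end of the range quoted earlier, and one session per day and a single worst-case SEOKit load are illustrative choices.

```shell
#!/bin/sh
# Compare a month of an idle always-on MCP schema with a kit that loads on demand.

monthly_tokens() {
    # $1 = tokens loaded each time, $2 = number of times it loads in the month
    echo $(( $1 * $2 ))
}

mcp_idle=$(monthly_tokens 5000 30)    # low-end schema, one session a day for 30 days
kit_used=$(monthly_tokens 16004 1)    # worst case: all of SEOKit loads in one session
kit_unused=$(monthly_tokens 16004 0)  # never asked for SEO work (body tokens only)

echo "idle MCP schema: ${mcp_idle} tokens"
echo "SEOKit, one use: ${kit_used} tokens"
echo "SEOKit, no use:  ${kit_unused} tokens"
```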

You can verify any kit's cost with:

ck tokens              # total installed cost
ck tokens --kit seokit # one kit, itemized
ck tokens --sort desc  # biggest first — your prune list

For items ck tokens does not cover (MCP schemas, raw CLAUDE.md files, arbitrary references), use any tiktoken-compatible counter or the rough 4 characters per token heuristic. Treat the heuristic as accurate to within 10% on English-and-code text — close enough for budgeting.
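
A minimal sketch of the 4-characters-per-token heuristic as a shell function. It counts bytes with wc -c, which matches the character count only for ASCII text, and it is an approximation rather than a real tokenizer. The function name estimate_tokens is made up for this example.

```shell
#!/bin/sh
# Estimate tokens in a file with the ~4 characters per token heuristic.
# Approximation only: wc -c counts bytes, and no real tokenizer is involved.

estimate_tokens() {
    # $1 = path to a text file; prints 0 if the file does not exist
    if [ -f "$1" ]; then
        chars=$(wc -c < "$1")
        echo $(( ($chars + 3) / 4 ))   # divide by 4, rounding up
    else
        echo 0
    fi
}

# Example: estimate the global instructions file, if there is one.
estimate_tokens "$HOME/.claude/CLAUDE.md"
```

The same function works on an exported MCP schema or any reference file you want to budget for.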

How do you audit what is actually loaded in your Claude setup?

Before changing anything, see what you have. The ~/.claude directory holds global config, skills, commands, and settings. Start by inventorying it:

# What is installed globally?
ls -la ~/.claude
ls ~/.claude/skills 2>/dev/null | wc -l        # skill count
ls ~/.claude/commands 2>/dev/null | wc -l      # command count
wc -c ~/.claude/CLAUDE.md 2>/dev/null          # instructions size in bytes (divide by ~4 for tokens)
grep -i mcp ~/.claude/settings.json 2>/dev/null # which MCP servers are wired in?

You are looking for three numbers: how many skills are installed vs. how many you actually use; how large the global CLAUDE.md has grown; and which MCP servers load on startup. Write them down before you touch anything.

If you have ClaudeKit installed, the ck doctor command runs a diagnostic that flags common configuration problems:

ck doctor          # diagnoses config issues, stale installs, token anomalies
ck list            # shows all kits you are entitled to

The single most valuable question at this audit stage: which MCP servers are always on? Those are almost always your biggest line items, and the ones people most often forget they enabled six months ago.

What is the right order to prune context costs?

Prune the biggest always-on item first. The payoff ordering:

1. Disable unused MCP servers. This is the single highest-value change for most setups. One fat schema can cost more than all your installed skill descriptions combined. If you are not actively using a server's tools in this project, turn it off. In the setup described here, servers are declared in ~/.claude/settings.json under the mcpServers key: set disabled: true or remove the entry. The exact location varies with Claude Code version and configuration scope, so run claude mcp list first to confirm what is registered.
2. Trim global CLAUDE.md. Move project-specific instructions into project-level config (a CLAUDE.md in the repo root). Keep the global file to genuinely universal rules. Every line here is paid every session regardless of the task.
3. Uninstall skills you do not use. Each removal saves the description-line cost for every future session. The body savings only apply if the skill was firing, but the description savings accumulate across hundreds of sessions.
4. Scope agents and definitions to where they are needed. Register globally only what you actually need globally. Project-local registration keeps that cost out of sessions where it is irrelevant.

A useful mental model: always-on costs are rent; on-demand costs are usage. Cut the rent first. A 1,200-token skill body that fires once a day costs less, in practice, than a 1,200-token MCP schema that loads every session automatically.
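
The "cut the rent first" ordering can be sketched as a small filter over your own measurements. The ledger format (name, kind, tokens) and every entry in it are hypothetical; fill the file from whatever counter you used.

```shell
#!/bin/sh
# Build a prune list: always-on items only, most expensive first.
# The ledger format and its entries are hypothetical, for illustration only.

prune_list() {
    # $1 = ledger file with lines like: "github-mcp always 12000"
    awk '$2 == "always" { print $3, $1 }' "$1" | sort -rn
}

ledger=$(mktemp)
cat > "$ledger" <<'EOF'
github-mcp always 12000
global-claude-md always 3500
seokit-body ondemand 16004
skill-descriptions always 2500
EOF

prune_list "$ledger"
rm -f "$ledger"
```

The on-demand entry is dropped even though it is the largest number in the file, which is the point of the rent-versus-usage distinction.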

What is progressive disclosure and how does it reduce context usage?

Pruning reduces what loads always. Progressive disclosure reduces what loads when something does fire. A well-built skill keeps its core procedure lean (600–1,000 tokens) and pushes long reference material into files the skill reads only when a specific step is reached. This means a skill can be highly capable without being heavy by default.

When you build your own skills, structure them so the main procedure handles the common 80% path in under 1,000 tokens, and heavy edge-case content lives in a references/ file the skill reads only when that edge case comes up. The same principle applies to commands: a command like /eng debug should lead with the diagnostic loop (compact), then reach for extended reference material only if root cause is unclear.
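
One way to keep a skill honest about this is a budget check on its main file. The sketch below assumes a layout with a SKILL.md holding the core procedure and a references/ directory for on-demand material, and it reuses the rough 4-characters-per-token estimate. The function name and the 1,000-token budget are assumptions for illustration.

```shell
#!/bin/sh
# Check that a skill's main procedure stays under a token budget.
# Reference files are not counted because they are read only when needed.
# Assumed layout: <skill>/SKILL.md plus <skill>/references/.

check_skill_budget() {
    # $1 = skill directory, $2 = token budget for the main file
    main="$1/SKILL.md"
    if [ ! -f "$main" ]; then
        echo "missing $main"
        return 2
    fi
    chars=$(wc -c < "$main")
    tokens=$(( ($chars + 3) / 4 ))   # ~4 characters per token, rounded up
    if [ "$tokens" -le "$2" ]; then
        echo "ok: $tokens tokens"
    else
        echo "over budget: $tokens tokens"
        return 1
    fi
}

# Demo on a throwaway skill directory.
demo=$(mktemp -d)
mkdir -p "$demo/references"
printf 'Lean core procedure for the common path.\n' > "$demo/SKILL.md"
check_skill_budget "$demo" 1000
rm -rf "$demo"
```

When the check reports over budget, the usual fix is to move edge-case detail out of the main file and into references/.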

The same idea governs how to think about agents. Heavy sub-work belongs in a specialist agent, which runs in its own isolated context and returns only the result. Your main window stays clean because the agent's internal reasoning never lands there. The ClaudeKit v2 agents are all read-only specialists — reviewers, auditors, and researchers that produce an evidence artifact — rather than blocking quality gates that sit in your primary context. We cover the architecture shift in the agents vs skills vs slash commands guide.

How does model tiering reduce the cost of long Claude Code sessions?

Model tiering routes cheap, mechanical work to a cheaper model tier and reserves the top model for reasoning that actually needs it. This does not reduce how much context loads, but it reduces cost per token spent — the other half of the budget.

In ClaudeKit v2, agents are assigned a model tier per role at design time. Read-only specialist agents (running an SEO citation audit, reviewing a diff for correctness) can run on a mid-tier model without quality loss. The primary workflow that interprets results and decides next steps runs on the top tier. The orchestration logic stays the same; the cost drops because the easy 80% is not paying top-tier rates.

You can apply the same discipline manually:

Exploratory, high-volume, low-stakes passes: smaller model
Hard synthesis, final review, ambiguous root-cause reasoning: top model

With Opus 4.8 (shipped May 28, 2026) priced at $5/$25 per million input/output tokens, and fast mode running roughly 3x cheaper, the math on tiering is more favorable than it has ever been. A long agentic session that might have cost $12 on a single model can cost $3–5 when the mechanical steps are properly tiered.
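
The tiering math can be sketched as follows. The top-tier prices are the $5/$25 per million tokens quoted above. The cheaper tier's $1/$5, the session size (1,000,000 input and 200,000 output tokens), and the 80/20 split are hypothetical values chosen only to show the calculation.

```shell
#!/bin/sh
# Compare one session run entirely on a top-tier model with a tiered split.
# Cheaper-tier prices and all token counts are hypothetical.

session_cost() {
    # $1 = input tokens, $2 = output tokens,
    # $3 = input price per million, $4 = output price per million
    awk -v i="$1" -v o="$2" -v ip="$3" -v op="$4" \
        'BEGIN { printf "%.2f\n", (i * ip + o * op) / 1000000 }'
}

all_top=$(session_cost 1000000 200000 5 25)   # everything on the top tier
cheap_part=$(session_cost 800000 160000 1 5)  # mechanical 80% on the cheaper tier
top_part=$(session_cost 200000 40000 5 25)    # hard 20% stays on the top tier
tiered=$(awk -v a="$cheap_part" -v b="$top_part" 'BEGIN { printf "%.2f\n", a + b }')

echo "all top tier: \$${all_top}"
echo "tiered:       \$${tiered}"
```

The token count is identical in both cases; only the price applied to each share changes.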

What does a full context-budget session look like end to end?

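Put the steps together and a single budget session runs like this. Progressive disclosure is a design choice made when you build skills, so it has no command of its own here; a re-measure step takes its place: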

# 1. Audit: inventory what is installed
ls ~/.claude/skills | wc -l
wc -c ~/.claude/CLAUDE.md                  # bytes; divide by ~4 for a rough token estimate
grep -ci mcp ~/.claude/settings.json       # rough count of lines mentioning MCP
 
# 2. Measure: rank by cost, biggest first
ck tokens --sort desc | head -20
# separately count MCP schemas and CLAUDE.md with a tiktoken counter
 
# 3. Prune: disable idle MCP servers, trim CLAUDE.md, uninstall dead skills
 
# 4. Re-measure: confirm the drop
ck tokens
 
# 5. Tier: assign cheaper models to agents and commands you run at high volume

The first time most people reach step 2, the surprise is the same: an MCP server or a bloated global file they forgot about is the single largest line item, often exceeding all skill costs combined. Fix that and the spirals usually stop.
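
To confirm the drop in the re-measure step, a small helper can turn the before and after totals into a percentage. The two totals below are hypothetical placeholders; use the numbers you wrote down during the audit.

```shell
#!/bin/sh
# Report how much standing overhead a pruning pass removed.

percent_saved() {
    # $1 = tokens before pruning, $2 = tokens after pruning
    if [ "$1" -le 0 ]; then
        echo 0
        return 0
    fi
    echo $(( ($1 - $2) * 100 / $1 ))   # integer percentage, rounded down
}

before=52000   # hypothetical total before pruning
after=22000    # hypothetical total after pruning
echo "saved $(percent_saved "$before" "$after")% of standing overhead"
```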

After pruning, a full kit like EngineerKit (20,413 tokens across 25 commands, 4 skills, and 4 agents) or SEOKit (16,004 tokens across 19 commands, 4 skills, and 2 agents) is a manageable, predictable cost that you pay only as its pieces load. Because kits install via ck install engineerkit with a token ledger printed on every run, you always know what you are paying. The measuring context token costs post covers the methodology behind these numbers in more depth.

Numbered checklist: context budget in 10 minutes

If you have ten minutes and want to act now rather than read:

1. Run ls ~/.claude/skills | wc -l and cat ~/.claude/settings.json | grep -i mcp, then write down the numbers.
2. Run ck tokens --sort desc | head -10 (or count with tiktoken if no kits are installed).
3. Identify any MCP servers you have not actively used in the past two weeks. Disable them.
4. Open ~/.claude/CLAUDE.md. Move anything project-specific to a per-project CLAUDE.md. Delete anything you added "just in case."
5. Uninstall skills tied to workflows you no longer run.
6. Re-run ck tokens and confirm the numbers dropped.
7. For high-volume commands or agents, check whether they can run on a cheaper model tier without quality loss.
8. Run ck doctor to catch any remaining configuration issues.
9. Open the next project-level CLAUDE.md and check it for bloat too; per-project cost counts for the duration of that session.
10. Schedule a monthly "context audit" to repeat steps 1–4 as your setup drifts.
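
For the monthly audit, one lightweight approach is to append a dated snapshot to a log so drift is visible over time. This is a sketch: the function name, the log format, and the script path in the cron comment are all hypothetical.

```shell
#!/bin/sh
# Append a dated snapshot to a log so drift shows up month over month.
# Log format (date, label, token estimate) is an assumption for illustration.

log_snapshot() {
    # $1 = log file, $2 = label, $3 = token estimate
    printf '%s %s %s\n' "$(date +%Y-%m-%d)" "$2" "$3" >> "$1"
}

# Example cron entry (hypothetical script path), 09:00 on the first of each month:
#   0 9 1 * * $HOME/bin/context-audit.sh
```

Calling log_snapshot "$HOME/context-audit.log" global-claude-md 3500 after each audit builds a history you can scan for growth.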

FAQ

Why is Claude Code using so many tokens even on simple tasks?

Always-on context loads before your task starts: MCP server tool schemas (often 5,000–15,000 tokens each), a large global CLAUDE.md, and the description lines of every installed skill. These load every session regardless of the task, so the window is partly consumed before you type anything. Audit the always-on items first — that is where the biggest gains are.

How do I actually see what is eating my context window?

Start with ls ~/.claude to inventory your setup, then run ck tokens --sort desc to rank skill costs. For MCP schemas, your CLAUDE.md, and arbitrary files, count them with any tiktoken-compatible counter or estimate at roughly 4 characters per token. The goal is a ranked list so you cut the most expensive always-on items first.

Do MCP servers I am not using actually cost tokens?

Yes, and they are typically the biggest cost in a spiraling setup. A rich MCP server loads 5,000–15,000 tokens of tool schema into context every single session, whether or not you call any of its tools. Disabling idle MCP servers is almost always the single highest-payoff change you can make for context management.

What is the difference between skill tokens and always-on tokens?

Skill tokens are mostly on-demand: only the short description line sits in context from the start, and the body loads only when the skill runs. Always-on tokens (MCP schemas, global CLAUDE.md) load unconditionally, every session. The same 1,200 tokens cost far more when they are always-on than when they load once per relevant task, so the always-on items are the ones to cut first.

Does model tiering actually reduce context usage?

No — tiering reduces the cost per token, not the number of tokens. The context window loaded is the same regardless of which model tier you use. Tiering is the second budget lever, not the first: prune always-on context first, then tier the models for high-volume commands to reduce what you pay per token consumed.

How do ClaudeKit installs affect my token budget?

Every ck install <kit> prints a token ledger at completion so you know exactly what you are paying. Commands and skills load on demand, not always-on, so the cost materializes only when you run them. You can verify the current installed cost at any time with ck tokens, and ck doctor flags configuration problems. The total for all five kits is 82,197 measured tokens — but in daily use you pay a fraction of that because only the skills relevant to your current work load.

If you want the token budget solved rather than just managed, EngineerKit and SEOKit are the two kits most often cited for this problem: EngineerKit's daily-eight commands (/eng catchup, /eng plan, /eng tdd, /eng debug, /eng verify, /eng review, /eng commit, /eng handoff) are built to be context-efficient by design, and SEOKit's /seo quick-wins and /seo citations run as isolated evidence-producing agents that return a result without spilling their reasoning into your main window. Both install in under a minute — ck install engineerkit or ck install seokit — and the token ledger prints on first run so you can compare before and after. See the pricing page for current plans.