The Real Cost of Free Skills: Token Waste in Popular Repos, Measured

Free Claude Code skills are not free in context. We estimated the token footprint of publicly available SKILL.md files using a chars/4 counter (a rule-of-thumb approximation of tokenizer output, not a real tokenizer) and found a consistent pattern: always-loaded bloat and missing progressive disclosure. A single poorly designed skill can add 3,600 tokens to every session. At 100 sessions a month, that adds up to roughly 360,000 wasted tokens before you do anything useful. This post documents the methodology, the findings pattern, the waste math, and how ClaudeKit's measured token ledger compares to the typical free alternative.

What Is the Real Cost of a "Free" Skill?

Free, in the context of Claude Code skills, means zero dollars. It does not mean zero tokens. Every file that sits in your ~/.claude folder and loads on session start contributes to your context window before your first prompt. For a casual user running a few sessions a week, a few extra kilobytes barely register. For an active developer running 80 to 120 sessions a month, the math becomes material fast.

The Agent Skills open standard, adopted by 32+ tools since its December 18, 2025 launch, catalyzed a massive proliferation of free skills. There are now roughly 90,000 skills on skills.sh, and the ecosystem grew 18.5x in twenty days after that launch. Volume is not the same as quality, and quality in this context has a very specific meaning: does the skill load its heavy content only when needed, or does it dump everything into context every session?

Most free skills do the latter. We measured to confirm it.

How Did We Measure the Token Footprint?

The goal is directional honesty, not a forensic audit. Here is exactly what we did and what the limits are.

Source. Public SKILL.md and agent-definition files from well-known repositories, read as published at the file level.

Counter. Character count divided by four. This is a widely used rule of thumb for English prose, roughly four characters per token, rather than the output of an actual tokenizer such as tiktoken. Code, tables, and punctuation tokenize differently, so every number below is an estimate, not an exact figure to bill against.

What we measured. The body text a skill or agent contributes when loaded. We did not instrument live sessions or hook into a running model. We read the files and counted.

Honest caveats up front:

chars/4 is an approximation. Real tokenization depends on the exact tokenizer and content type.
Loading behavior varies. Whether a file sits in context always or only when triggered depends on how it is structured and invoked.
These are representative figures, not a leaderboard. We are illustrating a pattern, not ranking projects to the token.
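A minimal sketch of the chars/4 counter described above, for readers who want to see how little is involved. The function name estimate_tokens is our own label for illustration, and integer division is an assumption about rounding; nothing here calls a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token per four characters (chars/4)."""
    # Heuristic only. Code, tables, and punctuation tokenize differently,
    # so treat the result as a directional estimate.
    return len(text) // 4


if __name__ == "__main__":
    sample = "Progressive disclosure keeps the always-on surface small."
    print(estimate_tokens(sample))
```

Pass it the full text of a SKILL.md file to get that file's estimated footprint.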

You can reproduce this approach on your own setup with the audit script described in our ~/.claude folder context audit.

What Did We Actually Find?

Two failure modes recur across the free files we examined. They are not rare edge cases — they are the default.

Failure Mode 1: Always-Loaded Bloat

Some skills put their entire reference body into a surface that loads regardless of whether the skill is used in a given session. A PDF-manipulation skill from a prominent public repository came in around 3,600 estimated tokens of reference detail. That is not a problem if it loads only when you manipulate a PDF. It is waste if it sits resident in every session whether or not a PDF is anywhere near the conversation.

The bloat is not the size. It is size combined with always-on loading.

Failure Mode 2: Missing Progressive Disclosure

Progressive disclosure is the design pattern that makes large skill libraries affordable: keep the always-available surface — the short description — tiny, and push detail into a body that loads only on trigger. Many free skills skip this entirely. No lightweight always-on layer. No on-demand heavy layer. One undifferentiated block. When that block is large and always resident, you pay for detail you are not using in that session.
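The two-layer split can be modeled in a few lines. This is a hypothetical accounting sketch, not how Claude Code actually loads skills: the Skill class, its fields, and the placeholder body are all assumptions made for illustration. It shows only that the resident cost stays at the size of the description until the skill is triggered.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def estimate_tokens(text: str) -> int:
    # chars/4 heuristic, as used throughout this post
    return len(text) // 4


@dataclass
class Skill:
    """Hypothetical model of a skill split into a light and a heavy layer."""
    description: str                 # short, always-resident surface
    load_body: Callable[[], str]     # heavy detail, fetched only on trigger
    _body: Optional[str] = None      # stays None until the skill is used

    def resident_tokens(self) -> int:
        # Cost carried in context: the description, plus the body once loaded.
        cost = estimate_tokens(self.description)
        if self._body is not None:
            cost += estimate_tokens(self._body)
        return cost

    def trigger(self) -> str:
        # Load the body lazily the first time the skill is actually invoked.
        if self._body is None:
            self._body = self.load_body()
        return self._body


pdf = Skill(
    description="Extract, merge, and split PDF files.",
    # Placeholder standing in for roughly 3,600 estimated tokens of detail.
    load_body=lambda: "x" * 14_400,
)
before = pdf.resident_tokens()   # description only
pdf.trigger()
after = pdf.resident_tokens()    # description plus body
print(before, after)
```

A skill written as one undifferentiated block behaves as if trigger() ran at the start of every session.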

Representative Observations

Source type	Item	Estimated tokens (chars/4)	Load behavior observed
Major public skills repo	PDF-handling skill	~3,600	Always loaded
Typical marketing skills repo	one-shot copy skill	~700	Always loaded
Large agent repo	single agent definition	~2,000	Always loaded
ClaudeKit v2 MarketingKit	`/mkt voice` command	included in 16,714 kit total	Trigger-loaded detail
ClaudeKit v2 SEOKit	`/seo quick-wins` command	included in 16,004 kit total	Trigger-loaded detail

The marketing skill at 700 tokens is not a crisis on its own. The problem is aggregation: install ten of them (about 7,000 tokens) plus a few agent definitions around 2,000 each, and you are carrying well over 10,000 always-resident tokens before you do anything. The PDF skill alone at 3,600 tokens is already more than the entire always-resident surface of a well-built kit.

What Does the Waste Actually Cost Over Time?

Here is why a few kilobytes matter. Suppose your setup carries an extra 3,000 estimated tokens of always-loaded skill and agent text that you do not use in a given session. That is a conservative figure if you have a PDF skill at approximately 3,600, a couple of agent definitions at around 2,000 each, and a handful of 700-token marketing skills.

Always-loaded waste per session:        ~3,000 tokens (estimate)
Sessions per month (active user):       ~100
Wasted context per month:               ~300,000 tokens

Over one quarter:                       ~900,000 tokens

That is three hundred thousand tokens a month of context you are paying to carry and never using. The exact dollar cost depends on your model and plan. At Sonnet 4.6 input pricing the cost per session is small, but the structural point holds on any plan: always-loaded waste grows linearly with session count, every month, indefinitely.

This is a projection from estimated figures, not a measured bill. The inputs are approximations and your real always-loaded set will differ. We are showing the shape of the cost.
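The projection is plain multiplication, which makes it easy to rerun with your own inputs. A short sketch, where the function name projected_waste is ours and the inputs are the estimates quoted in this post:

```python
def projected_waste(tokens_per_session: int, sessions_per_month: int, months: int = 1) -> int:
    """Unused always-loaded tokens carried over a period (linear in sessions)."""
    return tokens_per_session * sessions_per_month * months


monthly = projected_waste(3_000, 100)              # 300,000
quarterly = projected_waste(3_000, 100, months=3)  # 900,000
single_pdf_skill = projected_waste(3_600, 100)     # 360,000

print(monthly, quarterly, single_pdf_skill)
```

Swap in your own per-session estimate and session count; the relationship stays linear either way.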

A useful cross-check: ClaudeKit's five v2 kits total 82,197 measured tokens across 101 commands, 19 skills, and 13 read-only agents. The individual kit totals are 20,413 for EngineerKit, 16,714 for MarketingKit, 16,004 for SEOKit, 16,464 for EcomKit, and 12,602 for VideoKit. Those are full-kit figures — the always-resident portion is a fraction of that, because the detail loads on command trigger. The contrast with "install a PDF skill and it dumps 3,600 tokens into every session" is the design difference.

Is This a Size Problem or a Design Problem?

It is tempting to read the numbers as "big skills bad, small skills good." That is the wrong lesson.

A 3,600-token skill that genuinely needs that detail and loads only when triggered is well-designed. A 700-token skill that sits resident every session and is triggered twice a month is poorly designed despite being small. The variable that matters is not size — it is size times load frequency for content you are not using.
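The comparison can be put into numbers. The sketch below assumes 100 sessions a month (the same figure used in the waste projection) and ignores the small resident description of a trigger-loaded skill; the function name and the usage count for the large skill are illustrative assumptions.

```python
def unused_resident_tokens(size: int, sessions: int, sessions_used: int,
                           always_resident: bool) -> int:
    """Tokens carried in sessions where the skill is never used."""
    if not always_resident:
        # Trigger-loaded: the body is only paid for in sessions that use it.
        return 0
    return size * (sessions - sessions_used)


# Small skill, always resident, triggered twice a month.
small_resident = unused_resident_tokens(700, 100, 2, always_resident=True)

# Large skill, loaded only on trigger (usage count here is arbitrary).
large_triggered = unused_resident_tokens(3_600, 100, 5, always_resident=False)

print(small_resident, large_triggered)  # 68600 0
```

Under these assumptions the smaller skill is the more expensive one, which is the point of measuring load behavior rather than file size.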

This reframes the fix entirely. You do not want skill creators to write smaller skills at the cost of capability. A PDF skill that omits half its reference material to save tokens is just a worse PDF skill. You want the heavy detail behind a trigger, so it costs nothing until the moment it is useful. That is what progressive disclosure buys, and it is cheap to add: split the file into a short always-available description and a body that loads on demand.

The repos that skip progressive disclosure are not making a deliberate capability trade-off. They are leaving free savings on the table. Measuring the always-resident surface is how you tell the two apart.

What Are the Three Fixes, in Order of Leverage?

Audit what is resident. Run a ~/.claude folder context audit and rank your files by estimated tokens. Remove anything you have not triggered in a month. This is often the single highest-leverage move.
Demand progressive disclosure. Prefer skills whose always-on surface is a short description and whose detail loads on trigger. A well-built skill is light until you use it. If a skill file is one undifferentiated block of several thousand characters, that block is probably sitting in context every session.
Pick skills that publish their cost. You cannot budget what you cannot see. The reason every ClaudeKit install prints a token ledger — and ck tokens <kit> lets you recount at any time — is precisely this problem. Transparency is a product feature, not a marketing claim.

A fourth move worth naming: prefer kits over loose skills. A kit bundles related commands with a published, measured total. When you install EngineerKit with ck install engineer, you get 25 commands, 4 skills, 4 read-only agents, and a printed ledger showing 20,413 measured tokens. You know what you bought. A collection of individually downloaded free skills gives you none of that visibility. See our detailed comparison of kits vs. free prompt packs for the full breakdown.

How Does ClaudeKit's Ledger Compare?

The point of this post is not to dismiss free repos. Many are genuinely useful, and zero dollars is a real advantage. The point is that "free" hides a cost that is invisible until you measure it.

ClaudeKit's control is the published ledger: every kit carries a token figure measured at pack time with the same chars/4-class counter, so you can see the cost before you install. The token figures by kit:

Kit	Commands	Skills	Agents	Measured tokens
EngineerKit (/eng)	25	4	4	20,413
MarketingKit (/mkt)	20	3	2	16,714
SEOKit (/seo)	19	4	2	16,004
EcomKit (/ecom)	20	3	2	16,464
VideoKit (/video)	17	5	3	12,602
Total	101	19	13	82,197
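The totals row can be checked against the per-kit rows with a few lines of arithmetic. The figures are copied from the table above; the dictionary layout is just a convenient way to hold them.

```python
# (commands, skills, agents, measured tokens) per kit, from the table
kits = {
    "EngineerKit": (25, 4, 4, 20_413),
    "MarketingKit": (20, 3, 2, 16_714),
    "SEOKit": (19, 4, 2, 16_004),
    "EcomKit": (20, 3, 2, 16_464),
    "VideoKit": (17, 5, 3, 12_602),
}

# Sum each column across all kits.
commands, skills, agents, tokens = (sum(col) for col in zip(*kits.values()))

print(commands, skills, agents, tokens)  # 101 19 13 82197
```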

These are not marketing numbers. They are the same kind of measurement we applied to the public files above, turned on ourselves. The difference is not that our skills are magically smaller. It is that they are measured, labeled, and built to load progressively — the always-resident surface stays small and the heavy bodies arrive only when a command triggers them.

The v2 architecture also eliminates a class of token waste that plagued v1: orchestrator agents and blocking reviewer gates. Every command in the current kits ends with evidence — a diff, a report, a verified file — not a call to a downstream reviewer agent that loads its own context and waits for approval. That architecture change alone removed significant always-resident overhead. For more on why we made that call, see our post on token costs in Claude Code skills, measured.

Installing a kit takes two steps: ck auth <key> then ck install <kit>. The token ledger prints automatically. You can also install via the plugin marketplace: /plugin marketplace add Madni-Aghadi/claudekit-engineer. Use ck doctor to diagnose loading issues and ck list to see your active entitlements.

How Does This Relate to the Broader AI Efficiency Problem?

Context waste is not unique to skills. It is a version of the same problem showing up in AI-driven work at every layer. Perplexity's citation data shows 90% of top-cited sources answer in the first 100 words: the early-page content is what gets used, and the rest is carried along unused. Token waste in skills is the same dynamic: most of the loaded content is never consulted in a given session, but you pay to carry it anyway.

For a strategic read on how this connects to being found and cited by AI search engines, see AEO vs SEO: Optimizing for AI Answers. The structural insight — that original, measured data formatted for early-page citation outperforms padded content — applies both to how you should write and to which tools you should load in context.

The 2026 skills ecosystem is large enough now (roughly 90,000 skills on skills.sh as of mid-2026) that visibility into cost is the differentiator. Most skills do not tell you what they cost. Measuring it yourself, or choosing tools that measure it for you, is increasingly the skill that matters.

FAQ

How accurate is the chars/4 token estimate?

It is a deliberate approximation — roughly four characters per token for English prose. Real tokenization depends on the specific tokenizer and content type; code, tables, and punctuation tokenize differently from plain text. We use it to identify heavy files and illustrate a pattern, not to produce numbers anyone should bill against. Every figure in this post carries that caveat explicitly.

What is progressive disclosure in the context of Claude Code skills?

Progressive disclosure is the design where a skill's always-available surface is a short description and its full body loads only when a command triggers it. It keeps large skill libraries affordable: the skill is light until you actually invoke it. Skills that skip this — one large always-resident block — make you pay for detail you are not using in a given session, every session.

Where does the 300,000 tokens per month figure come from?

It is a projection: approximately 3,000 estimated always-loaded tokens you do not use in a typical session, multiplied by about 100 sessions per month. Both inputs are approximations, and your real always-resident set will differ. The figure is meant to show the shape of the waste — always-loaded waste scales linearly with session count — not to provide an exact billing number.

Does ClaudeKit load all 82,197 tokens into context on every session?

No. The 82,197 figure is the measured total across all five kits combined. Individual kit installs load only that kit's commands and skills, and the detail within each command loads on trigger, not on session start. The always-resident surface of a single kit is substantially smaller than the total measured size. The token ledger that prints on install shows the full kit size so you know what you bought; ck tokens <kit> lets you recount at any time.

How do I audit my own ~/.claude folder for token waste?

The fastest approach is to list every file in ~/.claude and run a character-count-divided-by-four estimate on each one. Sort by estimated tokens descending. Any large file you have not triggered recently is a candidate for removal. We wrote a detailed walkthrough of this process in the ~/.claude folder context audit post. The ck doctor command in ClaudeKit v0.1.3 also diagnoses loading issues if you are using the CLI.
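The steps above can be sketched as a short script. This is an illustration of the list, estimate, and sort approach, not the audit script from the walkthrough post. It counts every readable text file under the folder, which overstates the cost because not every file there loads into context. The function name is our own.

```python
from pathlib import Path


def rank_by_estimated_tokens(root: Path) -> list[tuple[int, Path]]:
    """Return (estimated_tokens, path) pairs, largest first."""
    ranked = []
    if not root.is_dir():
        return ranked
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        ranked.append((len(text) // 4, path))  # chars/4 estimate
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


if __name__ == "__main__":
    # Print the 20 heaviest files under ~/.claude
    for tokens, path in rank_by_estimated_tokens(Path.home() / ".claude")[:20]:
        print(f"{tokens:>8}  {path}")
```

Whether a file near the top of the list is actually resident depends on how it is structured and invoked, so treat the ranking as a list of candidates to inspect.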

Are free skills ever worth using despite the token cost?

Yes. A well-designed free skill with a short always-on surface and triggered detail can be genuinely efficient. The problem is not free skills as a category — it is free skills that lack progressive disclosure and have no published token footprint. If you cannot see what a skill costs before installing it, you are making a blind bet. That bet is low-stakes for occasional users and increasingly material for teams running hundreds of sessions a month.

If you are spending real time on engineering workflows, the token efficiency argument for EngineerKit is straightforward: 25 commands with a 20,413-token measured ledger, progressively loaded, with a daily eight-command workflow (catchup, plan, tdd, debug, verify, review, commit, handoff) that covers the full development loop. Single-kit pricing starts at $14.99 per month. The ledger prints on install, ck tokens engineer recounts it, and there is a 14-day refund window if the math does not work for you.