wshobson/agents Review: 192 Free Agents, Honestly Assessed (2026)
wshobson/agents has 192 agents, 84 plugins, four model tiers, and multi-harness export. An honest architecture review and when to pay for a vertical kit instead.

wshobson/agents is the best free Claude Code agent repository for software engineers. It packs 192 agents across 84 isolated plugins, ships a sensible four-tier model assignment (Opus down to Haiku), and exports everything to five AI harnesses from a single Markdown source. Its honest limits are quality variance across that scale, no per-component token measurement, and breadth over vertical depth. If your work is code, grab it. If your work is a specific business function you need to ship reliably, a vertical kit solves a different problem.
A disclosure up front: we build ClaudeKit, a paid set of vertical kits, so we have a stake in this comparison. We've tried to keep the assessment about architecture and trade-offs, and we'll say plainly where wshobson/agents is the better choice.
What is wshobson/agents, exactly?
The repo is a marketplace of plugins, not a flat folder of 192 agent files. Each plugin is self-contained — it bundles its own agents, commands, and skills, auto-discovered from directory structure — and you install plugins individually. That single architectural decision drives most of what works:
Context isolation. Installing one plugin loads only that plugin's components. With Claude Code burning context budget on every loaded agent and skill, a monolithic 192-agent install would be unusable. Plugin boundaries make the scale tractable.
Multi-harness export. One Markdown source exports harness-native artifacts to Claude Code, Cursor, Codex CLI, OpenCode, and Gemini CLI. No lowest-common-denominator translation. If you live across multiple AI coding tools, that portability is rare.
Composability. Pull the Python plugin and the security plugin without inheriting unrelated frontend tooling. Mix and match as your project demands.
This is mature design. It reflects real thought about how agents are consumed in practice, and it's the reason the repo scales to 192 agents without collapsing under its own context weight.
How does the four-tier model assignment work?
One of wshobson/agents' most useful features is that it doesn't run everything on one model. With Opus 4.8 pricing at $5/$25 per million tokens (shipped May 2026), the difference between tiers is not trivial at scale. The tiering strategy:
| Tier | Model | Use Cases |
|---|---|---|
| 1 | Opus | Architecture reviews, security audits, production-critical reasoning |
| 2 | User-selected | Backend, frontend, specialized domain work |
| 3 | Sonnet | Documentation, testing, API references |
| 4 | Haiku | Operational tasks, deployments, content formatting |
This is the right instinct. Match cognitive load to model cost. Paying Opus rates to format a config file is the single most common cost mistake in Claude Code setups — we wrote a whole guide on model tiering because getting the tier boundary wrong is where context budgets quietly bleed. wshobson/agents gets the shape correct. Whether each of the 192 agents hits the right tier is a per-agent question, and we won't pretend to have audited all of them.
Which plugin categories are strongest?
Based on the repo's organization and the author's evident center of gravity, the strongest areas are the ones closest to software engineering:
- Architecture and infrastructure — the Opus-tier architecture agents are clearly the flagship. Putting the heavyweight model on system design is the correct call, and the depth here is the main reason engineers reach for this repo first.
- Programming languages and frameworks — dense Python, Django, FastAPI coverage. For a working backend engineer, this is the draw.
- Security and compliance — security reviews are explicitly Tier 1 (Opus). That's the correct assignment for that cognitive load and a notable strength versus repos that run everything on one model.
- Data engineering and ML — broad coverage, solid for day-to-day data work.
- DevOps and infrastructure — deployment, CI/CD, and operational agents at Tier 4 (Haiku) where the cost-to-value is good.
The categories that exist but lean broad rather than deep are the business-adjacent ones: SEO, documentation, business operations. Competent general-purpose agents, but they improvise from a persona rather than chain a domain workflow end to end. That's a scope statement, not a knock. This is an engineer's toolkit that also has business agents.
What are the honest trade-offs at 192 agents?
A free repo of this scale has predictable trade-offs. Here they are plainly:
Quality variance. When one maintainer ships 192 agents, they will not all be equally polished. Some are clearly load-bearing and well-tuned; others are thinner. There is no published per-agent quality floor. In ClaudeKit's v2 architecture, we moved away from blocking reviewer gates entirely — commands now end with EVIDENCE (a report, a diff, a verified file) rather than a gate agent deciding whether the output is good enough. But with wshobson/agents, the absence of any quality signal at all means you are the reviewer. For engineers checking their own code, that's fine. For non-technical operators shipping customer-facing deliverables, it's a real gap.
No per-component token measurement. The plugin architecture isolates context well, but the repo doesn't publish the token footprint per agent or skill. You discover costs by loading components, not by reading a number before you install. We made measured token costs central to ClaudeKit — every kit install prints a token ledger, ck tokens <kit> recounts at any time, and the five kits together total 82,197 measured tokens across 101 commands, 19 skills, and 13 agents. That discipline is absent from wshobson/agents, which is an acceptable omission for a free repo but real friction for cost-sensitive production workflows.
Breadth over vertical depth. 192 agents across many domains is horizontal breadth. A vertical kit — say, 20 ecommerce commands chaining store triage, cart recovery, margins, ads, and BFCM prep with shared context — is depth in one lane. wshobson/agents gives you a wide bench of specialists that improvise independently. It does not give you an orchestrated, context-carrying workflow for a specific business function. Which you want depends on your job.
When should you choose wshobson/agents over a paid kit?
We'll be direct. An honest comparison is more useful than a sales pitch.
Choose wshobson/agents when:
- You're a software engineer who wants free, deep coverage of development, infrastructure, and security agents
- You work across multiple AI coding harnesses and value single-source multi-harness export
- You're comfortable being your own quality reviewer
- Your work is in code, not customer-facing business deliverables
- You want to understand the plugin architecture pattern before deciding whether to pay for anything
Consider a paid vertical kit instead when:
- Your job is a specific business function — SEO operations, ecommerce lifecycle, video production, marketing content — where you want an orchestrated end-to-end workflow, not a bench of improvising generalists
- You need measured token costs per component before running anything, to budget context deliberately
- You want commands that end with verifiable EVIDENCE (a diff, a report, a validated file) rather than an LLM judgment call
- You're shipping output to customers or investors and need reproducible quality
Both can be true simultaneously. A reasonable setup is wshobson/agents for your engineering work and a vertical kit for the one business function you actually need to ship reliably. They're not mutually exclusive, and we'd never claim otherwise.
How does wshobson/agents compare to ClaudeKit's five kits?
The cleanest frame: wshobson/agents is horizontal and free — broad, well-architected, strongest in engineering, you supply the quality control. ClaudeKit is vertical and paid — five kits that each chain a single business function end to end, with commands that produce EVIDENCE rather than opinions, per-component token measurement, and install-time token ledgers.
Here's the side-by-side on the dimensions that actually matter for a decision:
| Dimension | wshobson/agents | ClaudeKit v2 |
|---|---|---|
| Price | Free | $14.99/mo single kit to $49.99/mo All-Access |
| Agent count | 192 agents, 84 plugins | 13 read-only specialist agents across 5 kits |
| Command count | 102 commands | 101 commands (25 eng + 20 mkt + 17 video + 19 seo + 20 ecom) |
| Token measurement | Not published | 82,197 measured tokens total; printed on install |
| Harness support | Claude Code, Cursor, Codex, OpenCode, Gemini | Claude Code (plugin marketplace + CLI) |
| Model tiering | 4-tier (Opus to Haiku) | Per-command, matches task load |
| Vertical depth | Broad across many domains | Deep per business function |
| Quality signal | You review | Commands end with EVIDENCE (report/diff/file) |
| Install | Clone + configure | ck auth <key> then ck install <kit> |
ClaudeKit's five kits in brief:
- EngineerKit — 25 commands, 4 skills, 4 agents, 20,413 tokens. Flagship:
/eng debug(root-cause-first diagnosis). Daily eight: catchup, plan, tdd, debug, verify, review, commit, handoff. - MarketingKit — 20 commands, 3 skills, 2 agents, 16,714 tokens. Flagships:
/mkt voice(voice file from your real posts) +/mkt humanize(strips 14 AI tells). - VideoKit — 17 commands, 5 skills, 3 agents, 12,602 tokens. Flagship:
/video clone(recreate a reference video's style in Remotion and verify the match). - SEOKit — 19 commands, 4 skills, 2 agents, 16,004 tokens. Flagships:
/seo quick-wins(positions 8-20 plus low-CTR pages) and/seo citations(AI-citation measurement with confidence intervals). - EcomKit — 20 commands, 3 skills, 2 agents, 16,464 tokens. Flagship:
/ecom no-sales(store triage against AOV-band benchmarks).
For the broader question of how the plugin model stacks up against kit-style packaging, our plugins vs kits post covers the marketplace model directly. And if you're still sorting out what agents, skills, and commands actually are and how they differ, the agents vs skills vs slash commands post is worth ten minutes.
What does the ClaudeKit v2 agent architecture look like?
This comes up when people read old ClaudeKit coverage that mentions "blocking reviewer gates" and "quality-gate agents." That was v1. In v2, we killed the reviewer gate.
The v1 pattern was: command runs, then a reviewer agent decides if the output is good enough, blocks if it scores below a threshold. The problem is that a reviewer agent is itself an LLM — it adds latency, burns tokens, and can be wrong about whether the output is good. You've replaced one LLM judgment with another.
V2 architecture is different: every command ends with EVIDENCE, not an opinion. /eng debug ends with a root-cause report and a diff. /seo quick-wins ends with a ranked table of specific URLs and their position/CTR numbers pulled from real data. /ecom no-sales ends with a triage checklist scored against benchmarks. You don't need a reviewer to tell you whether the output is good — you can read the diff, check the numbers, verify the file. The 13 agents across ClaudeKit v2 are all read-only specialists: auditors, researchers, reviewers that read and summarize rather than blocking execution.
wshobson/agents doesn't publish a quality-gate architecture at all, which for most engineering use cases is fine — engineers review their own diffs. The comparison point is that ClaudeKit's evidence-first design is not about gatekeeping, it's about making the output verifiable without adding another LLM in the loop.
How do you install and manage ClaudeKit alongside wshobson/agents?
The two don't conflict. ClaudeKit installs globally to ~/.claude by default, or --local for a specific project. wshobson/agents plugins install into your project directory. They coexist cleanly in Claude Code's context loading.
ClaudeKit install flow:
ck auth <key>— authenticates your license (3 devices per license)ck install <kit>— installs to~/.claude, prints token ledger on completion/plugin marketplace add Madni-Aghadi/claudekit-<kit>— alternative if you prefer the Claude Code plugin marketplace
Maintenance commands: ck tokens <kit> recounts the token footprint, ck doctor diagnoses config issues, ck list shows your entitlements.
Pricing: $14.99/mo for a single kit, $29.99/mo Pro (any 3, swap 1 per cycle), $49.99/mo All-Access. Annual plans at $119/$239/$399. One-time lifetime per kit at $99 as-shipped (no future updates included). 14-day refunds. Full pricing at /pricing.
Bottom line
wshobson/agents is the best free agent repository for software engineers, full stop. Its plugin architecture is mature, its four-tier model strategy is the right instinct, and its multi-harness export is genuinely useful for people who work across Claude Code, Cursor, and Gemini. Its honest limits are quality variance at 192 agents, no per-component token measurement, and breadth over vertical depth in business functions.
Pick it for engineering work. Consider a vertical kit for a business function you need to ship reliably. Often, pick both — they serve different jobs and don't conflict.
If the business function you actually need to ship is SEO, ecommerce operations, video production, or marketing content, the relevant ClaudeKit page (/seo, /ecom, /video, /marketing) walks through what the commands produce and what the token costs are before you buy anything.
FAQ
Does wshobson/agents really have 192 agents?
Yes — 192 agents across 84 plugins, alongside commands and skills, exported to five AI harnesses. We did not install and benchmark all 192 (no honest reviewer does at that scale), so this is an architectural assessment of how they're organized and tiered, not a per-agent performance test. The plugin boundary is what makes that scale usable: you install only the plugins you need rather than loading all 84 into context.
Is wshobson/agents better than a paid kit?
For software engineering, it's excellent and free, and hard to beat on breadth. For a business function you need to ship reliably — SEO operations, ecommerce lifecycle, video production, marketing content — a vertical kit's orchestrated workflow and evidence-based command outputs solve a different problem. It's not "better or worse"; it's horizontal-and-free versus vertical-and-paid. They're often used together.
Does wshobson/agents publish token costs per agent?
No. The plugin architecture isolates context so you only load what you install, which helps — but the repo doesn't publish the per-agent or per-skill token footprint. You discover costs by loading components. Measured per-component cost with a printed install ledger is something we made central to ClaudeKit v2 (82,197 total tokens across 5 kits, countable with ck tokens <kit>). It's an acceptable omission for a free repo but real friction for cost-sensitive production workflows.
What happened to the blocking reviewer gate in ClaudeKit?
We removed it in v2. The v1 pattern used a reviewer agent to block output scoring below a threshold. The problem: a reviewer agent is itself an LLM, adds latency, burns tokens, and can be wrong. V2 commands end with EVIDENCE — a diff, a ranked table, a verified file — that you can read and evaluate directly. The 13 agents in v2 are all read-only specialists (auditors, researchers, reviewers) rather than blocking gates.
Can I use wshobson/agents and ClaudeKit at the same time?
Yes. They install to different directories and don't conflict in Claude Code. A common setup is wshobson/agents for engineering plugins and ClaudeKit for one business-function kit. ck doctor will flag any config conflicts if they appear.
How does ClaudeKit's v2 architecture differ from v1?
V1 had FounderKit, SalesKit, and a reviewer/quality-gate agent architecture where commands triggered a blocking reviewer before emitting output. V2 has 5 kits (EngineerKit, MarketingKit, VideoKit, SEOKit, EcomKit), 101 commands, 19 skills, and 13 read-only agents. FounderKit and SalesKit are shelved. The reviewer gate is gone — commands end with verifiable EVIDENCE. Token measurement is printed on every install. Namespaces changed from /marketing /founder /sales to /eng /mkt /video /seo /ecom.
Give Claude Code a real team
Five kits, 101 commands, every token measured. Pick the team that matches your work and install it in five minutes.
See the kits

