wshobson/agents Review: 192 Free Agents, Honestly Assessed (2026)

wshobson/agents is the best free Claude Code agent repository for software engineers. It packs 192 agents across 84 isolated plugins, ships a sensible four-tier model assignment (Opus down to Haiku), and exports everything to five AI harnesses from a single Markdown source. Its honest limits are quality variance across that scale, no per-component token measurement, and breadth over vertical depth. If your work is code, grab it. If your work is a specific business function you need to ship reliably, a vertical kit solves a different problem.

A disclosure up front: we build ClaudeKit, a paid set of vertical kits, so we have a stake in this comparison. We've tried to keep the assessment about architecture and trade-offs, and we'll say plainly where wshobson/agents is the better choice.

What is wshobson/agents, exactly?

The repo is a marketplace of plugins, not a flat folder of 192 agent files. Each plugin is self-contained — it bundles its own agents, commands, and skills, auto-discovered from directory structure — and you install plugins individually. That single architectural decision drives most of what works:

Context isolation. Installing one plugin loads only that plugin's components. With Claude Code burning context budget on every loaded agent and skill, a monolithic 192-agent install would be unusable. Plugin boundaries make the scale tractable.

Multi-harness export. One Markdown source exports harness-native artifacts to Claude Code, Cursor, Codex CLI, OpenCode, and Gemini CLI. No lowest-common-denominator translation. If you live across multiple AI coding tools, that portability is rare.

Composability. Pull the Python plugin and the security plugin without inheriting unrelated frontend tooling. Mix and match as your project demands.

This is mature design. It reflects real thought about how agents are consumed in practice, and it's the reason the repo scales to 192 agents without collapsing under its own context weight.
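To make the isolation argument concrete, here is a minimal Python sketch of directory-based discovery. The layout (one folder per plugin containing `agents`, `commands`, and `skills` subfolders of Markdown files) and both function names are our assumptions for illustration, not the repo's actual loader.

```python
from pathlib import Path

# Hypothetical layout, assumed for illustration only:
#   <root>/<plugin-name>/{agents,commands,skills}/*.md
COMPONENT_DIRS = ("agents", "commands", "skills")


def discover_plugin(plugin_dir: Path) -> dict[str, list[str]]:
    """Return the component names found under a single plugin directory."""
    found: dict[str, list[str]] = {}
    for kind in COMPONENT_DIRS:
        kind_dir = plugin_dir / kind
        # A missing subdirectory simply means the plugin ships none of that kind.
        if kind_dir.is_dir():
            found[kind] = sorted(p.stem for p in kind_dir.glob("*.md"))
        else:
            found[kind] = []
    return found


def load_installed(root: Path, installed: list[str]) -> dict[str, dict[str, list[str]]]:
    """Load only the plugins the user installed; nothing else is read."""
    return {name: discover_plugin(root / name) for name in installed}
```

The point of the sketch is the second function: components from plugins that are not in `installed` are never touched, which is the property that keeps a large catalogue from flooding the context window.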

How does the four-tier model assignment work?

One of wshobson/agents' most useful features is that it doesn't run everything on one model. With Opus 4.8 (shipped May 2026) priced at $5/$25 per million input/output tokens, the cost gap between tiers adds up quickly at scale. The tiering strategy:

Tier	Model	Use Cases
1	Opus	Architecture reviews, security audits, production-critical reasoning
2	User-selected	Backend, frontend, specialized domain work
3	Sonnet	Documentation, testing, API references
4	Haiku	Operational tasks, deployments, content formatting

This is the right instinct. Match cognitive load to model cost. Paying Opus rates to format a config file is the single most common cost mistake in Claude Code setups. We wrote a whole guide on model tiering because getting the tier boundary wrong is where token budgets quietly bleed. wshobson/agents gets the shape correct. Whether each of the 192 agents hits the right tier is a per-agent question, and we won't pretend to have audited all of them.
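A rough Python sketch of the tier table and the Opus arithmetic follows. The function names are hypothetical, and we assume the quoted $5/$25 figures mean input and output rates per million tokens. Sonnet and Haiku rates are left out because this review does not quote them.

```python
from decimal import Decimal

# Tier table as described above; Tier 2 is left to the user.
TIER_MODEL = {1: "opus", 2: None, 3: "sonnet", 4: "haiku"}

# Opus rates quoted in this review, in USD per million tokens.
# Assumption: the first figure is input, the second is output.
OPUS_INPUT_PER_MTOK = Decimal("5")
OPUS_OUTPUT_PER_MTOK = Decimal("25")
MILLION = Decimal(1_000_000)


def model_for(tier: int, user_choice: str = "sonnet") -> str:
    """Resolve a tier to a model name, using the user's pick for Tier 2."""
    model = TIER_MODEL[tier]
    return user_choice if model is None else model


def opus_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one Opus call at the quoted rates."""
    return (
        Decimal(input_tokens) / MILLION * OPUS_INPUT_PER_MTOK
        + Decimal(output_tokens) / MILLION * OPUS_OUTPUT_PER_MTOK
    )
```

Under these assumptions, `opus_cost(200_000, 20_000)` comes to $1.50 for a single call. That is small once and noticeable when a formatting task runs hundreds of times a week.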

Which plugin categories are strongest?

Based on the repo's organization and the author's evident center of gravity, the strongest areas are the ones closest to software engineering:

Architecture and infrastructure — the Opus-tier architecture agents are clearly the flagship. Putting the heavyweight model on system design is the correct call, and the depth here is the main reason engineers reach for this repo first.
Programming languages and frameworks — dense Python, Django, FastAPI coverage. For a working backend engineer, this is the draw.
Security and compliance — security reviews are explicitly Tier 1 (Opus). That's the correct assignment for that cognitive load and a notable strength versus repos that run everything on one model.
Data engineering and ML — broad coverage, solid for day-to-day data work.
DevOps and infrastructure — deployment, CI/CD, and operational agents at Tier 4 (Haiku) where the cost-to-value is good.

The categories that exist but lean broad rather than deep are the business-adjacent ones: SEO, documentation, business operations. Competent general-purpose agents, but they improvise from a persona rather than chain a domain workflow end to end. That's a scope statement, not a knock. This is an engineer's toolkit that also has business agents.

What are the honest trade-offs at 192 agents?

A free repo of this scale has predictable trade-offs. Here they are plainly:

Quality variance. When one maintainer ships 192 agents, they will not all be equally polished. Some are clearly load-bearing and well-tuned; others are thinner. There is no published per-agent quality floor. In ClaudeKit's v2 architecture, we moved away from blocking reviewer gates entirely — commands now end with EVIDENCE (a report, a diff, a verified file) rather than a gate agent deciding whether the output is good enough. But with wshobson/agents, the absence of any quality signal at all means you are the reviewer. For engineers checking their own code, that's fine. For non-technical operators shipping customer-facing deliverables, it's a real gap.

No per-component token measurement. The plugin architecture isolates context well, but the repo doesn't publish the token footprint per agent or skill. You discover costs by loading components, not by reading a number before you install. We made measured token costs central to ClaudeKit — every kit install prints a token ledger, ck tokens <kit> recounts at any time, and the five kits together total 82,197 measured tokens across 101 commands, 19 skills, and 13 agents. That discipline is absent from wshobson/agents, which is an acceptable omission for a free repo but real friction for cost-sensitive production workflows.

Breadth over vertical depth. 192 agents across many domains is horizontal breadth. A vertical kit — say, 20 ecommerce commands chaining store triage, cart recovery, margins, ads, and BFCM prep with shared context — is depth in one lane. wshobson/agents gives you a wide bench of specialists that improvise independently. It does not give you an orchestrated, context-carrying workflow for a specific business function. Which you want depends on your job.

When should you choose wshobson/agents over a paid kit?

We'll be direct. An honest comparison is more useful than a sales pitch.

Choose wshobson/agents when:

You're a software engineer who wants free, deep coverage of development, infrastructure, and security agents
You work across multiple AI coding harnesses and value single-source multi-harness export
You're comfortable being your own quality reviewer
Your work is in code, not customer-facing business deliverables
You want to understand the plugin architecture pattern before deciding whether to pay for anything

Consider a paid vertical kit instead when:

Your job is a specific business function — SEO operations, ecommerce lifecycle, video production, marketing content — where you want an orchestrated end-to-end workflow, not a bench of improvising generalists
You need measured token costs per component before running anything, to budget context deliberately
You want commands that end with verifiable EVIDENCE (a diff, a report, a validated file) rather than an LLM judgment call
You're shipping output to customers or investors and need reproducible quality

Both can be true simultaneously. A reasonable setup is wshobson/agents for your engineering work and a vertical kit for the one business function you actually need to ship reliably. They're not mutually exclusive, and we'd never claim otherwise.

How does wshobson/agents compare to ClaudeKit's five kits?

The cleanest frame: wshobson/agents is horizontal and free — broad, well-architected, strongest in engineering, you supply the quality control. ClaudeKit is vertical and paid — five kits that each chain a single business function end to end, with commands that produce EVIDENCE rather than opinions, per-component token measurement, and install-time token ledgers.

Here's the side-by-side on the dimensions that actually matter for a decision:

Dimension	wshobson/agents	ClaudeKit v2
Price	Free	$14.99/mo single kit to $49.99/mo All-Access
Agent count	192 agents, 84 plugins	13 read-only specialist agents across 5 kits
Command count	102 commands	101 commands (25 eng + 20 mkt + 17 video + 19 seo + 20 ecom)
Token measurement	Not published	82,197 measured tokens total; printed on install
Harness support	Claude Code, Cursor, Codex, OpenCode, Gemini	Claude Code (plugin marketplace + CLI)
Model tiering	4-tier (Opus to Haiku)	Per-command, matches task load
Vertical depth	Broad across many domains	Deep per business function
Quality signal	You review	Commands end with EVIDENCE (report/diff/file)
Install	Clone + configure	`ck auth <key>` then `ck install <kit>`

ClaudeKit's five kits in brief:

EngineerKit — 25 commands, 4 skills, 4 agents, 20,413 tokens. Flagship: /eng debug (root-cause-first diagnosis). Daily eight: catchup, plan, tdd, debug, verify, review, commit, handoff.
MarketingKit — 20 commands, 3 skills, 2 agents, 16,714 tokens. Flagships: /mkt voice (voice file from your real posts) + /mkt humanize (strips 14 AI tells).
VideoKit — 17 commands, 5 skills, 3 agents, 12,602 tokens. Flagship: /video clone (recreate a reference video's style in Remotion and verify the match).
SEOKit — 19 commands, 4 skills, 2 agents, 16,004 tokens. Flagships: /seo quick-wins (positions 8-20 plus low-CTR pages) and /seo citations (AI-citation measurement with confidence intervals).
EcomKit — 20 commands, 3 skills, 2 agents, 16,464 tokens. Flagship: /ecom no-sales (store triage against AOV-band benchmarks).
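The per-kit figures above can be cross-checked against the headline totals with a few lines of Python. This is only a sanity check on the published numbers, not a description of how `ck tokens` counts.

```python
# Per-kit figures as listed above: (commands, skills, agents, tokens).
KITS = {
    "EngineerKit": (25, 4, 4, 20_413),
    "MarketingKit": (20, 3, 2, 16_714),
    "VideoKit": (17, 5, 3, 12_602),
    "SEOKit": (19, 4, 2, 16_004),
    "EcomKit": (20, 3, 2, 16_464),
}


def totals(kits: dict[str, tuple[int, int, int, int]]) -> tuple[int, int, int, int]:
    """Sum each column (commands, skills, agents, tokens) across all kits."""
    commands, skills, agents, tokens = (sum(col) for col in zip(*kits.values()))
    return commands, skills, agents, tokens


print(totals(KITS))  # (101, 19, 13, 82197)
```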

For the broader question of how the plugin model stacks up against kit-style packaging, our plugins vs kits post covers the marketplace model directly. And if you're still sorting out what agents, skills, and commands actually are and how they differ, the agents vs skills vs slash commands post is worth ten minutes.

What does the ClaudeKit v2 agent architecture look like?

This comes up when people read old ClaudeKit coverage that mentions "blocking reviewer gates" and "quality-gate agents." That was v1. In v2, we killed the reviewer gate.

The v1 pattern was: a command runs, then a reviewer agent decides whether the output is good enough and blocks it if it scores below a threshold. The problem is that a reviewer agent is itself an LLM: it adds latency, burns tokens, and can be wrong about whether the output is good. You've replaced one LLM judgment with another.

V2 architecture is different: every command ends with EVIDENCE, not an opinion. /eng debug ends with a root-cause report and a diff. /seo quick-wins ends with a ranked table of specific URLs and their position/CTR numbers pulled from real data. /ecom no-sales ends with a triage checklist scored against benchmarks. You don't need a reviewer to tell you whether the output is good — you can read the diff, check the numbers, verify the file. The 13 agents across ClaudeKit v2 are all read-only specialists: auditors, researchers, reviewers that read and summarize rather than blocking execution.

wshobson/agents doesn't publish a quality-gate architecture at all, which for most engineering use cases is fine — engineers review their own diffs. The comparison point is that ClaudeKit's evidence-first design is not about gatekeeping, it's about making the output verifiable without adding another LLM in the loop.
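The difference between the two patterns can be sketched in Python. Everything here is hypothetical: the `Evidence` and `CommandResult` types, the `reviewer_score` callable, and the 0.8 threshold are stand-ins that show the shape of the idea, not ClaudeKit's actual code.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class Evidence:
    """A concrete artifact a human can inspect, such as a report, diff, or file."""
    kind: str  # e.g. "report", "diff", "file"
    path: Path


@dataclass
class CommandResult:
    output: str
    evidence: list[Evidence] = field(default_factory=list)


def v1_gate(result: CommandResult,
            reviewer_score: Callable[[str], float],
            threshold: float = 0.8) -> bool:
    """v1-style gate: a second model call scores the output and may block it."""
    return reviewer_score(result.output) >= threshold


def v2_evidence_ok(result: CommandResult) -> bool:
    """v2-style check: no model call, only confirm the artifacts exist on disk."""
    return bool(result.evidence) and all(e.path.is_file() for e in result.evidence)
```

The v1 function depends on another model's judgment, while the v2 function only checks that something inspectable was produced. Judging whether that artifact is good remains the reader's job.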

How do you install and manage ClaudeKit alongside wshobson/agents?

The two don't conflict. ClaudeKit installs globally to ~/.claude by default, or into a specific project with --local. wshobson/agents plugins install into your project directory. Claude Code loads both side by side without collisions.

ClaudeKit install flow:

ck auth <key> — authenticates your license (3 devices per license)
ck install <kit> — installs to ~/.claude, prints token ledger on completion
/plugin marketplace add Madni-Aghadi/claudekit-<kit> — alternative if you prefer the Claude Code plugin marketplace

Maintenance commands: ck tokens <kit> recounts the token footprint, ck doctor diagnoses config issues, ck list shows your entitlements.

Pricing: $14.99/mo for a single kit, $29.99/mo for Pro (any 3 kits, swap 1 per cycle), $49.99/mo for All-Access. Annual plans are $119, $239, and $399 respectively. A one-time lifetime license is $99 per kit, as shipped (no future updates included). Refunds are available for 14 days. Full pricing at /pricing.
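For readers weighing the plans, the arithmetic is short enough to sketch in Python. The prices are the ones quoted above; the plan keys and function names are ours, chosen for illustration.

```python
import math
from decimal import Decimal

# Prices as quoted above (USD).
MONTHLY = {"single": Decimal("14.99"), "pro": Decimal("29.99"), "all_access": Decimal("49.99")}
ANNUAL = {"single": Decimal("119"), "pro": Decimal("239"), "all_access": Decimal("399")}
LIFETIME_PER_KIT = Decimal("99")


def annual_saving(plan: str) -> Decimal:
    """Difference between twelve monthly payments and the annual price."""
    return MONTHLY[plan] * 12 - ANNUAL[plan]


def lifetime_break_even_months() -> int:
    """First month in which cumulative single-kit billing exceeds the lifetime price."""
    return math.ceil(LIFETIME_PER_KIT / MONTHLY["single"])


for plan in MONTHLY:
    print(plan, annual_saving(plan))
# single 60.88
# pro 120.88
# all_access 200.88
print(lifetime_break_even_months())  # 7
```

The break-even figure ignores that the lifetime option is sold as shipped, with no future updates, so it compares price only.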

Bottom line

wshobson/agents is the best free agent repository for software engineers, full stop. Its plugin architecture is mature, its four-tier model strategy is the right instinct, and its multi-harness export is useful for anyone who works across Claude Code, Cursor, Codex CLI, OpenCode, or Gemini CLI. Its honest limits are quality variance at 192 agents, no per-component token measurement, and breadth over vertical depth in business functions.

Pick it for engineering work. Consider a vertical kit for a business function you need to ship reliably. Often, pick both — they serve different jobs and don't conflict.

If the business function you actually need to ship is SEO, ecommerce operations, video production, or marketing content, the relevant ClaudeKit page (/seo, /ecom, /video, /marketing) walks through what the commands produce and what the token costs are before you buy anything.

FAQ

Does wshobson/agents really have 192 agents?

Yes — 192 agents across 84 plugins, alongside commands and skills, exported to five AI harnesses. We did not install and benchmark all 192 (no honest reviewer does at that scale), so this is an architectural assessment of how they're organized and tiered, not a per-agent performance test. The plugin boundary is what makes that scale usable: you install only the plugins you need rather than loading all 84 into context.

Is wshobson/agents better than a paid kit?

For software engineering, it's excellent and free, and hard to beat on breadth. For a business function you need to ship reliably — SEO operations, ecommerce lifecycle, video production, marketing content — a vertical kit's orchestrated workflow and evidence-based command outputs solve a different problem. It's not "better or worse"; it's horizontal-and-free versus vertical-and-paid. They're often used together.

Does wshobson/agents publish token costs per agent?

No. The plugin architecture isolates context so you only load what you install, which helps — but the repo doesn't publish the per-agent or per-skill token footprint. You discover costs by loading components. Measured per-component cost with a printed install ledger is something we made central to ClaudeKit v2 (82,197 total tokens across 5 kits, countable with ck tokens <kit>). It's an acceptable omission for a free repo but real friction for cost-sensitive production workflows.

What happened to the blocking reviewer gate in ClaudeKit?

We removed it in v2. The v1 pattern used a reviewer agent to block output scoring below a threshold. The problem: a reviewer agent is itself an LLM, adds latency, burns tokens, and can be wrong. V2 commands end with EVIDENCE — a diff, a ranked table, a verified file — that you can read and evaluate directly. The 13 agents in v2 are all read-only specialists (auditors, researchers, reviewers) rather than blocking gates.

Can I use wshobson/agents and ClaudeKit at the same time?

Yes. They install to different directories and don't conflict in Claude Code. A common setup is wshobson/agents for engineering plugins and ClaudeKit for one business-function kit. ck doctor will flag any config conflicts if they appear.

How does ClaudeKit's v2 architecture differ from v1?

V1 had FounderKit, SalesKit, and a reviewer/quality-gate agent architecture where commands triggered a blocking reviewer before emitting output. V2 has 5 kits (EngineerKit, MarketingKit, VideoKit, SEOKit, EcomKit), 101 commands, 19 skills, and 13 read-only agents. FounderKit and SalesKit are shelved. The reviewer gate is gone: commands end with verifiable EVIDENCE. Token measurement is printed on every install. Namespaces changed from /marketing, /founder, and /sales to /eng, /mkt, /video, /seo, and /ecom.