YouTube Scripts with Claude Code: Hook, Retention, CTA Architecture

A YouTube script that holds an audience is not written front to back — it is architected around three structural decisions: a first-30-seconds hook that earns the next minute, a retention curve with deliberate re-hooks every 40-60 seconds, and CTA placement that asks at the right moments instead of begging at the end.

VideoKit's /video make command builds scripts this way inside Claude Code: it pulls a current angle, drafts the hook and full script against a plotted retention map, places the end-screen CTA, and tightens pacing, all in your cloned voice. The result is a finished script file plus a thumbnail brief, SEO description, and chapter markers from a single run.

This post breaks down the hook/retention/CTA architecture, explains the creator-voice bootstrap that keeps output sounding like you, and gives an honest take on where AI-written scripts help and where they cannot replace creator authenticity.

Why does the first 30 seconds decide everything?

On YouTube, the opening seconds determine whether the video gets a chance at all. The early retention graph is the signal the algorithm watches hardest: a hook that fails to hold the audience gets the video dropped from recommendations fast.

VideoKit treats the hook as a distinct phase, not a throwaway intro. In /video make, hook generation is the first writing step: it runs right after the research pass and before any of the script body is drafted. The three structural components that matter:

Stakes — why this matters now, to this viewer. Not "today we're talking about X" but "if you're doing X, you're probably losing Y."
Pattern interrupt — something that breaks the expected rhythm in the first few seconds so the viewer does not bounce.
Payoff promise — a clear, specific reason to stay ("by the end you'll have Z"), an open loop the script closes later.
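
If it helps to see the three components as a checklist, here is a minimal Python sketch. The Hook class and its field names are hypothetical, made up for illustration, and are not VideoKit's internal format. It only flags which component a draft hook is still missing.

```python
from dataclasses import dataclass


@dataclass
class Hook:
    """Hypothetical container for the three hook components (not a VideoKit type)."""
    stakes: str             # why this matters now, to this viewer
    pattern_interrupt: str  # what breaks the expected rhythm in the first seconds
    payoff_promise: str     # the open loop the script closes later

    def missing_parts(self) -> list[str]:
        """Return the names of any components left empty."""
        return [name for name, value in vars(self).items() if not value.strip()]


hook = Hook(
    stakes="If you're doing X, you're probably losing Y.",
    pattern_interrupt="",
    payoff_promise="By the end you'll have Z.",
)
print(hook.missing_parts())  # ['pattern_interrupt']
```

A draft that comes back with an empty list has all three parts present; whether they are any good is still your call.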

The hook is co-generated with the title set, because the title sets the promise the hook must deliver on. Writing the script first and bolting a title on after is one of the most common structural failures in creator workflows. VideoKit writes them together so they cannot contradict each other.

VideoKit also runs /video data before scripting: a research pass that benchmarks retention curves, hook formats, and chapter lengths from high-performing videos in the same niche. The script is built against what already works in your category, not against generic best practices.

How does a plotted retention curve prevent mid-video drop-off?

The middle of a video is where most retention dies. The viewer got the promise, started to drift, and left. The structural fix is to plot the retention curve in advance and place re-engagement deliberately — not hope the content keeps people watching.

VideoKit's retention-mapping step does exactly this: it plots the retention curve and places re-hooks and pattern interrupts every 40-60 seconds. The elements placed on the curve:

Open loops — questions or promises opened early and paid off later, pulling the viewer forward.
Re-hooks — a fresh stakes-statement or a "here's the part most people miss" every 40-60 seconds to reset attention.
Pattern interrupts — a change in pace, format, B-roll cue, or framing that prevents the monotony that causes drop-off.

The retention map is not decoration. It is the skeleton the script is written onto. The full script skill (yt-script-full) writes the long-form draft — intro, value beats, B-roll cues, and CTA — against that plotted curve, so re-hooks land where the map says retention would otherwise sag. For a faster pass, an outline step produces just the skeleton (thesis, sections, open loops, payoff order) before committing to the full draft.
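
To make the spacing rule concrete, here is a rough Python sketch of how re-hook timestamps could be laid out across a video. This is an assumption about the general idea, not VideoKit's actual retention-mapping logic; place_rehooks and its parameters are invented for the example.

```python
def place_rehooks(duration_s: int, hook_end_s: int = 30, interval_s: int = 50) -> list[int]:
    """Return timestamps (in seconds) for re-hooks after the opening hook.

    interval_s is kept inside the 40-60 second window described in the post.
    """
    if not 40 <= interval_s <= 60:
        raise ValueError("interval_s should be between 40 and 60 seconds")
    marks = []
    t = hook_end_s + interval_s
    while t < duration_s:
        marks.append(t)
        t += interval_s
    return marks


# A five-minute video with a 30-second hook and a re-hook every 50 seconds.
print(place_rehooks(300))  # [80, 130, 180, 230, 280]
```

A real map would shift these marks to land on section boundaries rather than a fixed grid, but the fixed grid is a useful first pass to write against.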

After the draft, a script-doctor pass tightens pacing, cuts verbosity, and confirms re-hooks are landing at the right intervals. This is a Sonnet-tier editing pass, not a full re-draft — it is fast and focused on the structural layer.
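
The interval check in that pass can be pictured as a simple gap scan. The sketch below is hypothetical (find_sagging_gaps is not a VideoKit function) and assumes you already have the re-hook timestamps from the draft.

```python
def find_sagging_gaps(rehook_times_s: list[int], duration_s: int, max_gap_s: int = 60) -> list[tuple[int, int]]:
    """Return (start, end) spans longer than max_gap_s that contain no re-hook."""
    points = [0, *sorted(rehook_times_s), duration_s]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > max_gap_s]


# Re-hooks at 0:45, 1:40, 3:10, and 4:00 in a five-minute script.
print(find_sagging_gaps([45, 100, 190, 240], duration_s=300))  # [(100, 190)]
```

Here the 90-second stretch between 1:40 and 3:10 is the one flagged for an extra re-hook or a cut.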

Where and when should a CTA actually be placed?

A CTA dumped at the very end, after most viewers have already left, converts almost no one. VideoKit handles calls-to-action structurally, mapping end-screen and CTA strategy to the points where viewers are actually still watching.

The architecture distributes asks across the script:

An early, soft subscribe ask after the hook has delivered its first payoff — the viewer now trusts the channel enough to hear it.
A mid-roll next-step tied to a value beat ("if this helped, the full breakdown is linked below").
An end-screen pointing to the next video first (to keep session time), followed by the subscribe prompt, timed to the last retained segment.
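
As a sketch of how those three asks might be positioned, the Python below takes a retention curve and picks the last point where enough viewers are still watching. Everything here is an illustrative assumption: the plan_ctas name, the input shape, and the 30% floor are not taken from VideoKit.

```python
def plan_ctas(
    first_payoff_s: int,
    value_beat_s: int,
    retention: list[tuple[int, float]],
    floor: float = 0.3,
) -> list[dict]:
    """Place three asks on the timeline.

    retention holds (timestamp_s, fraction still watching), ascending by time.
    floor is an assumed cutoff for what counts as a "retained" segment.
    """
    retained = [t for t, frac in retention if frac >= floor]
    last_retained_s = retained[-1] if retained else first_payoff_s
    return [
        {"at_s": first_payoff_s, "ask": "soft subscribe"},
        {"at_s": value_beat_s, "ask": "mid-roll next step"},
        {"at_s": last_retained_s, "ask": "end screen: next video, then subscribe"},
    ]


curve = [(0, 1.0), (120, 0.6), (240, 0.4), (300, 0.25)]
for cta in plan_ctas(first_payoff_s=45, value_beat_s=150, retention=curve):
    print(cta["at_s"], cta["ask"])
```

With this curve the end-screen ask lands at 4:00 rather than 5:00, because the last 60 seconds fall below the assumed floor.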

The SEO description is generated alongside the script: timestamps, keywords, chapter markers, and the primary CTA reinforced in the first 150 characters where YouTube truncates. Chapter timestamps are generated from the script transcript so the CTA strategy is supported by a discoverable, navigable description.
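
Here is a small Python sketch of the two mechanical pieces in that paragraph: formatting chapter timestamps and checking that the CTA sits inside the first 150 characters. The function names and the example.com link are placeholders, and the 150-character figure is simply the one quoted above.

```python
def format_timestamp(seconds: int) -> str:
    """Format seconds as M:SS, or H:MM:SS once past one hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def build_description(cta: str, summary: str, chapters: list[tuple[int, str]]) -> str:
    """Put the CTA first, then the summary, then one chapter per line."""
    lines = [f"{format_timestamp(start)} {title}" for start, title in chapters]
    return f"{cta} {summary}\n\n" + "\n".join(lines)


def cta_in_preview(description: str, cta: str, preview_chars: int = 150) -> bool:
    """True if the whole CTA appears within the first preview_chars characters."""
    return cta in description[:preview_chars]


cta = "Get the free checklist: example.com/checklist"
description = build_description(
    cta,
    "A walkthrough of AI tools for editors.",
    [(0, "Hook"), (95, "Setup"), (3725, "Wrap-up")],
)
print(description)
print(cta_in_preview(description, cta))  # True
```

Putting the CTA at the very start of the string is the simplest way to guarantee it survives truncation.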

This structural CTA approach matters more than it used to. With AI Overviews now appearing on 48% of Google queries (March 2026, up from 34.5% in December 2025), a well-structured description with timestamps also feeds AI-generated video summaries and AIO carousels — discoverability is no longer purely algorithmic within YouTube.

What does the /video make command actually run?

Here is the /video make command end to end, with each step and what it produces:

/video make youtube "AI tools for editors"
 
1. /video data — research pass, benchmarks retention/hook/format
   data from high-performing videos in the niche.
2. Hook + title generation — writes 12 title variants ranked by
   CTR potential, then the first-30-second hook against the
   winning title.
3. yt-retention-map — plots the retention curve, places re-hooks
   and pattern interrupts at 40-60s intervals.
4. yt-script-full — writes the full long-form script (hook, value
   beats, B-roll cues, CTA) onto the retention map skeleton.
5. script-doctor pass — tightens pacing, cuts verbosity,
   confirms re-hooks at plotted intervals.
6. yt-thumbnail-brief — focal subject, three-word overlay, emotion,
   A/B variants.
7. yt-description-seo — SEO description with timestamps, keywords,
   chapter markers.
 
Artifacts: yt-<slug>.md, thumbnail-brief.md, description.md

The result is a finished script file plus the supporting assets. The command ends with the deliverable — a verified output file — not a reviewer gate. This is the v2 architecture: commands produce EVIDENCE (a finished, usable artifact) at the end of the run. The /video clone command takes a different angle: given a reference video URL, it recreates that video's structural style in Remotion and verifies the match — useful if you have a format you want to replicate across episodes.
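
One way to picture "ends with a verified output file" is a final check that the expected artifacts exist on disk. The Python below is a sketch under that assumption. The three file names come from the run listing above; missing_artifacts and the slug argument are hypothetical, and this is not how VideoKit itself verifies output.

```python
import tempfile
from pathlib import Path


def expected_artifacts(slug: str) -> list[str]:
    """File names listed for a /video make run, with the slug filled in."""
    return [f"yt-{slug}.md", "thumbnail-brief.md", "description.md"]


def missing_artifacts(out_dir: Path, slug: str) -> list[str]:
    """Return the expected files that are not present in out_dir."""
    return [name for name in expected_artifacts(slug) if not (out_dir / name).is_file()]


# Demo in a throwaway directory: only the script file has been written so far.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "yt-ai-tools-for-editors.md").write_text("draft script")
    print(missing_artifacts(out, "ai-tools-for-editors"))
    # ['thumbnail-brief.md', 'description.md']
```

An empty list means every deliverable is there, which is the condition the run is supposed to end on.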

How does it sound like you instead of generic AI?

This is the part that separates a VideoKit script from a generic AI script. The kit clones your voice once and reads it everywhere.

The /video make command reads VOICE.md and BRAND.md before scripting. If you have run MarketingKit's /mkt voice command, VideoKit inherits those files automatically — voice is shared across kits. If not, a lighter voice-bootstrap step interviews you and ingests sample posts to generate the profile. Every script skill reads those files, so vocabulary, cadence, banned phrases, and tone are all enforced from the start.

A voice-match check scores each draft against VOICE.md for tone, cadence, vocabulary, and banned phrases. A script that drifts from your voice gets caught before it ships. This is the bootstrap-context-first pattern described in detail in the commands and orchestration post: voice written once, enforced everywhere.
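
The banned-phrase part of that check is the easiest piece to sketch. The Python below assumes the banned phrases have already been read out of VOICE.md into a plain list; the file's real format is not shown in this post, so find_banned_phrases and its inputs are illustrative only. Tone and cadence scoring are a harder problem and are not attempted here.

```python
import re


def find_banned_phrases(draft: str, banned: list[str]) -> list[str]:
    """Return the banned phrases that appear in the draft, ignoring case."""
    hits = []
    for phrase in banned:
        # Lookarounds keep "delve" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, draft, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


draft = "Let's dive in. In today's video we delve into editing tools."
print(find_banned_phrases(draft, ["dive in", "game-changer", "delve"]))
# ['dive in', 'delve']
```

Any hit is a cue to rewrite that line in your own words before the draft moves on.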

The voice clone matches style. It does not have your memories or opinions. See the honest take below.

How does VideoKit compare to writing scripts manually or using ChatGPT?

| Approach | Hook structure | Retention map | Voice match | CTA placement | Research pass | Time |
| --- | --- | --- | --- | --- | --- | --- |
| Manual scripting | You plan it | You plot it | Native | You decide | You run it | 2-4 hours |
| ChatGPT free | None | None | Generic | End only | None | 15-20 min |
| ChatGPT + custom instructions | Partial | None | Partial | End only | None | 20-30 min |
| VideoKit `/video make` | Structured | Plotted, 40-60s | Voice file | Distributed | `/video data` | 10-20 min |

The comparison is not "AI is better than a skilled creator." The comparison is "AI does the repeatable structural work so a skilled creator can focus on substance." A manual script written by an experienced creator will beat any AI draft. The question is whether you have time to plot a retention map, write 12 title variants, and write an SEO description for every upload. Most solo creators do not.

What is the honest take on AI scripts vs. creator authenticity?

We are not going to tell you AI should write your whole channel. Here is the measured version.

Where AI scripting genuinely helps: the architecture. Plotting a retention curve, structuring a hook with stakes and a payoff loop, placing CTAs at retained moments, and writing the SEO description and chapters are all repeatable craft that is tedious to do by hand for every upload. It also removes the blank-page problem: a draft you can react to beats a cursor blinking on an empty document.

Where it cannot replace you: the authenticity. Your specific stories, your hot takes, the lived experience that is the actual reason people watch — those are yours. A voice clone matches tone and cadence. It does not have your memories or your opinions.

The best use is AI for structural scaffolding and a fast first draft, you for the substance and the parts only you can say. The script is a draft to perform and rewrite, never a teleprompter to read cold. That is the honest line, and we would rather draw it than pretend the tool replaces the creator.

One data point worth knowing: 44.2% of AI Overview citations come from the first 30% of a page (March 2026 data). The same principle applies to videos — the hook and early retention are where discoverability is won or lost, both in the algorithm and in AI-surfaced video content. VideoKit's architecture is designed around that front-loading logic.

What else does VideoKit include beyond scripting?

VideoKit is the video-focused kit in the ClaudeKit lineup: 17 commands, 5 skills, 3 read-only specialist agents, and 12,602 measured tokens. The table below covers 12 of those commands:

| Command | What it does |
| --- | --- |
| `/video make` | Full video package: script, hook, retention map, thumbnail brief, description |
| `/video clone` | Recreates a reference video's style in Remotion, verifies the match |
| `/video demo` | Builds a product demo video with Remotion components |
| `/video caption` | Generates caption files (SRT/VTT) from transcript |
| `/video data` | Benchmarks retention, hook formats, chapter lengths from niche leaders |
| `/video social` | Cuts the video into short-form clips for Instagram/TikTok/Shorts |
| `/video edit` | Applies edits to an existing script or Remotion composition |
| `/video thumbnail` | Generates thumbnail brief: focal subject, overlay, emotion, A/B variants |
| `/video repurpose` | One long-form video → 5 formats (short-form, thread, newsletter section, blog, audiogram) |
| `/video render` | Triggers Remotion render with props, returns output path |
| `/video new` | Bootstraps a new Remotion composition from a template |
| `/video setup` | Installs VideoKit, connects Remotion, writes voice/brand files |

The three read-only agents are specialist roles — a researcher, a retention auditor, and a Remotion component reviewer. They audit and report; they do not gate or block. Commands end with a deliverable, not a reviewer loop.

For Remotion-based video generation, the data-driven video post covers how VideoKit's /video data and /video render commands work at scale — including a 200-render benchmark. For product demos specifically, the Remotion product demo post walks through a full /video demo run.

How do I install VideoKit?

# Install CLI
npm install -g claudekits@0.1.3
 
# Authenticate
ck auth <your-key>
 
# Install VideoKit globally
ck install video
 
# Or install to project only
ck install video --local
 
# Verify token counts
ck tokens video
 
# Diagnose issues
ck doctor

A token ledger prints on every install showing the exact measured cost per command and skill. ck tokens video recounts if you want a fresh measurement. ck list shows your active entitlements.

VideoKit is also available via the Claude Code plugin marketplace:

/plugin marketplace add Madni-Aghadi/claudekit-video

Pricing: $14.99/month for VideoKit alone, $29.99/month for Pro (any 3 kits, swap 1 per cycle), $49.99/month All-Access. Annual plans run $119/$239/$399. One-time lifetime per kit is $99 (as shipped, no future updates). 14-day refunds, 3 devices per license. Full details on the pricing page.

If you are producing YouTube content regularly, VideoKit removes the repeatable structural work — hook architecture, retention curve plotting, CTA placement, title testing, SEO descriptions — and hands it back to you as a draft built in your voice. The parts that are irreducibly yours (the stories, the opinions, the performance) stay yours. See the full command set and install instructions on the VideoKit page.

FAQ

Can Claude Code write a full YouTube script?

Yes. VideoKit's /video make command writes a complete long-form script: hook, value beats, B-roll cues, and CTA, plotted onto a retention map with re-hooks every 40-60 seconds. It also produces the title set, thumbnail brief, SEO description, and chapter timestamps. The output is a usable draft file, not a rough sketch. You still perform and personalize it.

How does it make the script sound like me and not generic AI?

VideoKit reads VOICE.md before scripting — a profile built from your real posts that captures vocabulary, cadence, tone, and banned phrases. A voice-match check scores each draft against that profile and flags drift before it ships. If you have run /mkt voice from MarketingKit, VideoKit inherits the same file. The voice clone matches style; it cannot supply your specific stories and opinions, which is the part you bring.

Should I just read the AI script on camera?

No. Use the script for its architecture (hook structure, retention curve, CTA placement) and as a fast first draft, then rewrite it in your real voice and layer in the stories and takes only you have. The tool removes the blank-page and structural-planning work. The authenticity that makes people subscribe is irreducibly yours. Treat it as a draft to perform, not a teleprompter to read cold.

What does generating a full YouTube package cost in Claude tokens?

VideoKit has 12,602 measured tokens total across all 17 commands and 5 skills. A single /video make run — script, hook, retention map, thumbnail brief, SEO description — uses a subset of that budget depending on script length. Token costs are printed at install time and recountable with ck tokens video. The ledger shows per-command costs so you know what each step spends before running it.

Does VideoKit work with Remotion for actual video production?

Yes. /video clone, /video demo, /video new, and /video render are all Remotion-integrated commands. /video clone takes a reference video URL and recreates its structural style in a Remotion composition, then verifies the match. /video render triggers a render with props and returns the output path. The Remotion product demo post covers a full end-to-end run.

How is VideoKit different from MarketingKit for content?

VideoKit is video-first: Remotion integration, retention-curve architecture, thumbnail briefs, caption file generation, and /video data benchmarking against niche leaders. MarketingKit is channel-first: voice cloning (/mkt voice), AI-tell stripping (/mkt humanize), multi-platform repurposing (/mkt repurpose turns 1 piece into 5 formats), threads, newsletters, and launch sequences. For a creator who primarily makes YouTube content, VideoKit is the right starting kit. If you also run X/LinkedIn/newsletter alongside the channel, adding MarketingKit under Pro makes sense.