Bumblebee
Turns any text into a video montage of movie clips featuring the specified text
Surgical phrase splicing from real movies and TV shows via yarn.co.
Give it any long phrase — it greedily cuts the line into the longest possible runs of words that were actually spoken on screen, then assembles a final clip from those pieces. Classic fragmovie genre, automated to the millisecond.
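The greedy cut can be pictured as a longest-match loop. Below is a minimal sketch, not Bumblebee's actual phrase_splitter.py. It assumes a hypothetical `is_spoken(phrase)` predicate that stands in for the yarn.co/playphrase lookup, and it uses the 6-word chunk cap shown in the pipeline diagram.

```python
from typing import Callable, List

MAX_CHUNK = 6  # the pipeline diagram mentions chunks of up to 6 words


def greedy_split(words: List[str], is_spoken: Callable[[str], bool],
                 max_chunk: int = MAX_CHUNK) -> List[str]:
    """Cut `words` into the longest runs for which `is_spoken` returns True.

    `is_spoken` is a hypothetical stand-in for a clip search.
    Words that match nothing, even on their own, are skipped.
    """
    chunks: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest window first, then shrink it until something matches.
        for size in range(min(max_chunk, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if is_spoken(candidate):
                chunks.append(candidate)
                i += size
                break
        else:
            i += 1  # no window starting here matched: skip this word
    return chunks


known = {"i am", "your father", "father"}
print(greedy_split("I am your father".split(), lambda p: p.lower() in known))
```

With the toy `known` set, this prints `['I am', 'your father']`. Trying the longest window first is what makes the result "greedy": it favours fewer, longer splices over many one-word cuts.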
Everything except fetching the clips runs locally: no API keys, no cloud AI calls. Speech recognition runs on-device with faster-whisper.
Demo
"Sentient is the best company in the world."
Four variants of the same phrase, each spliced word by word from real movies and TV shows. Every clip is pulled from yarn.co and playphrase.me and normalised to 854x480 video with 48 kHz audio, so the concat is seamless.
https://github.com/user-attachments/assets/aaac7153-b528-4906-828e-984418a5f2c9
https://github.com/user-attachments/assets/cd7296c2-028d-43eb-b81a-1458bb7afb10
https://github.com/user-attachments/assets/d9fcd4b9-a751-4ed7-b60a-60c9367c7cc8
https://github.com/user-attachments/assets/23299a9b-47f7-4f91-8417-abd410d4ee8a
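To illustrate the normalisation step above, here is a hedged sketch that builds an FFmpeg command using the 854x480 frame size and 48 kHz sample rate quoted in the demo notes. The letterboxing approach (scale, then pad), stereo output, and codec choices are assumptions, not Bumblebee's documented settings; `normalize_args` is a hypothetical helper.

```python
import os
from typing import List

TARGET_W, TARGET_H = 854, 480   # frame size quoted in the demo notes
TARGET_AR = 48000               # audio sample rate quoted in the demo notes


def normalize_args(src: str, dst: str) -> List[str]:
    """Hypothetical helper: build an ffmpeg argv that letterboxes `src` to 854x480 / 48 kHz."""
    ffmpeg = os.environ.get("FFMPEG_BIN", "ffmpeg")
    # Scale down to fit inside the target box, then pad to exactly 854x480.
    vf = (
        f"scale={TARGET_W}:{TARGET_H}:force_original_aspect_ratio=decrease,"
        f"pad={TARGET_W}:{TARGET_H}:(ow-iw)/2:(oh-ih)/2,setsar=1"
    )
    return [
        ffmpeg, "-y", "-i", src,
        "-vf", vf,
        "-ar", str(TARGET_AR),
        "-ac", "2",                    # assumption: stereo output
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        dst,
    ]
```

Run it with `subprocess.run(normalize_args("in.mp4", "out.mp4"), check=True)`. Giving every part identical stream parameters is what later allows a stream-copy concat without re-encoding.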
Install as an Agent Skill
npx skills add solyanviktor-star/Bumblebee
This installs Bumblebee into your agent's skills directory (Claude Code, Cursor, GitHub Copilot, and other compatible agents). The agent will automatically activate it on prompts like "splice a fragmovie of <phrase>" or "build a video where actors say <phrase>".
How it works
"I don't get it, why does my Claude keep getting banned. I'm sick of buying new accounts."
        |
        v   split on .!?  -> each sentence handled independently
        |
  [ greedy splitter: chunks of up to 6 words ]
        |
  "I don't get it why" --+
  "does"               --+   for each chunk:
  "Claude"             --+     getyarn.io -> 8 candidates (curl_cffi, bypasses CF)
  "keep getting"       --+     download mp4
  "I'm sick of"        --+     faster-whisper word-timestamps (local, no API)
  "new accounts"       --+     word_matcher: exact match
  ...                    |     FFmpeg cut to the millisecond
                         v
             concat into output/final.mp4
             (with short audio fades at every splice +
              a ~180ms breathing pause between sentences)
Words that nobody ever said in any clip are skipped.
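The "FFmpeg cut to the millisecond" step with short fades at each splice might look roughly like this. It is a sketch under assumptions: the 10 ms fade length and the codecs are guesses (the README only says "short"), `cut_args` is a hypothetical helper, and the ~180ms pause between sentences is not shown.

```python
import os
from typing import List

FADE_S = 0.01  # assumption: the README only says the fades are "short"


def cut_args(src: str, start: float, end: float, dst: str,
             fade: float = FADE_S) -> List[str]:
    """Hypothetical helper: ffmpeg argv that cuts [start, end] seconds with audio fades."""
    if end <= start:
        raise ValueError("end must be after start")
    dur = end - start
    fade = min(fade, dur / 2)  # keep fade-in and fade-out from overlapping on tiny clips
    # After input seeking, output timestamps start at 0, so fades are relative to the cut.
    af = (f"afade=t=in:st=0:d={fade:.3f},"
          f"afade=t=out:st={dur - fade:.3f}:d={fade:.3f}")
    return [
        os.environ.get("FFMPEG_BIN", "ffmpeg"), "-y",
        "-ss", f"{start:.3f}",   # millisecond precision; re-encoding keeps the seek accurate
        "-i", src,
        "-t", f"{dur:.3f}",
        "-af", af,
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]
```

The fades mask the click that a hard audio cut usually produces. Because they are short, they should not noticeably eat into the spoken word.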
Mix mode: multiple takes on one phrase
python bumblebee.py "Sentient is the best company" --variants 4 -o sentient.mp4
Generates 4 files (sentient_v1.mp4, _v2, _v3, _v4) where every variant avoids clips already used by previous ones. You get different cuts with different actors, different movies, sometimes even different segmentation of the same phrase.
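The "no clip reuse" rule across variants can be sketched with a shared set of used clip IDs. This is illustrative only. `chunk_candidates` (chunk text mapped to candidate clip IDs) and both functions are hypothetical, and the sketch keeps the segmentation fixed, whereas the real tool sometimes re-segments the phrase.

```python
from typing import Dict, List, Optional, Set


def pick_clip(candidates: List[str], used: Set[str]) -> Optional[str]:
    """Return the first candidate clip id that no earlier variant has used."""
    for clip_id in candidates:
        if clip_id not in used:
            return clip_id
    return None


def build_variants(chunk_candidates: Dict[str, List[str]],
                   n: int) -> List[Dict[str, Optional[str]]]:
    """Pick one clip per chunk for each of n variants, never reusing a clip id."""
    used: Set[str] = set()
    variants: List[Dict[str, Optional[str]]] = []
    for _ in range(n):
        picks: Dict[str, Optional[str]] = {}
        for chunk, candidates in chunk_candidates.items():
            clip = pick_clip(candidates, used)
            picks[chunk] = clip  # None means this variant has no fresh clip for the chunk
            if clip is not None:
                used.add(clip)
        variants.append(picks)
    return variants


cands = {"I am": ["a1", "a2"], "your father": ["b1"]}
print(build_variants(cands, 2))
```

Here the second variant gets `a2` for "I am" but no fresh clip for "your father". A chunk like that is where re-segmenting the phrase would help.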
Install
git clone https://github.com/solyanviktor-star/Bumblebee.git
cd Bumblebee
pip install -r requirements.txt
You also need FFmpeg on PATH (or set FFMPEG_BIN to its path).
Strongly recommended: playwright + Chromium
Bumblebee uses playphrase.me as an automatic secondary source whenever yarn.co fails to cover a chunk. playphrase often has 10x-1000x more clips per phrase, so installing it dramatically improves coverage of rare words and longer phrases. Without it, those words are skipped and Bumblebee asks the orchestrating agent to substitute a synonym.
pip install playwright
playwright install chromium # one-time, ~120 MB
The Chromium bootstrap (~10-15s) runs lazily — only on the first yarn miss
of a given run, never if yarn covers the whole phrase. If you skip this
step, Bumblebee still works in yarn-only mode; you can also force-disable
playphrase per run with --no-playphrase.
That's it. No API keys, no .env, nothing else to configure. The first run
downloads the Whisper model (~244 MB for small.en) into the HuggingFace
cache; every run after that is fully offline.
Requirements
- Python 3.9+
- FFmpeg
- ~250 MB free disk space for the speech model
- Recommended: playwright + Chromium (~120 MB) for the playphrase fallback
No GPU required. If you have a CUDA GPU, set WHISPER_DEVICE=cuda for a roughly 5x speedup on transcription.
Usage
# One video from one phrase
python bumblebee.py "I am your father"
# Several phrases — each is processed and they're concatenated in order
python bumblebee.py "I am your father" "Houston we have a problem"
# 5 different cuts of the same phrase, no clip reuse
python bumblebee.py "Sentient is the best" -o sentient.mp4 --variants 5
The final file lands in output/<name>.mp4.
Second source: playphrase.me (automatic)
yarn.co's public HTML is hard-capped at 20 unique clips per phrase. When yarn fails to cover a chunk, Bumblebee automatically falls back to playphrase.me, which often has 10x-1000x more matches (73,000 clips for "open" vs yarn's 20). Its API delivers word-timestamps natively, so playphrase clips skip the faster-whisper step entirely.
The fallback is lazy: the headless Chromium bootstrap (~10-15s, one-time
per run) only runs if yarn actually misses. Phrases that yarn covers fully
never touch playwright. See the Install section for the one-time
playwright + Chromium setup; once installed, every run uses playphrase as
needed. Pass --no-playphrase to force-disable it for a single run.
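The lazy-fallback behaviour described above boils down to "query the cheap source first, and build the expensive one only on the first miss." Below is a hedged sketch. `LazyFallback` and its arguments are hypothetical, `make_secondary` stands in for starting headless Chromium and returning a playphrase search function, and "miss" is simplified here to "the primary search returned nothing."

```python
from typing import Callable, List, Optional

Search = Callable[[str], List[str]]


class LazyFallback:
    """Hypothetical wrapper: primary search first; build the secondary only on the first miss."""

    def __init__(self, primary: Search, make_secondary: Callable[[], Search],
                 enabled: bool = True):
        self.primary = primary
        self.make_secondary = make_secondary
        self.enabled = enabled                  # e.g. False when --no-playphrase is passed
        self._secondary: Optional[Search] = None

    def search(self, phrase: str) -> List[str]:
        hits = self.primary(phrase)
        if hits or not self.enabled:
            return hits
        if self._secondary is None:
            # Expensive bootstrap (e.g. a headless browser) happens at most once per run.
            self._secondary = self.make_secondary()
        return self._secondary(phrase)
```

If the primary source covers every chunk, `make_secondary` is never called, which matches the README's note that such phrases never touch playwright.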
Optional environment variables
| Variable | Default | Purpose |
|---|---|---|
| WHISPER_MODEL | small.en | Model name. Use base.en for speed, medium.en for accuracy. |
| WHISPER_DEVICE | cpu | Set to cuda if you have an NVIDIA GPU. |
| WHISPER_COMPUTE_TYPE | int8 (cpu) / float16 (cuda) | Inference quantization. |
| FFMPEG_BIN | ffmpeg | Path to ffmpeg binary if not on PATH. |
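Here is one way the variables in the table could be read and passed to faster-whisper. The defaults mirror the table, while `WhisperConfig`, `whisper_config_from_env`, and `transcribe_words` are hypothetical names. `WhisperModel(..., device=..., compute_type=...)` and `transcribe(..., word_timestamps=True)` are faster-whisper's public API. A real implementation would load the model once and reuse it instead of loading it per call.

```python
import os
from dataclasses import dataclass


@dataclass
class WhisperConfig:
    model: str
    device: str
    compute_type: str


def whisper_config_from_env() -> WhisperConfig:
    """Resolve settings from the environment, using the defaults listed in the table."""
    device = os.environ.get("WHISPER_DEVICE", "cpu")
    default_ct = "float16" if device == "cuda" else "int8"
    return WhisperConfig(
        model=os.environ.get("WHISPER_MODEL", "small.en"),
        device=device,
        compute_type=os.environ.get("WHISPER_COMPUTE_TYPE", default_ct),
    )


def transcribe_words(path: str, cfg: WhisperConfig):
    """Yield (word, start_s, end_s) tuples using faster-whisper word timestamps."""
    from faster_whisper import WhisperModel  # heavy dependency, imported only when needed
    model = WhisperModel(cfg.model, device=cfg.device, compute_type=cfg.compute_type)
    segments, _info = model.transcribe(path, word_timestamps=True)
    for segment in segments:
        for w in segment.words:
            yield w.word.strip(), w.start, w.end  # words carry a leading space
```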
Project layout
Bumblebee/
|- bumblebee.py            <- CLI entry point
|- SKILL.md                <- Claude Code skill manifest
|- src/
|  |- phrase_splitter.py   <- greedy longest-match with optional shuffling/exclusion
|  |- yarn_search.py       <- phrase -> clip_ids (curl_cffi, bypasses Cloudflare)
|  |- downloader.py        <- clip_id -> local mp4 (curl_cffi, bypasses CF on y.yarn.co)
|  |- transcriber.py       <- faster-whisper word-timestamps + cache
|  |- word_matcher.py      <- exact start/end of target words with apostrophe-fuzz
|  |- cutter.py            <- FFmpeg cut + audio fade at splice points
|  |- concat.py            <- concat demuxer
|- cache/                  <- downloaded clips and transcripts (reused across runs)
|- output/                 <- final reels and intermediate parts in _parts/
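concat.py is described as using FFmpeg's concat demuxer, which reads a text file of `file '...'` lines. The helper names below are hypothetical and are not taken from concat.py. The sketch shows how such a list could be written, escaping embedded single quotes as `'\''`, which is FFmpeg's quoting rule.

```python
from pathlib import Path
from typing import Iterable


def concat_list_text(parts: Iterable[str]) -> str:
    """Render a concat-demuxer list; single quotes in paths become '\\''."""
    lines = []
    for p in parts:
        safe = p.replace("'", "'\\''")
        lines.append(f"file '{safe}'")
    return "\n".join(lines) + "\n"


def write_concat_list(parts: Iterable[str], list_path: str) -> str:
    """Write the list with absolute, forward-slash paths and return its location."""
    resolved = [Path(p).resolve().as_posix() for p in parts]
    Path(list_path).write_text(concat_list_text(resolved), encoding="utf-8")
    return list_path
```

The list would then be consumed with something like `ffmpeg -f concat -safe 0 -i parts.txt -c copy output/final.mp4`. Here `-safe 0` permits absolute paths, and `-c copy` only works because every part shares the same normalised stream parameters.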
Known limitations
- yarn.co indexes English-language media only.
- Whisper sometimes transcribes short tokens like "I", "a", "my" as part of a longer word, so single short words tend to get skipped.
- Word order is strict: "can we" and "we can" are different matches (order-insensitive "swap-fuzzy" matching is on the TODO list).
- yarn.co sits behind Cloudflare. Solved with curl_cffi and impersonate='chrome' (which replays a real Chrome TLS fingerprint).
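The strict word order and the "apostrophe-fuzz" in word_matcher.py can be illustrated with a small matcher over (word, start, end) tuples. This is one possible reading, not the project's code. `normalize` and `find_phrase` are hypothetical, and the fuzz here simply drops all punctuation, so curly and straight apostrophes compare equal (and "don't" also matches "dont").

```python
import re
from typing import Optional, Sequence, Tuple

Word = Tuple[str, float, float]  # (text, start_s, end_s), e.g. from faster-whisper


def normalize(token: str) -> str:
    """Lower-case and strip every non-word character, apostrophes included."""
    return re.sub(r"\W+", "", token.lower())


def find_phrase(words: Sequence[Word], phrase: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first in-order run matching `phrase`, else None."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return None
    tokens = [normalize(w[0]) for w in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:  # strict order: "can we" never matches "we can"
            return words[i][1], words[i + n - 1][2]
    return None


words = [(" I", 0.0, 0.2), (" don\u2019t", 0.2, 0.5), (" get", 0.5, 0.7), (" it.", 0.7, 0.9)]
print(find_phrase(words, "don't get it"))
```

This prints `(0.2, 0.9)`: the start of the first matched word and the end of the last one, which is the span the cutter would then trim.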
License
MIT — see LICENSE.
Built end-to-end with Claude Code.