

Bumblebee

Turns any text into a video montage of movie clips featuring the specified text

15 · by @solyanviktor-star · 28d ago · MIT · GitHub


Compatibility

Claude Code, Cursor, VS Code

Description


Surgical phrase splicing from real movies and TV shows via yarn.co.

Give it any long phrase — it greedily cuts the line into the longest possible runs of words that were actually spoken on screen, then assembles a final clip from those pieces. Classic fragmovie genre, automated to the millisecond.

No API keys and no cloud inference. Clips are fetched from yarn.co (with playphrase.me as a fallback), while speech recognition runs on-device with faster-whisper.

Demo

"Sentient is the best company in the world."

Four variants of the same phrase, each spliced word-by-word from real movies and TV shows. Every clip is pulled from yarn.co or playphrase.me and normalised to a single 854x480 / 48 kHz format so the concat is seamless.

https://github.com/user-attachments/assets/aaac7153-b528-4906-828e-984418a5f2c9

https://github.com/user-attachments/assets/cd7296c2-028d-43eb-b81a-1458bb7afb10

https://github.com/user-attachments/assets/d9fcd4b9-a751-4ed7-b60a-60c9367c7cc8

https://github.com/user-attachments/assets/23299a9b-47f7-4f91-8417-abd410d4ee8a

Install as an Agent Skill

npx skills add solyanviktor-star/Bumblebee

This installs Bumblebee into your agent's skills directory (Claude Code, Cursor, GitHub Copilot, and other compatible agents). The agent will automatically activate it on prompts like "splice a fragmovie of <phrase>" or "build a video where actors say <phrase>".

How it works

"I don't get it, why does my Claude keep getting banned. I'm sick of buying new accounts."
        |
        v   split on .!? -> each sentence handled independently
        |
[ greedy splitter — chunks of up to 6 words ]
        |
   "I don't get it why"          --+
   "does"                        --+   for each chunk:
   "Claude"                      --+     getyarn.io -> 8 candidates  (curl_cffi, bypasses CF)
   "keep getting"                --+     download mp4
   "I'm sick of"                 --+     faster-whisper word-timestamps (local, no API)
   "new accounts"                --+     word_matcher: exact match
   ...                              |    FFmpeg cut to the millisecond
                                    v
                          concat into output/final.mp4
                          (with short audio fades at every splice +
                          a ~180ms breathing pause between sentences)

Words that nobody ever said in any clip are skipped.
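The splitting step in the diagram can be sketched as follows. This is a minimal illustration of greedy longest-match, not the project's actual phrase_splitter.py: the names split_sentences, greedy_chunks and the has_clip callback are hypothetical, and has_clip stands in for "some clip search returned a match for this run of words".

```python
import re
from typing import Callable, List

MAX_CHUNK_WORDS = 6  # chunk size limit taken from the diagram above


def split_sentences(text: str) -> List[str]:
    """Split on '.', '!' and '?' and drop empty pieces."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def greedy_chunks(sentence: str, has_clip: Callable[[str], bool]) -> List[str]:
    """At each position take the longest run of words (up to MAX_CHUNK_WORDS)
    for which has_clip() is true; skip the word if even it alone has no clip."""
    words = re.findall(r"[\w']+", sentence)
    chunks: List[str] = []
    i = 0
    while i < len(words):
        for size in range(min(MAX_CHUNK_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if has_clip(candidate):
                chunks.append(candidate)
                i += size
                break
        else:
            i += 1  # no clip for this word at any length: skip it
    return chunks
```

With a has_clip that only knows "I don't get it why", "does", "Claude" and "keep getting", the first sentence of the example splits into exactly those four chunks, and "my" and "banned" are dropped.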

Mix mode: multiple takes on one phrase

python bumblebee.py "Sentient is the best company" --variants 4 -o sentient.mp4

Generates 4 files (sentient_v1.mp4, _v2, _v3, _v4) where every variant avoids clips already used by previous ones. You get different cuts with different actors, different movies, sometimes even different segmentation of the same phrase.
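The no-reuse rule could be implemented with a shared set of used clip ids, as in the sketch below. The function names and the candidates_by_chunk mapping are hypothetical, and what Bumblebee actually does when a chunk runs out of unused clips is not documented here; this sketch simply returns None in that case.

```python
from typing import Dict, List, Optional, Set


def pick_unused(candidates: List[str], used: Set[str]) -> Optional[str]:
    """Return the first candidate clip id that no earlier variant has used."""
    for clip_id in candidates:
        if clip_id not in used:
            used.add(clip_id)
            return clip_id
    return None  # assumption: every candidate is already taken


def build_variants(
    chunks: List[str],
    candidates_by_chunk: Dict[str, List[str]],
    n_variants: int,
) -> List[List[Optional[str]]]:
    """Pick one clip per chunk for each variant, sharing one 'used' set."""
    used: Set[str] = set()
    variants = []
    for _ in range(n_variants):
        variants.append(
            [pick_unused(candidates_by_chunk.get(c, []), used) for c in chunks]
        )
    return variants
```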

Install

git clone https://github.com/solyanviktor-star/Bumblebee.git
cd Bumblebee
pip install -r requirements.txt

You also need FFmpeg on PATH (or set FFMPEG_BIN to its path).
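To check this requirement before a run, something like the following would do. It is a sketch under the assumption that FFMPEG_BIN holds either a command name or a full path; resolve_ffmpeg is a hypothetical helper, not part of Bumblebee.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_ffmpeg(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a usable ffmpeg executable, or None if nothing is found."""
    env = os.environ if env is None else env
    configured = env.get("FFMPEG_BIN", "ffmpeg")
    # shutil.which handles both cases: a bare name is searched on PATH,
    # an explicit path is only checked for existence and executability.
    return shutil.which(configured, path=env.get("PATH"))


if __name__ == "__main__":
    found = resolve_ffmpeg()
    print(found or "FFmpeg not found: install it or set FFMPEG_BIN")
```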

Strongly recommended: playwright + Chromium

Bumblebee uses playphrase.me as an automatic secondary source whenever yarn.co fails to cover a chunk. playphrase.me often has 10x-1000x more clips per phrase, so installing playwright and Chromium dramatically improves coverage of rare words and longer phrases. Without them, those words get skipped and Bumblebee asks the orchestrator to substitute a synonym.

pip install playwright
playwright install chromium    # one-time, ~120 MB

The Chromium bootstrap (~10-15s) runs lazily — only on the first yarn miss of a given run, never if yarn covers the whole phrase. If you skip this step, Bumblebee still works in yarn-only mode; you can also force-disable playphrase per run with --no-playphrase.

That's it. No API keys, no .env, nothing else to configure. The first run downloads the Whisper model (~244 MB for small.en) into the HuggingFace cache; after that, transcription runs fully offline, and only clip search and download need the network.

Requirements

  • Python 3.9+
  • FFmpeg
  • ~250 MB free disk space for the speech model
  • Recommended: playwright + Chromium (~120 MB) for the playphrase fallback

No GPU required. If you have a CUDA GPU, set WHISPER_DEVICE=cuda for a roughly 5x speedup on transcription.

Usage

# One video from one phrase
python bumblebee.py "I am your father"

# Several phrases — each is processed and they're concatenated in order
python bumblebee.py "I am your father" "Houston we have a problem"

# 5 different cuts of the same phrase, no clip reuse
python bumblebee.py "Sentient is the best" -o sentient.mp4 --variants 5

The final file lands in output/<name>.mp4.

Second source: playphrase.me (automatic)

yarn.co's public HTML is hard-capped at 20 unique clips per phrase. When yarn fails to cover a chunk, Bumblebee automatically falls back to playphrase.me, which often has 10x-1000x more matches (73,000 clips for "open" vs yarn's 20). Its API delivers word-timestamps natively, so playphrase clips skip the faster-whisper step entirely.

The fallback is lazy: the headless Chromium bootstrap (~10-15s, one-time per run) only runs if yarn actually misses. Phrases that yarn covers fully never touch playwright. See the Install section for the one-time playwright + Chromium setup; once installed, every run uses playphrase as needed. Pass --no-playphrase to force-disable it for a single run.
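The lazy behaviour described above follows a common pattern: wrap the expensive source in an object that bootstraps itself on first use. The sketch below illustrates that pattern only. LazySource and find_clips are hypothetical names, the factory stands in for the headless Chromium bootstrap, and passing fallback=None models --no-playphrase.

```python
from typing import Callable, List, Optional

SearchFn = Callable[[str], List[str]]


class LazySource:
    """Wraps an expensive source and builds it only on first use."""

    def __init__(self, factory: Callable[[], SearchFn]):
        self._factory = factory
        self._search: Optional[SearchFn] = None

    def search(self, phrase: str) -> List[str]:
        if self._search is None:
            self._search = self._factory()  # one-time bootstrap per run
        return self._search(phrase)


def find_clips(
    phrase: str, primary: SearchFn, fallback: Optional[LazySource]
) -> List[str]:
    """Try the primary source; touch the fallback only on a miss."""
    clips = primary(phrase)
    if clips or fallback is None:
        return clips
    return fallback.search(phrase)
```

A run in which the primary source covers every chunk never calls the factory, which is the property the paragraph above describes.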

Optional environment variables

| Variable | Default | Purpose |
|---|---|---|
| WHISPER_MODEL | small.en | Model name. Use base.en for speed, medium.en for accuracy. |
| WHISPER_DEVICE | cpu | Set to cuda if you have an NVIDIA GPU. |
| WHISPER_COMPUTE_TYPE | int8 (cpu) / float16 (cuda) | Inference quantization. |
| FFMPEG_BIN | ffmpeg | Path to ffmpeg binary if not on PATH. |
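As an illustration of how these variables and defaults could feed faster-whisper, here is a minimal sketch. WhisperConfig, load_whisper_config and transcribe_words are hypothetical names, not Bumblebee's own code; the WhisperModel constructor and the word_timestamps=True option are part of the faster-whisper API.

```python
import os
from typing import Iterator, Mapping, NamedTuple, Optional, Tuple


class WhisperConfig(NamedTuple):
    model: str
    device: str
    compute_type: str


def load_whisper_config(env: Optional[Mapping[str, str]] = None) -> WhisperConfig:
    """Read the WHISPER_* variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    device = env.get("WHISPER_DEVICE", "cpu")
    default_compute = "float16" if device == "cuda" else "int8"
    return WhisperConfig(
        model=env.get("WHISPER_MODEL", "small.en"),
        device=device,
        compute_type=env.get("WHISPER_COMPUTE_TYPE", default_compute),
    )


def transcribe_words(path: str, cfg: WhisperConfig) -> Iterator[Tuple[str, float, float]]:
    """Yield (word, start_seconds, end_seconds) for every recognised word."""
    # Imported lazily so the config helper works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(cfg.model, device=cfg.device, compute_type=cfg.compute_type)
    segments, _info = model.transcribe(path, word_timestamps=True)
    for segment in segments:
        for w in segment.words:
            yield w.word.strip(), w.start, w.end
```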

Project layout

Bumblebee/
|- bumblebee.py             <- CLI entry point
|- SKILL.md                 <- Claude Code skill manifest
|- src/
|  |- phrase_splitter.py    <- greedy longest-match with optional shuffling/exclusion
|  |- yarn_search.py        <- phrase -> clip_ids (curl_cffi, bypasses Cloudflare)
|  |- downloader.py         <- clip_id -> local mp4 (curl_cffi, bypasses CF on y.yarn.co)
|  |- transcriber.py        <- faster-whisper word-timestamps + cache
|  |- word_matcher.py       <- exact start/end of target words with apostrophe-fuzz
|  |- cutter.py             <- FFmpeg cut + audio fade at splice points
|  |- concat.py             <- concat demuxer
|- cache/                   <- downloaded clips and transcripts (reused across runs)
|- output/                  <- final reels and intermediate parts in _parts/
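To make the cutter.py step more concrete, the sketch below builds an FFmpeg command that cuts one word span and fades the audio at both ends. It is an assumption-laden illustration rather than the project's actual command: build_cut_command is a hypothetical helper, the 20 ms fade length and the libx264/aac codecs are guesses, and only the 854x480 / 48 kHz target comes from the Demo section.

```python
from typing import List


def build_cut_command(
    src: str,
    dst: str,
    start: float,
    end: float,
    fade: float = 0.02,      # assumed fade length in seconds
    ffmpeg: str = "ffmpeg",
) -> List[str]:
    """Return an ffmpeg argument list that cuts [start, end) out of src."""
    duration = end - start
    if duration <= 0:
        raise ValueError("end must be after start")
    fade = min(fade, duration / 2)  # keep both fades inside very short cuts
    audio_filter = (
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={duration - fade:.3f}:d={fade:.3f}"
    )
    return [
        ffmpeg, "-y",
        "-ss", f"{start:.3f}",   # seek before -i, so output time starts at 0
        "-i", src,
        "-t", f"{duration:.3f}",
        "-vf", "scale=854:480",  # common frame size for every part
        "-af", audio_filter,
        "-ar", "48000",          # common audio sample rate
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]
```

The list can be passed straight to subprocess.run(cmd, check=True). Re-encoding every part to the same size and sample rate is what lets a concat demuxer join them without glitches.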

Known limitations

  • yarn.co indexes English-language media only.
  • Whisper sometimes transcribes short tokens like "I", "a", "my" as part of a longer word, so single short words tend to get skipped.
  • Word order is strict: "can we" and "we can" are different matches (a swap-fuzzy is on the TODO list).
  • yarn.co sits behind Cloudflare. Bumblebee works around this with curl_cffi and impersonate='chrome', which replays a real Chrome TLS fingerprint.
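The strict word-order limitation is easiest to see in code. The sketch below looks for a target phrase as a contiguous, in-order run inside a list of word timestamps. It is a simplified stand-in for word_matcher.py: find_phrase and normalize are hypothetical names, and treating "apostrophe-fuzz" as curly-to-straight apostrophe normalisation is an assumption.

```python
import re
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def normalize(token: str) -> str:
    """Lowercase, unify curly apostrophes, strip surrounding punctuation."""
    token = token.lower().replace("\u2019", "'")
    return re.sub(r"^[^\w']+|[^\w']+$", "", token)


def find_phrase(words: List[Word], phrase: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first in-order occurrence of phrase, else None."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return None
    tokens = [normalize(w[0]) for w in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Start of the first matched word, end of the last one.
            return words[i][1], words[i + n - 1][2]
    return None
```

Because the comparison is on consecutive tokens, a transcript containing "we can" matches the phrase "we can" but not "can we".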

License

MIT — see LICENSE.

Built end-to-end with Claude Code.
