Drop raw takes in a folder. Whisper transcribes them, Claude decides the cut, ffmpeg executes it, Remotion draws the graphics. Out come branded reels in 16:9, 9:16 and 1:1 — captioned, beat-cut, loudness-normalized. No timeline was scrubbed.
{
"title": "skyloka demo",
"clips": [
{ "take": "take1.mp4",
"in": 0.5, "out": 2.5,
"note": "hook" },
{ "take": "take2.mp4",
"in": 1.0, "out": 3.0,
"transition": "crossfade" },
{ "take": "take3.mp4",
"in": 2.0, "out": 4.0,
"note": "cta" } ]
}
Each stage writes its artifact to disk. If anything fails, --resume picks up exactly where it stopped.
Every take becomes a timestamped transcript. Small model, pure CPU, cached forever.
Reads all transcripts plus your brief. Outputs a validated Edit Decision List — the strongest delivery of each idea, cut tight.
Trims, concats, crossfades, EBU R128 loudness. Beat-grid snapping lands cuts on the music.
Title cards, lower thirds, callouts, counters, logo stings — React components rendered to an alpha overlay on CPU.
Composite, burn captions, mix music with speech ducking, export every aspect ratio. 4K when you want it.
Windows-first. One PowerShell script fetches whisper.cpp and the model; ffmpeg and Node you likely have.
A brief is a markdown file describing what must land. Start from one of five.
Hook in 2s, three features shown not told, one honest number, logo sting.
Local business, owner lower-third, offer callout, phone number big.
Name it in the first second, show it in one continuous flow.
Exact words kept, filler cut, meaning never re-ordered.
Boldest frame first, one line at a time, hard cut to sting.
What LokaCut is not: a video generator. No diffusion, no synthetic footage, no invented b-roll. The footage is always your real screen recordings and real clips — that's the credibility, not the limitation.