logoalt Hacker News

bazzarghyesterday at 4:32 AM1 replyview on HN

On the subject of whisper being great... A few weeks ago a co-worker commented about the difficulty he'd had editing a work demo, I pointed at various jump-cutting tools that had automated what he did in the past (editing out silences). But I'd also wanted to play with whisper for a while...

So a couple of hours later I'd written a script that does transcription based editing: on the first pass it grabs a timestamped transcript and a plain text transcript for editing; you edit the words into any order you like and a second pass reassembles the video (it's just a couple of hundred lines of python wrapping whisper and ffmpeg). It also speeds up 4x any silences detected that sit within retained sequences in the video.

Matching up transcripts turns out to be not that hard; I normalise the text, split it, and then compare to the sequence of normalised words from the timestamped transcript. I find the longest common sequence, keep that, then recurse on the before/after sections (there's a little more detail, but not much). I also sent the transcription to ffmpeg to burn in as captions, because sometimes it makes the audio choppy and the captions make it easier to follow.

I know, tools have been doing this for years now. I just didn't have one to hand, and now I do, and I couldn't have done this without whisper.


Replies

blazingbananayesterday at 12:31 PM

That is absolutely awesome and I love hearing about the tools that people build themselves!

Honestly, the capabilities of whisper is insane, the fact that it's free and open source is really a gift. Some of the things it can do feels almost sci-fi.

If you ever decide to release it publicly please let me know, sounds like a very useful tool.