goodroot • yesterday at 9:06 PM

Ah yeah, longform is interesting.

Not sure how you're running it, via whichever "app thing", but...

On resource-limited machines: "Continuous recording" mode emits output whenever silence is detected, based on a configurable threshold.

That way the output arrives as you speak, in more reasonable chunks; in aggregate it's "the same output", just chunked efficiently.

Maybe you can try hackin' that up?
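
For anyone who wants a starting point, here is a rough sketch of the chunk-on-silence idea. It is not the tool's actual implementation: the function names, the RMS energy measure, and the `threshold` and `min_silent_frames` parameters are all assumptions made for illustration. It expects audio that has already been split into fixed-size frames of float samples in [-1.0, 1.0].

```python
import math
from typing import Iterable, Iterator, List


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def chunk_on_silence(
    frames: Iterable[List[float]],
    threshold: float = 0.01,      # hypothetical default; tune per microphone
    min_silent_frames: int = 3,   # quiet frames needed to close a chunk
) -> Iterator[List[float]]:
    """Yield a chunk of samples each time a pause follows speech."""
    chunk: List[float] = []
    silent_run = 0
    has_speech = False
    for frame in frames:
        if rms(frame) < threshold:
            silent_run += 1
            if has_speech:
                # Keep short pauses inside the chunk so words are not clipped.
                chunk.extend(frame)
                if silent_run >= min_silent_frames:
                    yield chunk
                    chunk, has_speech = [], False
        else:
            silent_run = 0
            has_speech = True
            chunk.extend(frame)
    if has_speech:
        # Flush whatever speech is left when the stream ends.
        yield chunk


if __name__ == "__main__":
    quiet, loud = [0.0] * 4, [0.5] * 4
    stream = [quiet, loud, loud, quiet, quiet, quiet, loud, quiet]
    for i, c in enumerate(chunk_on_silence(stream)):
        print(f"chunk {i}: {len(c)} samples")
```

Each yielded chunk could then be handed to the transcriber on its own, so no single call has to process the whole recording.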

Replies

LuxBennu • yesterday at 9:47 PM

Yeah, that makes sense; chunking on silence would sidestep the latency issue pretty cleanly. I've been running it through a basic FastAPI wrapper, so it just takes whatever audio blob gets thrown at it, with no chunking logic on the server side. Might be worth adding a VAD pass before sending to Whisper though; it would cut down on processing dead air too.
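
Something like this is roughly what I have in mind for the VAD pass. It is only a sketch: a plain energy threshold stands in for a real voice activity detector, and `drop_dead_air`, `frame_size`, `threshold`, and `pad_frames` are made-up names and defaults, not part of FastAPI or Whisper. It assumes the blob has already been decoded to mono float samples.

```python
import math
from typing import List


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def drop_dead_air(
    samples: List[float],
    frame_size: int = 160,    # assumption: 10 ms at 16 kHz
    threshold: float = 0.01,  # assumption: energy level treated as speech
    pad_frames: int = 1,      # quiet frames kept on each side of speech
) -> List[float]:
    """Return the samples with silent frames removed."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples), frame_size)]
    voiced = [frame_rms(f) >= threshold for f in frames]

    # Mark voiced frames plus a little padding so word edges survive.
    keep = [False] * len(frames)
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            lo = max(0, i - pad_frames)
            hi = min(len(frames), i + pad_frames + 1)
            for j in range(lo, hi):
                keep[j] = True

    trimmed: List[float] = []
    for frame, kept in zip(frames, keep):
        if kept:
            trimmed.extend(frame)
    return trimmed


if __name__ == "__main__":
    audio = [0.0] * 40 + [0.5] * 20 + [0.0] * 40
    print(len(drop_dead_air(audio, frame_size=10)))
```

If the result comes back empty, the server can skip the transcription call entirely instead of spending time on silence.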
