Hacker News

goodroot | yesterday at 8:08 PM | 4 replies

Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.

On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for cleanup.

Have you compared WhisperKit to faster-whisper or similar? You might be able to run large-v3-turbo successfully and negate the need for cleanup.

Incidentally, waiting for Apple to blow this all up with native STT any day now. :)


Replies

VorpalWay | yesterday at 10:12 PM

How does it compare to the more well-established https://github.com/cjpais/handy? Are there any standout features (for either option)? What was the reason for writing your own rather than using or improving existing software?

LuxBennu | yesterday at 8:48 PM

I've been running Whisper large-v3 on an M2 Max through a self-hosted endpoint, and honestly the accuracy is good enough that I stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks: anything over 30 seconds starts feeling sluggish, even with Metal acceleration. Haven't tried WhisperKit specifically, but I'm curious how it handles longer audio compared to the full model.
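
A rough sketch of the chunking side of this, for anyone who wants to keep each request under the 30-second mark mentioned above. It only computes window boundaries and does not call any Whisper API. The helper name `chunk_spans`, the 16 kHz sample rate, and the 1-second overlap are assumptions for illustration, not anything from the setup described here.

```python
from typing import Iterator, Tuple


def chunk_spans(
    total_samples: int,
    sample_rate: int = 16_000,      # assumed: 16 kHz mono input
    max_seconds: float = 30.0,      # the threshold where latency was reported to hurt
    overlap_seconds: float = 1.0,   # assumed: small overlap so words at a boundary are not cut
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices for windows of at most max_seconds."""
    window = int(max_seconds * sample_rate)
    step = window - int(overlap_seconds * sample_rate)
    if window <= 0 or step <= 0:
        raise ValueError("max_seconds must be positive and larger than overlap_seconds")

    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        yield start, end
        if end == total_samples:
            break  # the last window reached the end of the audio
        start += step


# Example: 70 seconds of 16 kHz audio
spans = list(chunk_spans(70 * 16_000))
for start, end in spans:
    print(f"{start / 16_000:.1f}s to {end / 16_000:.1f}s")
```

Each span can then be sliced out of the sample array and sent as its own request. Stitching the overlapping transcripts back together is a separate problem that this sketch does not address.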

pmarreck | today at 12:08 AM

Looks like there's a nearly identically named one for Hyprland.

Also, wish it was on nixpkgs, where at least it will be almost guaranteed to build forever =)

hephaes7us | yesterday at 8:27 PM

Thanks for sharing! I was literally getting ready to build, essentially, this. Now it looks like I don't have to!

Have you ever considered using a foot pedal for PTT?

Apple incidentally already has native STT, but for some reason they just don't use a decent model yet.
