Hacker News

randomblock1 yesterday at 6:03 PM

No, there are quite a few models that are smaller, more accurate, and faster. For example, Parakeet TDT v3 is half the size, much faster, and has a lower WER (word error rate). There's also Voxtral, which is much larger but even more accurate.
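For anyone unfamiliar with the metric: WER is (substitutions + deletions + insertions) divided by the number of words in the reference transcript, computed from a word-level edit distance. Here's a minimal sketch of that definition. Real leaderboards typically normalize the text first (casing, punctuation, numbers), which this sketch deliberately skips, so don't expect it to reproduce published numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# "sat" -> "sit" is one substitution and the second "the" is one deletion: 2 errors / 6 words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Note that insertions can push WER above 1.0, which is why it's an error rate and not an accuracy percentage.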

But the ecosystem isn't as mature, so Whisper is still a valid option, even now. For example, Parakeet uses NVIDIA's NeMo framework, which normally requires CUDA, so on AMD you need to use an ONNX version instead. Whisper, meanwhile, has vLLM support and desktop apps like Buzz.
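If you do go the ONNX route, the main practical step is picking an execution provider that your ONNX Runtime build actually ships with. A minimal sketch, assuming you already have an ONNX export of the model; the preference order is my own choice, and this only covers provider selection, not the audio preprocessing or decoding a full Parakeet pipeline needs. The provider names are ONNX Runtime's; the model file name in the usage comment is hypothetical.

```python
# Preference order is an assumption for illustration: NVIDIA, then AMD, then DirectML, then CPU.
PREFERRED_PROVIDERS = [
    "CUDAExecutionProvider",      # NVIDIA GPUs
    "ROCMExecutionProvider",      # AMD GPUs (ROCm builds of ONNX Runtime)
    "MIGraphXExecutionProvider",  # AMD GPUs via MIGraphX
    "DmlExecutionProvider",       # DirectML on Windows (any GPU vendor)
    "CPUExecutionProvider",       # always-available fallback
]


def pick_providers(available: list[str]) -> list[str]:
    """Return the preferred providers present in this build, in preference order."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    # Fall back to CPU if none of the known providers were reported.
    return chosen or ["CPUExecutionProvider"]


# Usage with onnxruntime installed ("parakeet-encoder.onnx" is a hypothetical file name):
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("parakeet-encoder.onnx", providers=providers)
```

ONNX Runtime tries providers in the order given, so passing the filtered list keeps the GPU first when present and still works on a CPU-only machine.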

There aren't many benchmarks, and they often don't include all the models, since STT doesn't get nearly as much attention as regular LLMs, but this is one of the more complete ones: https://artificialanalysis.ai/speech-to-text/non-streaming