In the past month or so, I added 2 models to my app Whisper Memos (https://whispermemos.com):
- Cohere Transcribe (self-hosted)
- Grok Speech To Text (they provide an API, only $0.10/hr!)
They are both excellent. I'm not sure about this one. Would you like to see it in a consumer speech-to-text app?
Does Cohere work with longer transcripts? Do you have to do some magic to merge recordings over 35 seconds long?
Have you tried qwen?
Any non-Musk alternatives that are comparable in quality and cost?
I've had good experiences with the Mistral Voxtral models (I've used the API, but some of the model variants are open-weight).