Cohere Transcribe: Speech Recognition

114 points • by gmays • today at 4:27 PM • 41 comments

Comments

My worry is that ASR will end up like OCR. If a large multimodal AI system is good enough (latency-wise), the advantage of domain understanding eats the other technologies alive.

In OCR, even when the characters are poorly scanned, the deep domain understanding these large multimodal models have allows them to work out what the document actually meant: this must be the order ID, because in the million invoices I have seen before, the order ID normally sits below the order date, and so on. My worry is that ASR will face the same issue.

gruez • today at 5:32 PM

> Limitations

> Timestamps/Speaker diarization. The model does not feature either of these.

What a shame. Is WhisperX still the best choice if you want timestamps/diarization?

kieloo • today at 7:28 PM

The problem with many STT models is that they seem to be trained mostly on perfectly accented speech and struggle a lot with foreign accents, so I’m curious to try this one as a Frenchman with a rather French English accent.

So far, the best I have found while testing models for my language learning app (Copycat Cafe) is Soniox. All the others performed badly on non-native accents. The worst were Whisper-based models, because they hallucinate when they misunderstand and tend to come up with random phrases that have nothing to do with the topic.

_medihack_ • today at 7:05 PM

Unfortunately, this model does not seem to support a custom vocabulary, word boosting or an additional prompt.

geooff_ • today at 4:43 PM

I can't say enough nice things about Cohere's services. I migrated over to their embedding model a few months ago for CLIP-style embeddings and it's been fantastic.

It has the most crisp, steady P50 of any external service I've used in a long time.

BreezyBadger • today at 8:05 PM

Awesome. Going to see if I can port https://scrivvy.ai to this. Based in Canada.

stavros • today at 6:39 PM

To clarify, this is SOTA in its size category, right? It's not better than Parakeet, for example?

teach • today at 5:29 PM

Dumb question, but if this is "open source", is there source code somewhere? Or does that term mean something different in the world of models that must be trained to be useful?

Void_ • today at 6:10 PM

Just today I shipped support for this in Whisper Memos: https://whispermemos.com/changelog/2026-04-cohere-transcribe

Accurate and fast model, very happy with it so far!

bkitano19 • today at 8:13 PM

Notable omission of Deepgram models in the comparisons?

ramon156 • today at 6:29 PM

I had to set up Fireflies for our company recently. Cool tool, but I'm sending dozens of internal meetings to an American company. Our ISO inspector wouldn't be pleased to know.

This is a good option. Will check it out.

topazas • today at 5:20 PM

How hard could it be to train other European language(s)?

simonw • today at 4:50 PM

It's great that this is Apache 2.0 licensed - several of Cohere's other models are licensed free for non-commercial use only.

kalmuraee • today at 6:54 PM

Multimodal models are way better.
