simonw • today at 4:21 PM • 13 replies

This demo is really impressive: https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtim...

Don't be confused if it says "no microphone", the moment you click the record button it will request browser permission and then start working.

I spoke fast and dropped in some jargon, and it got it all right. I said this and it transcribed it exactly, WebAssembly spelling included:

> Can you tell me about RSS and Atom and the role of CSP headers in browser security, especially if you're using WebAssembly?

Replies

skykooler • today at 7:24 PM

Doesn't seem to work for me - tried in both Firefox and Chromium and I can see the waveform when I talk but the transcription just shows "Awaiting audio input".

Oras • today at 4:25 PM

Thank you for the link! Their playground in Mistral does not have a microphone. It just uploads files, which does not demonstrate the speed and accuracy, but the link you shared does.

I tried speaking in 2 languages at once, and it picked it up correctly. Truly impressive for real-time.

tekacs • today at 4:41 PM

Having built with and tried every voice model over the last three years, real time and non-real time... this is off the charts compared to anything I've seen before.

And open weight too! So grateful for this.

daemonologist • today at 4:48 PM

404 on https://mistralai-voxtral-mini-realtime.hf.space/gradio_api/... for me (which shows up in the UI as a little red error in the top right).

jaggederest • today at 5:03 PM

It can transcribe the fast sequence in Eminem's "Rap God". Really, really impressive.

pyprism • today at 5:18 PM

Wow, that’s weird. I tried Bengali, but the text was transcribed into Hindi! I know there are some similar words in these languages, but I used pure Bengali that is not similar to Hindi.

carbocation • today at 6:10 PM

This model was able to transcribe Bad Bunny lyrics over the sound of the background music, played casually from my speakers. Impressive, to me.

sheepscreek • today at 6:04 PM

I’ve been using AquaVoice for real-time transcription for a while now, and it has become a core part of my workflow. It gets everything: jargon, capitalization, everything. Now I’m looking forward to doing that with 100% local inference!

rafram • today at 5:35 PM

Not terrible. It missed or mixed up a lot of words when I was speaking quickly (and not enunciating very well), but it does well with normal-paced speech.

th0ma5 • today at 4:52 PM

[dead]

adarsh2321 • today at 5:20 PM

[flagged]

adarsh2321 • today at 5:26 PM

[flagged]
