The 60-minute single-pass transcription is the part that actually matters. Most ASR models chunk audio, and you lose speaker continuity across the boundaries. If the diarization actually holds up on hour-long recordings without drifting, that's a real solve for podcast and meeting transcription workflows.
I think we should stop calling this kind of model open source. They are really "open weight": the training code is proprietary and never revealed.
When mixing languages, why does the English have a Chinese accent and the Chinese an English accent? Is it a feature or a bug?
I think in this category, Voxtral by Mistral is a lot better. It also happens to be small enough to run on WebGPU: https://huggingface.co/spaces/mistralai/Voxtral-Realtime-Web...
Interesting story about this repo/product/author by cybersecurity researcher Kevin Beaumont: https://cyberplace.social/@GossiTheDog/116454846703138243
Isn't this the project Microsoft published and then soon pulled for security/safety reasons? What has changed since then?
Interesting to see "vibe" enshrined by the likes of Microsoft as an AI product word.
Surprised it wasn't called Copilot Voice
Great post last night from Simon: https://simonwillison.net/2026/Apr/27/vibevoice/
Still waiting for the open weights model that conclusively beats the multi-year old Whisper in accuracy, features, and performance.
Holy moly, a Microsoft AI product that isn't named Copilot!
You have selected Microsoft Sam as the computer's default voice.
Seriously, VibeVoice? Microslop really has a penchant for the worst names.
So we've really just settled on Vibe as the verb for AI then?
I've been using VibeVoice's ASR (speech-to-text) model quite intensively for the past month and have found it a lot more reliable and out-of-the-box functional than Whisper, Parakeet, and other models. The fact that it has diarization built into the model is a huge win in my book. Without that you have to run a separate model just for diarization, which adds significantly to the overall processing time, whereas VibeVoice gives you reliably great results in one pass. Big fan.
Microsoft Store App Vibing.exe Accused of Harvesting Screens, Audio, and Clipboard Data:
https://cyberpress.org/microsoft-store-app-vibing-exe-accuse...
Explains most of the shit they've been pushing with Windows 11. Perhaps all that bloatware was VibeVoiced too.
Someone tell me if this is better or worse than Parakeet
Shouldn't it be called something like "Copilot Voice" instead?
I looked into local options for ASR and diarization some months ago; I missed that VibeVoice now has this feature.
My conclusion back then (based only on shallow research and zero real experience, mind you) was that Whisper + Pyannote was the "stable" approach.
Have the VibeVoice, Voxtral, Qwen, or NeMo solutions caught up in segmentation and speaker recognition?
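For anyone who hasn't wired up the two-model approach: beyond running both models, you have to merge Whisper's timestamped segments with Pyannote's speaker turns yourself, which is part of the overhead a built-in diarizer removes. A minimal overlap-based merge might look like this (the dict shapes are illustrative, loosely modeled on the two libraries' outputs):

```python
def assign_speakers(asr_segments, speaker_turns):
    """Label each ASR segment with the speaker whose turn overlaps it most.

    asr_segments:  [{"start": float, "end": float, "text": str}, ...]
                   (e.g. Whisper-style transcription segments)
    speaker_turns: [{"start": float, "end": float, "speaker": str}, ...]
                   (e.g. Pyannote-style diarization turns)
    """
    labeled = []
    for seg in asr_segments:
        best_speaker, best_overlap = None, 0.0
        for turn in speaker_turns:
            # Length of the time interval shared by the segment and the turn.
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

This naive majority-overlap assignment breaks down when one ASR segment spans a speaker change, which is exactly the kind of boundary case a jointly trained model can handle better.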
Microsoft has historically made poor choices in product naming, but this has to be a new low.
In the past month or so, I added two models to my app Whisper Memos (https://whispermemos.com):
- Cohere Transcribe (self hosted)
- Grok Speech To Text (they provide an API, only $0.10/hr!)
They are both excellent. I'm not sure about this one. Would you like to see it in a consumer speech to text app?
What’s the current state of the art, for each of training locally and in the cloud, for learning my voice?
What do they mean by "frontier voice"?
It would have been better if they had provided not just weights but also some frontend where it's usable as-is.
This is a very good model, but can it be run on the web?
Maybe Microsoft’s real strength was never making the best model, it was knowing you don’t need to, as long as you own the platform everyone builds on.
For me it's giving very poor results.
Looks like this offers ASR support in GGUF: https://github.com/CrispStrobe/CrispASR -- haven't tested it.
What a terrible name
English only?
Seems quite heavy for an STT model; Parakeet and Whisper are much smaller and perform great for quick dictation and transcription of longer files. I guess that's the cost of the additional accuracy and speaker diarisation?
The TTS example clip in the repo of 'spontaneous singing' is creepy as fuck
Microsoft is famous for choosing terrible names, but how could they be this terrible?
lol they rug-pulled the 7B for our own safety some months ago
This is not a new model. It also hallucinates a lot, it's very heavy and slow at inference, and it's bad at multilingual.
Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.