steinvakt2 • today at 1:00 PM • 8 replies

This is not a new model. Also, it hallucinates a lot. Also, it's very heavy and slow at inference. It's also bad at multilingual input.

Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.

Replies

terbo • today at 5:20 PM

It has some perks and is a bit more expressive in some cases, but overall it is trained on really noisy data, uses more memory, and isn't that fast. I'm talking about the (7b?) version that they released and then quickly removed (vibevoice-community on github). I still use chatterbox turbo and sometimes qwen TTS.

lblock • today at 1:14 PM

Yeah, I don't get why it is suddenly getting so much attention today. It is all over twitter too.

Tamatarr • today at 5:32 PM

Saved me a lot of time, thanks!

zuzululu • today at 4:36 PM

You saved us a lot of time here... I unstarred the repo.

Moving on...

scotty79 • today at 3:11 PM

You just saved me an afternoon.

tombert • today at 5:00 PM

I'm shocked, shocked to find that Microsoft takes credit for a slow, unoriginal product that doesn't actually do what it advertises.

gagan2020 • today at 3:05 PM

It is not good for text to speech (TTS) either. I have been trying it for a few days. First of all, the documentation for the 1.5B model is not there. The 0.5B realtime model is a shit model. I was converting text line by line, and it was randomly adding music and couldn't handle special characters like "…".

I am really disappointed with this model, to say the least.

SecretDreams • today at 1:21 PM

I think this was all covered when they said it was released by Microsoft?
