Hacker News

steinvakt2 today at 1:00 PM (8 replies)

This is not a new model. Also, it hallucinates a lot. It's also very heavy and slow at inference, and it's bad at multilingual speech.

Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.


Replies

terbo today at 5:20 PM

It has some perks and is a bit more expressive in some cases, but overall it's trained on really noisy data, uses more memory, and isn't that fast. I'm talking about the (7B?) version that they released and then quickly removed (vibevoice-community on GitHub). I still use Chatterbox Turbo and sometimes Qwen TTS.

lblock today at 1:14 PM

Yeah, I don't get why it's suddenly getting so much attention today. It's all over Twitter too.

Tamatarr today at 5:32 PM

Saved me a lot of time, thanks!

zuzululu today at 4:36 PM

You saved us a lot of time here... I unstarred the repo.

Moving on...

scotty79 today at 3:11 PM

You just saved me an afternoon.

tombert today at 5:00 PM

I'm shocked, shocked to find that Microsoft takes credit for a slow, unoriginal product that doesn't actually do what it advertises.

gagan2020 today at 3:05 PM

It's not good for text to speech (TTS) either. I've been trying it for a few days. First of all, there's no documentation for the 1.5B model. The 0.5B realtime model is a shit model. I was converting text line by line, and it randomly added music and couldn't handle special characters like "…".

I'm really disappointed with this model, to say the least.

SecretDreams today at 1:21 PM

I think this was all covered when they said it was released by Microsoft?