binsquare • last Wednesday at 4:58 PM • 5 replies

Does anyone else find that there's a hard-to-pin-down lifelessness in the speech of these voice models?

It's especially noticeable in the fruit-pricing portion of the video for this model. Sounds completely normal, but I can immediately tell it is AI. Maybe it's the intonation, or the overly stable rate of speech?

Replies

Lapel2742 • last Wednesday at 5:10 PM

IMHO it's not lifeless. It's just not overly emotional. I definitely prefer it that way. I do not want the AI to be excited. It feels so contrived.

On the video itself: interesting, but "ideal" was mispronounced in German. For a promotional video, they should have checked that with native speakers. On the other hand, it's at least honest.

sosodev • last Wednesday at 5:09 PM

I think it's because they've crammed vision, audio, multiple voices, prosody control, multiple languages, etc. into just 30 billion parameters.

I think ChatGPT's voice models have the most lifelike speech. OpenAI seems to have invested heavily in that area while other labs focused elsewhere.

vessenes • last Wednesday at 7:54 PM

I'm not convinced it's end-to-end multimodal. If it isn't, there will be a separate speech-synthesis stage, and that could explain some of what you're hearing. You could test this by having it sing or do some accents, or by having it talk back to you in an accent you give it.

esafak • last Wednesday at 5:06 PM

> Sounds completely normal, but I can immediately tell it is AI.

Maybe that's a good thing?

colechristensen • last Wednesday at 5:04 PM

I'm perfectly ok with and would prefer an AI "accent".