Open (Apache 2.0) TTS model for streaming conversational audio in realtime

64 points • by SweetSoftPillow • last Monday at 12:28 PM • 5 comments

Comments

ks2048 • today at 11:44 AM

> Our work was heavily inspired by KyutaiTTS and Sesame

I wish they’d describe the technical details of the differences between this and other TTS they were “inspired by”.

With so many projects like this, I'll just have to assume they're vibe-coded clones chasing publicity unless there are more technical details.

Neywiny • today at 5:28 PM

Not sure if it's an artifact of their streaming approach, but their intro demo has exclamation marks and question marks, and the intonation through the sentence just doesn't fit. It's vocalized normally, with only the last word getting that exclamation or question sound. Maybe we need that Spanish upside-down question mark (¿) at the start to help it.

woodson • today at 12:53 PM

Looks very similar to Kyutai’s models, given that it uses the same neural audio codec (Mimi), the same Depformer module, etc.