Hacker News

sosodev yesterday at 6:45 PM (2 replies)

Does elevenlabs have a real-time conversational voice model? It seems like their focus is largely on text to speech and speech to text, which can approximate that type of thing, but it's not at all the same as the native voice to voice that 4o does.


Replies

hi_im_vijay yesterday at 8:02 PM

[disclaimer, i work at elevenlabs] we specifically went with a cascading model for our agents platform because it's better suited for enterprise use cases where they have full control over the brain and can bring their own llm. with that said, even with a cascading model, we can capture a decent amount of nuance with our asr model, and it also supports capturing audio events like laughter or coughing.

a true speech to speech conversational model will perform better on things like capturing tone, pronunciation, phonetics, etc, but i do believe we'll also get better at that on the asr side over time.
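
For readers unfamiliar with the term, a cascading (ASR -> LLM -> TTS) agent can be sketched roughly as below. This is only an illustration of the general pattern described in the comment, not ElevenLabs' actual API: `CascadedAgent`, `Transcript`, and the `fake_*` stubs are all hypothetical names, and the way audio events are appended to the prompt is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transcript:
    """Hypothetical ASR output: recognized text plus non-speech audio events."""
    text: str
    audio_events: List[str]  # e.g. ["laughter", "coughing"]


@dataclass
class CascadedAgent:
    """Three independent stages; the LLM ("brain") is supplied by the caller."""
    asr: Callable[[bytes], Transcript]   # speech -> text (+ events)
    llm: Callable[[str], str]            # text -> text, bring your own model
    tts: Callable[[str], bytes]          # text -> speech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.asr(audio_in)
        prompt = transcript.text
        # Assumption: events are passed to the LLM as a plain-text annotation.
        if transcript.audio_events:
            prompt += " [events: " + ", ".join(transcript.audio_events) + "]"
        reply = self.llm(prompt)
        return self.tts(reply)


# Stub stages so the sketch runs without any real speech or LLM service.
def fake_asr(audio: bytes) -> Transcript:
    return Transcript(text=audio.decode("utf-8"), audio_events=["laughter"])


def fake_llm(prompt: str) -> str:
    return "echo: " + prompt


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = CascadedAgent(asr=fake_asr, llm=fake_llm, tts=fake_tts)
print(agent.respond(b"hello"))
```

Because each stage is just a callable, swapping in a different LLM does not touch the ASR or TTS stages, which is the "full control over the brain" trade-off mentioned above. A native speech to speech model would instead replace all three stages with a single audio-in, audio-out model.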


dragonwriter yesterday at 6:58 PM

> Does elevenlabs have a real-time conversational voice model?

Yes.

> It seems like their focus is largely on text to speech and speech to text.

They have two main broad offerings (“Platforms”); you seem to be looking at what they call the “Creative Platform”. The real-time conversational piece is the centerpiece of the “Agents Platform”.
