Hacker News

jamilton, last Sunday at 7:39 AM (4 replies)

ElevenLabs has a feature for "full cast"-type generation, where different characters get different voices. It's certainly not automatically sensitive to dialect, though.

It's probably possible with current systems, though. I believe there are TTS systems that can use context/prompting to change emphasis and other speech qualities, though I'm not sure how reliably.


Replies

tummler, last Sunday at 11:06 AM

I’m sure it’s doable. I think you’d want to break it into a few discrete steps for the best quality. First process the book and identify key info like genre, tone, etc. Use that to determine the best voice(s) and reading style, assign actors for multiple characters/subjects. Maybe output some examples to spot check for approval. Tweak based on that then generate the audio. Prob a couple other steps in there and maybe a bit of custom work to optimize in key areas. If someone wants to do this as a side project I can help scope the architecture and process but I don’t want to code it. :p
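
A rough sketch of the segmenting and casting steps described above, in Python. Everything here is an assumption for illustration: the voice IDs are made up, the attribution rule is a naive regex that only catches `"..." said Name` patterns, and no real TTS service is called. A real pipeline would likely need a far more robust speaker-attribution step.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "narrator" or a character name
    text: str


# Hypothetical voice IDs; a real TTS service would use its own identifiers.
NARRATOR_VOICE = "voice_narrator"
VOICE_POOL = ["voice_a", "voice_b", "voice_c"]

# Naive attribution rule: "quoted speech" said/asked/replied Name
DIALOGUE = re.compile(
    r'"([^"]+)"\s*,?\s*(?:said|asked|replied)\s+([A-Z][a-z]+)[.!?]?'
)


def split_segments(text: str) -> list[Segment]:
    """Split text into narration and attributed dialogue segments."""
    segments = []
    pos = 0
    for match in DIALOGUE.finditer(text):
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append(Segment("narrator", narration))
        # Drop the comma that often precedes the closing quote.
        segments.append(Segment(match.group(2), match.group(1).rstrip(",")))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Segment("narrator", tail))
    return segments


def assign_voices(segments: list[Segment]) -> dict[str, str]:
    """Give each speaker a stable voice, cycling through the pool."""
    casting = {"narrator": NARRATOR_VOICE}
    for seg in segments:
        if seg.speaker not in casting:
            # len(casting) - 1 is the number of characters cast so far.
            index = (len(casting) - 1) % len(VOICE_POOL)
            casting[seg.speaker] = VOICE_POOL[index]
    return casting


def build_script(text: str) -> list[tuple[str, str]]:
    """Return (voice_id, text) pairs ready for review before synthesis."""
    segments = split_segments(text)
    casting = assign_voices(segments)
    return [(casting[seg.speaker], seg.text) for seg in segments]


if __name__ == "__main__":
    sample = 'The door opened. "Who is there?" asked Anna. "Only me," said Ben. Rain fell.'
    for voice, line in build_script(sample):
        print(f"{voice}: {line}")
```

Printing the script before generating any audio is one cheap way to do the spot check: a reviewer can fix the casting or the attribution first, and only then pay for synthesis.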

vorgol, last Sunday at 9:45 AM

Have you heard results from it? How does it know, for example, which voice to use when there is a romantic scene in the book?

It's definitely an excited voice, but is it excited as in a battle or as in a romantic scene?

fudged71, last Sunday at 3:20 PM

I don't think they do it automatically, though. I think you need to piece apart the transcript in their tool to decide which voice to use where.

pyman, last Sunday at 8:03 AM

Is it open source?
