Hacker News

anotherpaul, last Sunday at 7:16 AM

Does it turn it into spoken word or an audiobook? Because good audiobooks often have voice actors that read the characters with different emphasis and dialects. I imagine tools like ChatGPT could do this for a few sentences, but what about an 8-to-20-hour audiobook?

I think there are still basic hurdles to clear before we can go from EPUB to audiobook at a quality that competes with the current state of the art.

Or am I missing something?


Replies

jamilton, last Sunday at 7:39 AM

ElevenLabs has a "full cast"-style generation feature, where different characters get different voices. It's certainly not automatically sensitive to dialect, though.

It's probably possible with current systems, though. I believe there are TTS systems that can use context or prompting to change emphasis and other speech qualities, though I'm not sure how reliably.

BenGosub, last Sunday at 8:37 AM

There are a few character voices, and they can also be blended using the mixer to get different nuances. You can then write your own code to use different voices for different characters.
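
A rough sketch of what that per-character code might look like, assuming you split each paragraph into narration and quoted dialogue yourself. The voice names, the `CHARACTER_VOICES` table, and the `assign_voices` helper are all made up for illustration and are not part of any particular TTS tool; the speaker detection is deliberately naive (it only looks for "said/asked/replied <Name>").

```python
import re
from typing import List, Tuple

# Hypothetical voice identifiers; substitute whatever your TTS tool exposes.
NARRATOR_VOICE = "narrator"
CHARACTER_VOICES = {"alice": "voice_a", "bob": "voice_b"}

QUOTE_RE = re.compile(r'"([^"]+)"')
SPEAKER_RE = re.compile(r'\b(?:said|asked|replied)\s+(\w+)', re.IGNORECASE)


def assign_voices(paragraph: str) -> List[Tuple[str, str]]:
    """Split a paragraph into (voice, text) segments.

    Quoted spans get the voice of the first speaker named in the paragraph;
    everything else (and quotes with no known speaker) gets the narrator.
    """
    match = SPEAKER_RE.search(paragraph)
    speaker = match.group(1).lower() if match else ""
    quote_voice = CHARACTER_VOICES.get(speaker, NARRATOR_VOICE)

    segments: List[Tuple[str, str]] = []
    pos = 0
    for m in QUOTE_RE.finditer(paragraph):
        # Text between the previous quote and this one is narration.
        narration = paragraph[pos:m.start()].strip()
        if narration:
            segments.append((NARRATOR_VOICE, narration))
        segments.append((quote_voice, m.group(1)))
        pos = m.end()
    tail = paragraph[pos:].strip()
    if tail:
        segments.append((NARRATOR_VOICE, tail))
    return segments


if __name__ == "__main__":
    for voice, text in assign_voices('"Hello there," said Alice. "How are you?"'):
        print(f"{voice}: {text}")
```

Each `(voice, text)` pair would then be handed to whatever synthesis call your tool provides, and the resulting audio clips concatenated in order. Real books need much better speaker attribution than this regex, so treat it as a starting point only.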

parineum, last Sunday at 4:24 PM

> Because good audiobooks often have voice actors that read the characters with different emphasis and dialects.

I actually hate this. I like quotes to be read with the tone and inflection implied by the context but I don't like the different voices.
