Does it turn it into spoken word or an audiobook? Good audiobooks often have voice actors who read each character with different emphasis and dialects. I imagine tools like ChatGPT could do this for a few sentences, but what about an 8- to 20-hour audiobook?
I think there are still basic hurdles to clear before we can go from EPUB to audiobook at a quality that can compete with the current state of the art.
Or am I missing something?
There are also a few character voices, and they can be blended with the mixer to get different nuances. You can then write your own code to use different voices for different characters.
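As a rough illustration of the "write your own code" part, here is a minimal Python sketch. It assumes your TTS tool lets you choose a voice per chunk of text. It splits a passage into narration and double-quoted dialogue, guesses the speaker from simple tags like `said Alice` or `Bob replied`, and assigns each chunk a voice from a cast table. The voice IDs (`narrator`, `voice_a`, `voice_b`) and the character names are hypothetical placeholders. The actual synthesis call depends on the tool and is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical voice IDs: replace with whatever voices your TTS tool exposes.
NARRATOR_VOICE = "narrator"
CAST = {"Alice": "voice_a", "Bob": "voice_b"}

# A double-quoted span of dialogue.
QUOTE = re.compile(r'"([^"]+)"')
# A simple attribution tag right after the quote: "said Alice" or "Bob replied".
ATTRIB = re.compile(
    r'\s*,?\s*(?:said|asked|replied)\s+([A-Z]\w*)'
    r'|\s*,?\s*([A-Z]\w*)\s+(?:said|asked|replied)'
)


@dataclass
class Segment:
    voice: str
    text: str


def split_into_segments(text: str) -> list:
    """Split text into narration and dialogue chunks, each tagged with a voice."""
    segments = []
    pos = 0
    for m in QUOTE.finditer(text):
        # Everything between the previous quote and this one is narration.
        before = text[pos:m.start()].strip()
        if before:
            segments.append(Segment(NARRATOR_VOICE, before))
        # Look a short distance past the closing quote for a speaker tag.
        tag = ATTRIB.match(text[m.end():m.end() + 40])
        name = (tag.group(1) or tag.group(2)) if tag else None
        # Unknown or unattributed speakers fall back to the narrator voice.
        segments.append(Segment(CAST.get(name, NARRATOR_VOICE), m.group(1)))
        pos = m.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(Segment(NARRATOR_VOICE, rest))
    return segments


if __name__ == "__main__":
    passage = '"Hello," said Alice. "Hi," Bob replied. They walked on.'
    for seg in split_into_segments(passage):
        # Each segment would be sent to the TTS engine with seg.voice here.
        print(f"[{seg.voice}] {seg.text}")
```

The speaker-guessing heuristic is deliberately naive. Unattributed back-and-forth dialogue and pronoun tags like "she said" all fall back to the narrator voice, which is a large part of why a full-length book is harder than a few sentences.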
> Because good audiobooks often have voice actors that read the characters with different emphasis and dialects.
I actually hate this. I like quotes to be read with the tone and inflection implied by the context, but I don't like the different voices.
ElevenLabs has a feature for "full cast"-style generation, where different characters get different voices. It's certainly not automatically sensitive to dialect, though.
It's probably possible with current systems, though. I believe there are TTS systems that can use context or prompting to change emphasis and other speech qualities, though I'm not sure how reliably.