Interesting model, I've managed to get the 0.6B param model running on my old 1080 and I can ge...

TheAceOfHearts • today at 6:20 PM • 1 reply • view on HN

Interesting model, I've managed to get the 0.6B param model running on my old 1080 and I can generated 200 character chunks safely without going OOM, so I thought that making an audiobook of the Tao Te Ching would be a good test. Unfortunately each snippet varies drastically in quality: sometimes the speaker is clear and coherent, but other times it bursts out laughing or moaning. In a way it feels a bit like magical roulette, never being quite certain of what you're going to get. It does have a bit of charm, when you chain the various snippets together you really don't know what direction it's gonna go.

Using speaker Ryan seems to be the most consistent, I tried speaker Eric and it sounded like someone putting on a fake exaggerated Chinese accent to mock speakers.

If it wasn't for the unpredictable level of emotions from each chunk, I'd say this is easily the highest quality TTS model I've tried.

Replies

KaoruAoiShiho • today at 6:48 PM

Have you tried specifying the emotion? There's an option to do so and if it's left empty it wouldn't surprise me if it defaulted to rng instead of bland.

➕ show 1 reply

alt Hacker News

Replies