echelon • 01/16/2026 • 1 reply

Zero-shot voice clones have never been very good. Fine-tuned models hit natural speaker similarity and prosody in a way zero-shot models can't match.

If it were a big model trained on a diverse set of speakers, and it could remember how to replicate them all, then zero-shot would potentially be a bigger deal. But this is a tiny model.

I'll try out the zero-shot functionality of Pocket TTS and report back.

Replies

Barbing • 01/16/2026

Would be curious to hear!
