Zero-shot voice clones have never been very good. Fine-tuned models hit natural speaker similarity and prosody in a way zero-shot models can't match.
If it were a big model trained on a diverse set of speakers, and it could remember how to replicate them all, then zero-shot would potentially be a bigger deal. But this is a tiny model.
I'll try out the zero-shot functionality of Pocket TTS and report back.