You need to provide info on your hardware. Pocket-TTS does cloning on CPU, but for me randomly outputs something pretty weird sounding mixed in with like 90% good outputs. So it hasn't been quite stable enough to run without checking output. But maybe it depends on your voice sample.
Qwen 3 TTS is good for voice cloning but requires GPU of some sort.