I'm still looking for the "perfect" setup in order to clone my voice and use it local...

sschueller • today at 6:19 PM • 4 replies

I'm still looking for the "perfect" setup to clone my voice and use it locally to send voice replies in Telegram via openclaw. Does anyone have such a setup?

I want to be my own personal assistant...

EDIT: I can give it an RTX 3080 Ti.

Replies

bdbdbdb • today at 9:26 PM

Why not just send text replies? You can already do that.

ilaksh • today at 6:48 PM

You need to provide info on your hardware. Pocket-TTS does cloning on CPU, but for me it randomly produces something pretty weird-sounding mixed in with roughly 90% good outputs. So it hasn't been stable enough to run without checking the output. But maybe it depends on your voice sample.

Qwen 3 TTS is good for voice cloning but requires a GPU of some sort.

nicpottier • today at 7:52 PM

Try training a model with Piper. You will need to record a lot of utterances, but the results are pretty great and the output is a fast TTS model.

justanotherunit • today at 6:22 PM

Isn't it just a matter of training a model on your voice recordings and then using that to generate audio clips from text?
