Try training a model with Piper. You will need to record a lot of utterances, but the results are pretty great and the output is a fast TTS model.
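If it helps with the recording part, here is a rough sketch for keeping the utterances organized. It assumes an LJSpeech-style layout (a `wav/` folder plus a `metadata.csv` with one `id|text` row per clip), which as far as I know is a layout Piper's training preprocessing can read, but check the Piper training docs before relying on it. The function names, folder name, and utterance IDs are made up for illustration.

```python
import tempfile
from pathlib import Path


def write_metadata(dataset_dir, transcripts):
    """Write an LJSpeech-style metadata.csv: one 'id|text' row per utterance.

    Assumption: clips live in <dataset_dir>/wav/<id>.wav. Confirm the exact
    layout your Piper version expects before training.
    """
    dataset_dir = Path(dataset_dir)
    (dataset_dir / "wav").mkdir(parents=True, exist_ok=True)
    metadata_path = dataset_dir / "metadata.csv"
    with metadata_path.open("w", encoding="utf-8", newline="") as f:
        for utt_id, text in sorted(transcripts.items()):
            # A pipe would break the delimiter and a newline would break the
            # row, so replace pipes and collapse all whitespace to single spaces.
            clean = " ".join(text.replace("|", " ").split())
            f.write(f"{utt_id}|{clean}\n")
    return metadata_path


def missing_recordings(dataset_dir, transcripts):
    """Return the utterance IDs that have a transcript but no wav file yet."""
    wav_dir = Path(dataset_dir) / "wav"
    return sorted(u for u in transcripts if not (wav_dir / f"{u}.wav").exists())


if __name__ == "__main__":
    # Hypothetical prompts; a real voice needs far more than two.
    prompts = {
        "utt_0001": "The quick brown fox jumps over the lazy dog.",
        "utt_0002": "Please call Stella and ask her to bring these things.",
    }
    workdir = tempfile.mkdtemp()
    print(write_metadata(workdir, prompts))
    print("Still to record:", missing_recordings(workdir, prompts))
```

Running `missing_recordings` between sessions gives you a to-do list of prompts you have not recorded yet, which is handy once the script grows to hundreds of lines.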