I've been using Kokoro TTS with the CLI app, audiblez, mentioned in the "Similar Projects" section of the README. The model is fast and delivers impressive quality for its small size. Some issues I have faced, however, are: a) It doesn't distinguish periods at the end of sentences from the dots in abbreviations such as "Mr." or "Mrs." The result is an awkward pause between "Mr." and the name. b) It doesn't handle ellipses well. c) Words are pronounced the same way regardless of context.
Look into SSML phoneme tags. Some TTS supports it. That was you can use a powerful LLM to fix these issues ahead of TTS
The Mr. / Mrs. thing feels like it would be a pretty easy fix, at least to eliminate a lot of the more common cases.
I fixed that here: https://github.com/cpttripzz/audiblez The main problem with Kokoro is how flat and lifeless it sounds. But it is very fast. I prefer Chatterbox tts but it is around 20 times slower and will not work without a GPU