Hacker News

8s2ngy, last Sunday at 7:08 AM (3 replies)

I've been using Kokoro TTS through audiblez, the CLI app mentioned in the "Similar Projects" section of the README. The model is fast and delivers impressive quality for its small size. I have run into a few issues, however: a) It doesn't distinguish periods at the end of sentences from the dots in abbreviations such as "Mr." or "Mrs." The result is an awkward pause between "Mr." and the name. b) It doesn't handle ellipses well. c) Words are pronounced the same way regardless of context.


Replies

beboplifa, last Monday at 5:59 AM

I fixed that here: https://github.com/cpttripzz/audiblez. The main problem with Kokoro is how flat and lifeless it sounds, but it is very fast. I prefer Chatterbox TTS, but it is around 20 times slower and will not work without a GPU.

fudged71, last Sunday at 3:22 PM

Look into SSML phoneme tags. Some TTS engines support them. That way you can use a powerful LLM to fix these issues ahead of TTS.
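
For anyone who hasn't seen one, here is a rough sketch of what generating a phoneme tag could look like. The `<phoneme>` element with its `alphabet` and `ph` attributes comes from the SSML spec; the helper name `phoneme_tag` and the example word are made up for illustration, and nothing in this thread says Kokoro or audiblez accepts SSML, so treat that part as an assumption to check against whatever engine you use.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(word: str, ipa: str) -> str:
    """Wrap a word in an SSML <phoneme> element with an IPA pronunciation.

    Hypothetical helper: the engine must support SSML for this to matter.
    """
    # quoteattr() adds the surrounding quotes and escapes the attribute value;
    # escape() protects the element text against stray '<' or '&'.
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(word)}</phoneme>"


# Past-tense "read" is a typical context-dependent pronunciation.
sentence = f"<speak>I {phoneme_tag('read', 'rɛd')} it yesterday.</speak>"
print(sentence)
```

In the workflow described above, an LLM would decide which words need a tag and which pronunciation to use; the helper only handles the markup.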

rkagerer, last Sunday at 7:10 AM

The Mr. / Mrs. thing feels like it would be a pretty easy fix, at least to eliminate a lot of the more common cases.
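
A minimal sketch of one way to cover the common cases: expand the title to a full word before the text reaches the TTS model, so there is no period left to pause on. This is not how audiblez or Kokoro handle it; the mapping, the spellings, and the function name `expand_titles` are all assumptions for illustration, and the capital-letter check is only a heuristic.

```python
import re

# Hypothetical, deliberately short mapping; a real list would be longer.
ABBREVIATIONS = {
    "Mr.": "Mister",
    "Mrs.": "Missus",
    "Dr.": "Doctor",
}

# Match an abbreviation only when it is followed by whitespace and a
# capitalized word, which is the usual "title + name" shape.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in ABBREVIATIONS) + r")(?=\s+[A-Z])"
)


def expand_titles(text: str) -> str:
    """Replace title abbreviations with full words before synthesis."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


print(expand_titles("Mr. Smith met Mrs. Jones."))
```

The heuristic will still get some cases wrong, for example "Dr." meaning "Drive" before a capitalized word, so it trims the common failures rather than solving the problem.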
