I love that everyone is making their own TTS model, since they are not as expensive to train as many other kinds of models. There are also plenty of different architectures.
Another recent example: https://github.com/supertone-inc/supertonic
Another one is Soprano-1.1.
It seems like it is being trained by one person, and it is surprisingly natural for such a small model.
I remember when TTS always meant the most robotic, barely comprehensible voices.
https://www.reddit.com/r/LocalLLaMA/comments/1qcusnt/soprano...
Thanks for the heads-up, this looks really interesting and the claimed speed is nuts.
Thank you. Very good suggestion, with code available and bindings for so many languages.
In-browser demo of Supertonic with WASM:
https://huggingface.co/spaces/Supertone/supertonic-2