Hacker News

hombre_fatal, last Sunday at 9:58 AM

^ A thought everyone has at some point when processing human text, before learning the hard way (end-of-sentence detection, for example). :P

The difference is that even weak LLMs are good at magically doing this, so I wonder what the problem is for the TTS mentioned above.
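
To make the end-of-sentence point concrete, here is a minimal sketch of the "obvious" rule-based approach and where it trips. The function name `naive_split` and the sample sentence are made up for illustration; this is not taken from any particular TTS system.

```python
import re

def naive_split(text):
    # The "obvious" rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())

text = "Dr. Smith paid $3.50 for coffee. He left at 5 p.m. sharp."
for piece in naive_split(text):
    print(repr(piece))
# The two sentences come back as four pieces, because "Dr." and "p.m."
# also match the rule.
```

Patching this usually means abbreviation lists and special cases, and each one tends to reveal another exception.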


Replies

leobg, last Sunday at 10:23 AM

Kokoro is small and fast because all the text -> phoneme conversion is done by “dumb code” and only the phoneme -> sound part is done using a neural net.
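
A rough sketch of that split, as I understand it. None of this is Kokoro's actual code: `LEXICON`, `text_to_phonemes` and `phonemes_to_audio` are hypothetical names, the lexicon is a two-word toy, and the second stage is a stub standing in for the neural net.

```python
# Hypothetical toy lexicon (ARPAbet-style symbols). A real front end would use
# a full pronunciation dictionary plus rules for numbers, abbreviations, etc.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def text_to_phonemes(text):
    """Stage 1: plain deterministic code, no model involved."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        # Crude fallback (an assumption for this sketch): spell unknown words
        # letter by letter.
        phonemes.extend(LEXICON.get(word, list(word.upper())))
    return phonemes

def phonemes_to_audio(phonemes):
    """Stage 2: placeholder for the neural net; returns silent dummy samples."""
    return [0.0] * (len(phonemes) * 100)

phonemes = text_to_phonemes("Hello, world!")
print(phonemes)
print(len(phonemes_to_audio(phonemes)), "samples")
```

The point of the sketch is only the shape: everything hard about human text has to be handled in stage 1 by hand-written rules, which is where things like sentence boundaries and abbreviations go wrong.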