hombre_fatal • last Sunday at 9:58 AM

^ A thought that everyone has had at some point when processing human text, before learning the hard way (like with end-of-sentence detection). :P

The difference is that even weak LLMs are good at magically doing this, so I wonder what the problem is for the TTS mentioned above.
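To make the end-of-sentence point concrete, here is a minimal sketch of the "obvious" approach and where it goes wrong. The function name `naive_split` and the sample text are made up for illustration; this is not taken from any TTS library.

```python
import re

def naive_split(text):
    # The "obvious" rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = "Dr. Smith paid $3.50 for coffee. Then he left."
print(naive_split(text))
# The abbreviation "Dr." is wrongly treated as a sentence boundary:
# ['Dr.', 'Smith paid $3.50 for coffee.', 'Then he left.']
```

The decimal point in "$3.50" survives only because no whitespace follows it; abbreviations, initials, and ellipses all need their own special cases, which is the "hard way" part.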

Replies

leobg • last Sunday at 10:23 AM

Kokoro is small and fast because all the text -> phoneme conversion is done by “dumb code” and only the phoneme -> sound part is done using a neural net.
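A rough sketch of that split, assuming a dictionary-lookup front end. Everything here is hypothetical: the tiny `LEXICON`, the ARPAbet-style symbols, and both function names are illustrative, not Kokoro's actual code or API.

```python
# Stage 1 is plain rule-based code; stage 2 is where a neural net would go.

# Hypothetical toy lexicon with ARPAbet-style phoneme symbols.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def text_to_phonemes(text):
    """'Dumb code' stage: lowercase, strip punctuation, look words up."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        if not word:
            continue
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            # A lookup table has no answer for unseen words; mark them.
            phonemes.append(f"<unk:{word}>")
    return phonemes

def phonemes_to_audio(phonemes):
    """Neural stage: placeholder only, no model is included in this sketch."""
    raise NotImplementedError("a trained phoneme-to-audio model would go here")

print(text_to_phonemes("Hello, world!"))
# ['HH', 'AH', 'L', 'OW', 'W', 'ER', 'L', 'D']
```

The trade-off shows up in the `<unk:...>` branch: the front end is cheap and fast, but anything the rules and lexicon do not cover never reaches the neural part in a usable form.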
