Hacker News

yunusabd, today at 7:48 AM

Super nice, thanks for sharing!

There's one thing that gave me pause: In the phrase 我想学中文 it identified "wén" as "guó". While my pronunciation isn't perfect, there's no way that what I said is closer to "guó" than to "wén".

This indicates to me that the model learned word structures instead of tones here. "Zhōng guó" probably appears in the training data a lot, so the model has a bias towards recognizing that.
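To illustrate the kind of bias I mean, here is a minimal sketch. All numbers are made up, and the explicit split into an acoustic score and a context prior is my assumption; an end-to-end model would absorb such a prior implicitly rather than expose it like this.

```python
import math

# Hypothetical numbers, chosen only to illustrate the effect; not taken from the model.
acoustic = {"wén": 0.6, "guó": 0.4}        # how well the audio matches each syllable
context_prior = {"wén": 0.2, "guó": 0.8}   # how often each follows "zhōng" in training data

def decode(acoustic, prior):
    # Pick the syllable with the highest combined log score.
    scores = {s: math.log(acoustic[s]) + math.log(prior[s]) for s in acoustic}
    return max(scores, key=scores.get)

acoustic_only = max(acoustic, key=acoustic.get)  # the audio alone favors "wén"
combined = decode(acoustic, context_prior)       # the prior flips the result to "guó"
```

With these numbers the audio alone favors "wén", but the combined score picks "guó" because the context prior outweighs the acoustic evidence.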

- Edit -

From the blog post:

> If my tone is wrong, I don’t want the model to guess what I meant. I want it to tell me what I actually said.

Your architecture also doesn't tell you what you actually said. It just maps what you said to the likeliest of the 1254 syllables that you allow. For example, it couldn't tell you that you said "wi" or "wr" instead of "wo", because those syllables don't exist in your setup.
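A minimal sketch of why that is, assuming the model ends in a plain softmax over a fixed syllable list. The three-item inventory and the logits are hypothetical stand-ins, not the real 1254 syllables or real model outputs.

```python
import math

# Hypothetical tiny inventory standing in for the 1254 allowed syllables.
SYLLABLES = ["wo", "wen", "guo"]

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    # Return the most probable syllable from the fixed inventory and its probability.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SYLLABLES[best], probs[best]

# Made-up logits for a speaker who actually said "wi".
label, p = classify([1.2, 0.3, 0.1])
```

Whatever the audio contains, `label` is always one of the entries in `SYLLABLES`. Here it comes out as "wo", and "wi" can never be reported because it is not in the list.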


Replies

vjerancrnjak, today at 9:07 AM

I tried just repeating guó once for each character shown, and the repetition was not recognized.

That said, I like the active aspect of the approach. Language apps where sound is the main form of learning should have a great advantage, because written text tends to confuse: every country has its own spin on orthography. Even pinyin, which makes sense on its own terms, has many symbols that conflict with what a beginner expects.
