Hacker News

HarHarVeryFunny · 01/21/2025

No, because it still has to learn what to predict when "spelling" is called for. There's no magic just because the predicted token sequence is the same as the one doing the predicting (give or take quotes, commas, etc.).

And ...

1) If the training data isn't there, it still won't learn it

2) Having to learn that the predictive signal is a multi-token pattern (s t) rather than a single-token one (st) doesn't make things any simpler for the model (see the sketch below).
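
To make 2) concrete, here is a minimal sketch of what a subword-tokenized model sees versus a character-level one. It assumes the tiktoken package is installed; the exact split is vocabulary-dependent and shown only for illustration.

    # Sketch, assuming tiktoken is available. The point is only that the word
    # arrives as opaque token IDs rather than as characters, so "spell it"
    # means learning an ID -> characters mapping.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-style BPE vocabulary

    word = "strawberry"
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(ids)     # a few integer IDs
    print(pieces)  # subword pieces, e.g. something like "str" + "awberry"

    # A character-level scheme makes the spelling explicit instead:
    print([ord(c) for c in word])  # one ID per character
    print(list(word))              # ['s', 't', 'r', 'a', 'w', 'b', 'e', 'r', 'r', 'y']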

Clearly you've decided to go on personal belief rather than actually testing it for yourself, so the conversation is rather pointless.


Replies

danielmarkbruce · 01/21/2025

Go try it. I've done it.

You are going to find that for 1), with character-level tokenization you don't need training data covering every token for the model to learn it. With current tokenization schemes you do, and they still go haywire from time to time when tokens that are close in embedding space are spelled very differently.
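
A toy illustration of that coverage point (the corpus, vocabularies, and greedy split below are invented for the sketch, not anyone's actual setup): a character vocabulary built from a tiny corpus already spells words it has never seen, while a subword vocabulary can only reuse pieces that occurred in training.

    # Toy sketch only: the greedy fixed-length split is a crude stand-in for
    # BPE, chosen to keep the example short.
    corpus = ["stop", "step", "stem"]               # pretend training words
    char_vocab = sorted({c for word in corpus for c in word})
    subword_vocab = {"st", "op", "ep", "em"}        # pretend learned merges

    def char_tokenize(word):
        # character level: every word decomposes into a tiny, closed symbol set
        return list(word)

    def subword_tokenize(word, vocab, piece_len=2):
        # illustrative stand-in for BPE: fixed-length pieces, unknown -> <unk>
        pieces = [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
        return [p if p in vocab else "<unk>" for p in pieces]

    new_word = "pest"                                # never appears in the corpus
    print(all(c in char_vocab for c in new_word))    # True: all its characters were seen
    print(char_tokenize(new_word))                   # ['p', 'e', 's', 't']
    print(subword_tokenize(new_word, subword_vocab)) # ['<unk>', 'st'] -- "pe" was never a piece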

Just try it: actually train one yourself.
