Hacker News

HarHarVeryFunny · 01/21/2025

You seem to think that predicting s t -> s t is easier than predicting st (single token) -> s t.
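For concreteness, this is roughly what that distinction looks like at the tokenizer level; tiktoken and the cl100k_base vocabulary are illustrative choices here, not anything specified in this thread:

```python
import tiktoken  # pip install tiktoken

# Vocabulary choice is illustrative; other BPE vocabularies behave similarly.
enc = tiktoken.get_encoding("cl100k_base")

# "s t" with a space tends to come out as separate tokens, so the characters
# are already visible in the sequence the model sees.
print(enc.encode("s t"), [enc.decode([t]) for t in enc.encode("s t")])

# "st" can be absorbed into a single token whose integer ID carries no
# character-level information -- spelling it out means the model has to have
# learned that mapping, not just copy what is in front of it.
print(enc.encode("st"), [enc.decode([t]) for t in enc.encode("st")])
print([enc.decode([t]) for t in enc.encode("strawberry")])
```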

Of all the incredible things that LLMs can do, why do you imagine that something so basic is challenging to them?

In a trillion-token training set, how few examples of spelling do you think there are?

Given all the specialized data that is deliberately added to training sets to boost performance in specific areas, are you assuming that it might not occur to them to add coverage of token spellings if it were needed?!

Why are you relying on what you believe to be true, rather than just firing up a bunch of models and trying it for yourself?
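A minimal sketch of what "trying it yourself" could look like, assuming the OpenAI Python client; the model name, word list, and prompt wording are placeholders, so swap in whichever models and words you actually want to compare:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

words = ["strawberry", "st", "acquiesce"]  # placeholder test words
for word in words:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": f"Spell the word '{word}' letter by letter, separated by spaces.",
        }],
    )
    print(word, "->", resp.choices[0].message.content)
```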


Replies

danielmarkbruce · 01/21/2025

> You seem to think that predicting s t -> s t is easier than predicting st (single token) -> s t.

Yes, it is significantly easier to train a model to do the first than the second across any real vocabulary. If you don't understand why, maybe go back to basics.
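A toy illustration of the difference (an assumed toy vocabulary, not a claim about any model's actual training data): the first task is one general copy rule, while the second is a per-token lookup that has to be memorized separately for every entry in the vocabulary.

```python
# Stand-ins for strings that each map to a single token in some vocabulary.
vocab = ["st", "cat", "strawberry"]

# Task 1: "s t" -> "s t" -- one rule (echo the characters you already see)
# covers every input, regardless of vocabulary size.
def copy_task(spaced: str) -> str:
    return spaced  # the answer is literally the input

# Task 2: "st" -> "s t" -- the answer cannot be read off an opaque token ID;
# it has to be stored per vocabulary entry, so what the model must memorize
# grows with the vocabulary.
spelling_table = {w: " ".join(w) for w in vocab}

print(copy_task("s t"))              # 's t'
print(spelling_table["strawberry"])  # 's t r a w b e r r y'
```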
