Hacker News

HarHarVeryFunny · 01/21/2025

You seem to think that predicting s t -> s t is easier than predicting st (single token) -> s t.
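For concreteness, this is roughly what that distinction looks like at the tokenizer level; tiktoken and the cl100k_base vocabulary are illustrative choices here, not anything specified in this thread:

```python
import tiktoken  # pip install tiktoken

# Vocabulary choice is illustrative; other BPE vocabularies behave similarly.
enc = tiktoken.get_encoding("cl100k_base")

# "s t" with a space tends to come out as separate tokens, so the characters
# are already visible in the sequence the model sees.
print(enc.encode("s t"), [enc.decode([t]) for t in enc.encode("s t")])

# "st" can be absorbed into a single token whose integer ID carries no
# character-level information -- spelling it out means the model has to have
# learned that mapping, not just copy what is in front of it.
print(enc.encode("st"), [enc.decode([t]) for t in enc.encode("st")])
print([enc.decode([t]) for t in enc.encode("strawberry")])
```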

Of all the incredible things that LLMs can do, why do you imagine that something so basic is challenging to them?

In a trillion-token training set, how few examples of spelling do you think there are?

Given all the specialized data that is deliberately added to training sets to boost performance in specific areas, are you assuming that it might not occur to them to add coverage of token spellings if it were needed?!

Why are you relying on what you believe to be true, rather than just firing up a bunch of models and trying it for yourself?
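A minimal sketch of what "trying it yourself" could look like, assuming the OpenAI Python client; the model name, word list, and prompt wording are placeholders, so swap in whichever models and words you actually want to compare:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

words = ["strawberry", "st", "acquiesce"]  # placeholder test words
for word in words:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": f"Spell the word '{word}' letter by letter, separated by spaces.",
        }],
    )
    print(word, "->", resp.choices[0].message.content)
```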


Replies

danielmarkbruce · 01/21/2025

> You seem to think that predicting s t -> s t is easier than predicting st (single token) -> s t.

Yes, it is significantly easier to train a model to do the first than the second across any real vocabulary. If you don't understand why, maybe go back to basics.
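A toy illustration of the difference (an assumed toy vocabulary, not a claim about any model's actual training data): the first task is one general copy rule, while the second is a per-token lookup that has to be memorized separately for every entry in the vocabulary.

```python
# Stand-ins for strings that each map to a single token in some vocabulary.
vocab = ["st", "cat", "strawberry"]

# Task 1: "s t" -> "s t" -- one rule (echo the characters you already see)
# covers every input, regardless of vocabulary size.
def copy_task(spaced: str) -> str:
    return spaced  # the answer is literally the input

# Task 2: "st" -> "s t" -- the answer cannot be read off an opaque token ID;
# it has to be stored per vocabulary entry, so what the model must memorize
# grows with the vocabulary.
spelling_table = {w: " ".join(w) for w in vocab}

print(copy_task("s t"))              # 's t'
print(spelling_table["strawberry"])  # 's t r a w b e r r y'
```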
