
lkeskull today at 7:41 AM

this example worked in 2021; it's 2026. wake up. these models are not just "finding the most likely next word based on what they've seen on the internet".


Replies

strix_varius today at 7:49 AM

Well, yes, definitionally they are doing exactly that.

It just turns out that there's quite a bit of knowledge and understanding baked into the relationships of words to one another.
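To make that concrete, here is roughly what the inference loop looks like. This is only a minimal sketch; the model ("gpt2" as a small stand-in), the prompt, and the greedy argmax are illustrative choices, not how any particular production system decodes:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Minimal sketch: any small causal LM will do; "gpt2" is just a stand-in.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    for _ in range(10):
        logits = model(ids).logits        # a score for every vocabulary token
        next_id = logits[0, -1].argmax()  # pick the single most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    print(tok.decode(ids[0]))

Everything the model "knows" has to surface through that one operation: score every possible next token given the text so far.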

LLMs are heavily influenced by preceding words. It's very hard for them to backtrack on an earlier branch. This is why all the reasoning models use "stop phrases" like "wait", "however", or "hold on...". It's literally just text injected to make the autocomplete more likely to revise previous bad branches.
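And that "wait" trick is just more of the same loop with extra text spliced into the context. Again a sketch with illustrative choices (the cue wording, the toy arithmetic prompt, gpt2 as the model), not anyone's actual decoding recipe:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Illustration only: the cue text and model are arbitrary choices for the example.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    draft = "Q: What is 17 * 24? A: 398."       # a (wrong) draft answer already in context
    cue = " Wait, let me double-check that."    # the "stop phrase" is appended as plain text
    ids = tok(draft + cue, return_tensors="pt").input_ids
    for _ in range(30):
        logits = model(ids).logits
        next_id = logits[0, -1].argmax()        # same next-token loop as before
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    print(tok.decode(ids[0]))  # the continuation is now conditioned on the revision cue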

jaccola today at 8:11 AM

The person above was being a bit pedantic, and zealous in their anti-anthropomorphism.

But they are literally predicting the next token. They do nothing else.

Also, if you think they were just predicting the next token in 2021, note that there has been no fundamental architecture change since then. All gains have come from scale and efficiency optimisations (not to discount that; there's an awful lot of complexity in both).

csomar today at 8:28 AM

Unless LLM architecture has changed, that is exactly what they are doing. You might need to learn more about how LLMs work.
