Hacker News

sulam · yesterday at 6:58 PM

Sorry, but you're mistaking outputs for process. If you actually know what models are doing under the hood to produce output that (admittedly) looks very convincing, you'll quickly realize that they are simply exceptionally good at statistically predicting the next token in a stream of tokens. The reason you are having to become an expert at context engineering, and the reason the labs still hire engineers, is that turning next-token prediction into something that can simulate general intelligence isn't easy.

The boundaries of these systems are very easy to find, though. Try to play any kind of game with them that isn't a prediction game, or perhaps even some that are (try to play chess with an LLM, it's amusing).


Replies

MadxX79 · yesterday at 7:28 PM

I enjoyed playing mastermind with LLMs where they pick the code and I have to guess it.

It's not aware that it doesn't know what the code is (it isn't in the context because it's supposed to be secret), but it just keeps giving clues. Initially this works, because most clues are consistent with most codes in the beginning, but very quickly it starts to give contradictory clues and eventually has to give up.

At no point does it "realise" that it doesn't even know what the secret code is itself. It makes it very clear that the AI isn't playing mastermind with you; it's trying to predict what a mastermind player in its training set would say, and that doesn't include "wait a second, I'm an AI, I don't know the secret code because I didn't really pick one!" So it just merrily goes on predicting tokens, without any sort of awareness of what it's saying or what it is.

It works if you allow it to output the code so it's in context, but probably only because there is enough data in the training set to compare two 4-letter strings and count how many positions match (there aren't that many possibilities).
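For reference, the scoring step the model keeps failing at is tiny when written as code. A sketch in Python of standard mastermind feedback (the function name and the (exact, color-only) peg convention are my own choices, not from any particular implementation):

```python
from collections import Counter

def score(code: str, guess: str) -> tuple[int, int]:
    """Mastermind feedback: (exact, color-only) pegs.

    exact      = right symbol in the right position
    color-only = right symbol, wrong position
    """
    exact = sum(c == g for c, g in zip(code, guess))
    # Multiset intersection counts every shared symbol occurrence,
    # so subtract the exact hits to get the color-only pegs.
    overlap = sum((Counter(code) & Counter(guess)).values())
    return exact, overlap - exact
```

So `score("ABCD", "ABDC")` gives `(2, 2)`: A and B are exact, C and D are present but swapped. Consistency across turns is exactly what the next-token predictor can't maintain without the code in context.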

10xDev · yesterday at 9:46 PM

CoT already moved things past the "it is just token prediction" phase. We have models that can perform search over a very large state space across domains with good precision and refine their own search, leading to a decent level of fluid intelligence, hence why ARC AGI 1/2 is essentially solved. We also don't know the exact details of what is happening at frontier labs, seeing as they don't publish everything anymore.
