You didn't actually give an example of what the issue with next token prediction is. You just listed current constraints (i.e., generalization and learning are difficult, they need mountains of data to train, they can't play chess very well) that are not fundamental problems. You can trivially train a transformer to play chess above the level any human can play at, and it would still be doing "next token prediction". I wouldn't be surprised if every single thing you list as a challenge is solved in a few years, either through improvement at a basic level (i.e., better architectures) or through harnessing.
We don't know how human brains produce intelligence. At a fundamental level, they might also be doing next token prediction or something similarly "dumb". Just because we know the basic mechanism of how LLMs work doesn't mean we can explain what they do, just as we might know everything we need to know about neurons and still not fully grasp sentience.
But chess models aren't trained the same way LLMs are. If I'm not mistaken, they are trained directly on chess moves using pure reinforcement learning, and it's definitely not trivial; AlphaZero, for instance, took 64 TPUs to train.
I use the chess example because it’s especially instructive. It would NOT be trivial to train an LLM to play chess: next token prediction breaks down when there are so many positions to remember and you can’t adequately assign value to intermediate positions. Chess bots work by being trained to assign value to a position, something fundamentally different from what an LLM is doing.
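To make the distinction concrete, here's a toy sketch using Nim in place of chess (an assumed stand-in, not a real engine). The engine-style approach learns or computes a *value* for each position and searches over it, rather than predicting "the next move a human would play"; here the value is computed exactly by negamax, where AlphaZero would approximate it with a trained network.

```python
def value(pile):
    """Value of a Nim position (take 1-3 stones, taking the last wins)
    for the player to move: +1 = winning, -1 = losing."""
    if pile == 0:
        return -1  # the previous player took the last stone; we lose
    return max(-value(pile - take) for take in (1, 2, 3) if take <= pile)

def best_move(pile):
    """Search layer: pick the move leaving the opponent the worst position."""
    return max((t for t in (1, 2, 3) if t <= pile),
               key=lambda t: -value(pile - t))

print(value(4))      # -1: a pile of 4 is losing for the player to move
print(best_move(5))  # 1: take one stone, handing the opponent the losing pile of 4
```

The point of the sketch is the training target: an imitation model is fit to "which move came next in the record", while the value-based approach is fit to "how good is this position", and a search picks moves from that.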
A simpler example — without tool use, the standard BPE tokenization method made it impossible for state of the art LLMs to tell you how many ‘r’s are in strawberry. This is because they are thinking in tokens, not letters and not words. Can you think of anything in our intelligence where the way we encode experience makes it impossible for us to reason about it? The closest thing I can come up with is how some cultures/languages have different ways of describing color and as a result struggle to distinguish between colors that we think of as quite distinct. And yet I can explain that, think about it, etc. We can reason abstractly and we don’t have to resort to a literal deus ex machina to do so.
Not being able to explain our brain to you doesn’t mean I can’t notice things that LLMs can’t do and we can, and draw some conclusions from that.