
Nevermark · today at 4:02 AM

Exactly. Our base learning is by example, which is very much learning to predict.

Predict the right words, predict the answer, predict when the ball bounces, etc. Then we reverse the predictions we have learned: we choose the action with the highest predicted chance of the outcome we want, whether that is one step or a series of predicted best steps.
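A minimal sketch of that "reversed prediction" idea, in Python. The predictor and its scores are hypothetical stand-ins for whatever has actually been learned:

    # Pick the action whose predicted chance of the desired outcome is highest.
    # `predict_outcome` is a hypothetical stand-in for any learned predictor.
    def predict_outcome(state: str, action: str) -> float:
        # Invented scores, purely for illustration.
        scores = {"swing early": 0.2, "swing late": 0.1, "swing on the bounce": 0.7}
        return scores.get(action, 0.0)

    def choose_action(state: str, candidate_actions: list[str]) -> str:
        # "Reverse" the prediction: choose the action with the highest
        # predicted probability of the outcome we want.
        return max(candidate_actions, key=lambda a: predict_outcome(state, a))

    print(choose_action("ball incoming", ["swing early", "swing late", "swing on the bounce"]))
    # -> swing on the bounce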

Also, people confuse different levels of algorithm.

There are at least 4 levels of algorithm:

• 1 - The architecture.

The input-output calculation for pre-trained models is very well understood. We put together a model consisting of matrix/tensor operations and a few other simple functions, and that is the model. Just a normal, if very high-parameter, calculation (see the toy sketch after this list).

• 2 - The training algorithm.

These are completely understood.

There are certainly lots of questions about what is most efficient, alternatives, etc. But training algorithms harnessing gradients and similar feedback are very clearly defined.

• 3 - The type of problem a model is trained on.

Many basic problem forms are well understood. For prediction, for instance, we have an ordered series of information, with later items to be predicted from earlier ones. That could be as simple as a single input and a response to be learned, or a long sequence.

• 4 - The solution learned to solve (3) the outer problem, using (2) the training algorithm on (1) the model architecture.
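To make the distinction concrete, here is a toy sketch of my own, not taken from any real system: a tiny next-token model written with numpy. Levels (1), (2) and (3) are all explicitly written down in the code; level (4) is whatever ends up inside the weight matrix W after training.

    import numpy as np

    # (1) Architecture: a tiny next-token model -- one parameter matrix plus a softmax.
    # (2) Training algorithm: plain gradient descent on cross-entropy.
    # (3) Problem type: predict the next token from the current one, on a toy corpus.
    # (4) The learned solution is whatever ends up in W. We wrote (1)-(3) down
    #     explicitly; we did not write down (4).

    rng = np.random.default_rng(0)
    vocab_size = 5
    corpus = np.array([0, 1, 2, 3, 4] * 3)                    # toy training data

    W = rng.normal(scale=0.1, size=(vocab_size, vocab_size))  # (1) the whole model

    def forward(token_id):
        logits = W[token_id]                                  # one row of the matrix
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                                # next-token probabilities

    learning_rate = 0.5
    for step in range(200):                                   # (2) the training loop
        for cur, nxt in zip(corpus[:-1], corpus[1:]):         # (3) next-token pairs
            probs = forward(cur)
            grad = probs.copy()
            grad[nxt] -= 1.0                                  # d(cross-entropy)/d(logits)
            W[cur] -= learning_rate * grad                    # gradient step

    print(np.round(forward(2), 2))                            # mass concentrates on token 3

Everything except the final contents of W was specified by hand. For this toy problem the learned solution is trivial to read off; for a model trained on language, it is not, and that is the level the rest of this comment is about.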

People keep confusing (4) with (1), (2) or (3). But it is very different.

For starters, in the general case, and for almost any challenging problem, we do not understand the solution a model has learned. Someday that might be routine, but today we don't even know how to approach it for any significant problem.

Secondly, even with (1), (2), and (3) exactly the same, (4) is going to be wildly different based on the data characterizing the specific problem to solve. For complex problems, like language, layers and layers of sub-solutions to sub-problems have to be learned, and since models are not infinite in size, the model has to find ways to repurpose sub-solutions and weave them together to address all the ways different sub-problems do and don't share commonalities.

Yes, prediction is the outer form of their solution. But to do that they have to learn all the relationships in the data. And there is no limit to how complex relationships in data can be. So there is no limit on the depth or complexity of the solutions found by successfully trained models.

Any argument that they don't reason, based on the fact that they are trained to predict, confuses at least (3) with (4). That is a category error.

It is true that they reason much more like our "fast thinking", intuitive responses than our careful, deep, reflective reasoning. And they are missing important functions, like a sense of what they do or don't know. They don't continuously learn while inferencing, or experience meta-learning, where they improve their own reasoning abilities through reflection, like we do. And notoriously, by design, they don't "see" the letters that spell words in any normal sense. They see tokens.
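A hypothetical illustration of that last point. The token split and ids below are invented; real tokenizers produce different pieces, but the principle is the same: the model receives token ids, not characters.

    # The model never receives the characters of "strawberry"; it receives token ids.
    # This split and these ids are made up for illustration.
    toy_vocab = {"straw": 1034, "berry": 2077}

    def toy_tokenize(word: str) -> list[int]:
        pieces = ["straw", "berry"]            # pretend the tokenizer splits it this way
        return [toy_vocab[p] for p in pieces]

    print(toy_tokenize("strawberry"))          # -> [1034, 2077]
    # Counting the letter "r" now requires learned knowledge *about* these tokens;
    # the characters themselves were never part of the input.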

Those reasoning limitations can be irritating or humorous. Like when a model seems to clearly recognize a failure you point out, but then replicates the same error over and over. No ability to learn on the spot. But they do reason.

Today, despite many successful models, nobody understands how models are able to reason the way they do. There is only shallow analysis. The weights are there to experiment with, but nobody can walk away from the model and training process and build a language model directly themselves. We have no idea how to independently replicate what they have learned, despite having their solution right in front of us, other than going through the whole process of training another one from scratch.