> The fact that it can interpret the same words differently in different contexts alone shows that even on a temperature of 0 (
This is the problem with using loaded language like "reason" and "interpret". The model is not interpreting anything. All that is being done is a multidimensional map lookup with statistics.
> I also don't see how that idea would fit in with the o1 models, which explicitly have "reasoning" tokens.
An LLM on top of an LLM (i.e. using context to generate inputs to an LLM) is just a fancier LLM.
To really understand all of this, all you need to do is look at how a Transformer works, namely the attention block. There is no such thing as Query, Key, and Value in the sense of how they are implied to be used. They may as well be called A, B, and C, as they are all learned in training and can be freely interchanged in naming. All you do for inference is multiply the output vector by A, B, and C to get 3 matrices, then multiply those together (technically with a scaling factor for 2 of them, but again, it doesn't matter which 2, and the scaling factor can be built into the matrix itself).
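To make the shape of that computation concrete, here is a minimal numpy sketch of a single scaled dot-product attention head (no masking, no multi-head reshaping); the dimensions and the names W_q, W_k, W_v are purely illustrative, not anyone's actual implementation:

    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 4, 8, 8

    X = rng.normal(size=(seq_len, d_model))   # vectors coming into the attention block
    W_q = rng.normal(size=(d_model, d_k))     # learned matrix "A" (conventionally Query)
    W_k = rng.normal(size=(d_model, d_k))     # learned matrix "B" (conventionally Key)
    W_v = rng.normal(size=(d_model, d_k))     # learned matrix "C" (conventionally Value)

    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # three projections of the same input

    scores = Q @ K.T / np.sqrt(d_k)           # dot products, with the scaling factor
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    out = weights @ V                          # weighted mix of the V rows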
And because you can unroll matrix multiplication into a 2-layer neural network, any LLM in its current form today can be represented as a set of linear layers. And we know that a set of linear layers is simply a function. And every function has a finite range for a finite domain. And the inability to expand that range given a finite domain means it's not reasoning.
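For the narrower point that a stack of purely linear layers collapses to one fixed function, here is a toy check (the shapes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(16, 8))   # first linear layer
    W2 = rng.normal(size=(4, 16))   # second linear layer
    x = rng.normal(size=8)          # one input vector

    # Applying the layers one after another...
    y_layered = W2 @ (W1 @ x)
    # ...computes the same thing as a single precomputed matrix.
    y_single = (W2 @ W1) @ x

    assert np.allclose(y_layered, y_single)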
So we have to rely on hacks like temperature to make it appear like reasoning, when it's really not even close.
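(For anyone unfamiliar, "temperature" here is the usual knob applied to the model's output logits at sampling time; a rough sketch of how it is typically implemented, with illustrative names only:)

    import numpy as np

    def sample_next_token(logits, temperature, rng=np.random.default_rng()):
        # Pick the next token id from raw logits at a given temperature.
        if temperature == 0:
            # Temperature 0: deterministic, always the highest-scoring token.
            return int(np.argmax(logits))
        scaled = logits / temperature        # >1 flattens the distribution, <1 sharpens it
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))

    logits = np.array([2.0, 1.0, 0.5, -1.0])
    print(sample_next_token(logits, temperature=0))    # always token 0
    print(sample_next_token(logits, temperature=1.0))  # varies run to run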
I see, I probably needed more coffee to read your initial note.
If I am repeating this back correctly, the argument is that the process itself looks nothing like human reasoning and has a number of technical limitations and even hacks that are in no way attributes or qualities of reasoning. Therefore, it clearly cannot be in any way considered reasoning. Temperature is one element of this, but there are others which you could continue to enumerate beyond even what's written above.
I can get behind part of that argument, certainly, and I appreciate you elaborating on it. I think that is what I was trying to say with the part about me believing that it's not useful to think of it as reasoning. This is very different from what we might consider reasoning, in very meaningful ways.
I also agree with you that part of this is just loaded language, as it anthropomorphizes what is fundamentally just a bunch of matrices and non-linear functions.
I think where we differ is probably on the "when it's really not even close" part of it, at least in what I mean by "close" versus what I think you mean.
While we (I think) agree that it's obviously a different process, I do think that the inputs->outputs, and the different qualities of those inputs->outputs (like the so-called reasoning tokens above), can often seem quite close to the inputs and outputs of some human reasoning. That's why I was saying that I didn't see how the way the process works, like temperature, is relevant. Putting the processes aside, if you black box a human and a language model and put us head to head on reasoning tasks, sometimes you're going to get quite similar results.
I'm basically saying that, sure, an LLM or foundation model is clearly a Chinese room, without any understanding. What are we comparing it to, though?
Now, I don't have any kind of training in biology, but I have been led to understand that our brains are quite complex and that how their function arises from the underlying biological processes is still fairly poorly understood. Given that, I tend to discount the degree of difference between the processes themselves and just look at the inputs and outputs. It's not obvious to me that we aren't ourselves Chinese rooms, at least to some significant degree.
So _maybe_ it's fair to try to compare what the outputs of these Transformers are to what our outputs would be. If it walks like a duck, and talks like a duck, does it matter?
Obviously, that's not fully correct -- how the output arises _has_ to matter somewhat. The fact that I am sitting here writing this, and not an AI, refutes that point to some degree. And if I am understanding your thoughts correctly, I fully agree that the process really is nothing close. I just don't see how it can be a clear-cut issue on the basis of analyzing the Transformer algorithm itself.
> The model is not interpreting anything. All that is being done is a multidimensional map lookup with statistics.
So what? Can you propose another method to make a computing device understand language? The method of creating the output does not stipulate anything about the nature of the thing creating it. If someone could map out a human brain, tell you how thoughts are made, and add an 'all that is being done is' in front of it, does that make your thought creation trivial?
> An LLM on top of an LLM (i.e. using context to generate inputs to an LLM) is just a fancier LLM.
This is called a tautology. You have not given any compelling reasons why an LLM cannot do anything, so calling something another LLM is not compelling either.
> To really understand all of this, all you need to do is look at how a Transformer works, namely the attention block. There is no such thing as Query, Key, and Value in the sense of how they are implied to be used. They may as well be called A, B, and C, as they are all learned in training and can be freely interchanged in naming. All you do for inference is multiply the output vector by A, B, and C to get 3 matrices, then multiply those together (technically with a scaling factor for 2 of them, but again, it doesn't matter which 2, and the scaling factor can be built into the matrix itself).
Here is how it works, therefore it must meet some criteria I have imposed arbitrarily.
> So we have to rely on hacks like temperature to make it appear like reasoning, when it's really not even close.
You still haven't produced any valid argument at all for why one thing would be evidence of the other.