I see, I probably needed more coffee to read your initial note.
If I am repeating this back correctly, the argument is that the process itself looks nothing like human reasoning and has a number of technical limitations and even hacks that are in no way attributes or qualities of reasoning. Therefore, it clearly cannot be in any way considered reasoning. Temperature is one element of this, and there are others you could continue to enumerate beyond what's written above.
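(Tangent, but to make the temperature point concrete for anyone following along: below is a minimal sketch in Python, with made-up logits over a toy four-token vocabulary, of roughly what temperature-scaled sampling looks like. The point is that the "choice" of next token is a weighted random draw from a rescaled distribution, which I take to be the kind of mechanism you're pointing at.)

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    # Temperature rescales the logits before sampling:
    # T < 1 sharpens the distribution, T > 1 flattens it toward uniform.
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    # The "decision" is just a weighted random draw over token ids.
    return rng.choice(len(probs), p=probs)

# Made-up logits for a toy 4-token vocabulary
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0.7))  # usually returns the top token
print(sample_next_token(logits, temperature=1.5))  # returns lower-ranked tokens more often
```

Raising the temperature only flattens that distribution; it doesn't change anything about how the logits were computed in the first place.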
I can get behind part of that argument, certainly, and I appreciate you elaborating on it. I think that is what I was trying to say with the part about me believing that it's not useful to think of it as reasoning. This is very different from what we might consider reasoning, in very meaningful ways.
I also agree with you that part of this is just loaded language, as it anthropomorphizes what is fundamentally just a bunch of matrices and non-linear functions.
I think where we differ is probably on that "when it's not even really close" part of it, at least in what I mean by "close" versus what I think you mean.
While (I think) we agree that it's obviously a different process, I do think that the inputs and outputs, and the different qualities of those outputs (like the so-called reasoning tokens above), can often seem quite close to the inputs and outputs of some human reasoning. That's why I was saying I didn't see how the way the process works, like temperature, is relevant. Putting the processes aside, if you black box a human and a language model and put us head to head on reasoning tasks, sometimes you're going to get quite similar results.
I'm basically saying that, sure, an LLM or foundation model is clearly a Chinese room, without any understanding. What are we comparing it to, though?
Now, I don't have any kind of training in biology, but I have been led to understand that our brains are quite complex and that how their function arises from the underlying biological processes is still fairly poorly understood. Given that, I tend to discount the degree of difference between the processes themselves and just look at the inputs and outputs. It's not obvious to me that we aren't ourselves Chinese rooms, at least to some significant degree.
So _maybe_ it's fair to compare the outputs of these Transformers to what our own outputs would be. If it walks like a duck, and talks like a duck, does it matter?
Obviously, that's not fully correct -- how the output arises _has_ to matter somewhat. The fact I am sitting here writing this, and not an AI, refutes that point to some degree. And if I am understanding your thoughts correctly, I fully agree that the process really is nothing close. I just don't see how it can be a clear-cut issue on the basis of analyzing the Transformer algorithm itself.
> Putting the processes aside, if you black box a human and a language model and put us head to head on reasoning tasks, sometimes you're going to get quite similar results.
I cannot believe this is true. LLMs are awful at problems that are not present in the dataset used for training. They are very bad at planning problems, for example, because they cannot possibly memorize every single instance and they cannot reason their way to a solution, but a black-boxed human of course can.
> If it walks like a duck, and talks like a duck, does it matter?
Depends on what your goals are. LLMs can get to a state where they contain a lot of human knowledge, in a lot of detail, can answer a lot of questions, and can be used in many different ways. If your idea of intelligence is akin to having a bunch of experts on tap in all the different areas, then LLMs are totally fine.
I personally want something that can solve problems, not just answer questions. For example, let's say I want to build a flying car, quadcopter style, in my garage. Given the information that exists on the internet and the availability of parts, this is a deterministic problem. Given that prompt, I want a set of specific instructions like "buy this part from here", "send this CAD model to sendcutsend.com and select these options", all the way down to "here is a binary file to load on the controller". And along the same lines, the AI should be able to build a full simulator application, Flight Sim style, where I can load the file and play with the controls to see how the thing behaves, including in less-than-optimal conditions.
Whatever that model does under the hood, that is called reasoning, and it certainly won't be structured like an LLM.