> So how is it therefore not actual thinking?
Many people consider “thinking” something only animals can do; they are uncomfortable with the idea that animals are biological machines, or that life, consciousness, and thinking are fundamentally machine processes.
When an LLM generates chain-of-thought tokens (what we might casually call “thinking”), it fills its context window with a sequence of tokens that the final answer is then conditioned on, which improves its odds of answering correctly.
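To make that concrete, here is a minimal sketch of the loop, assuming a generic autoregressive sampler called `generate` (a stand-in, not any particular API): the reasoning tokens are appended to the context, and the answer is sampled conditioned on them.

```python
# Sketch only: `generate(prompt, max_tokens)` stands in for any autoregressive
# LLM sampler; it is not a real library call.

def answer_with_cot(generate, question: str) -> str:
    # 1. Elicit intermediate "reasoning" tokens.
    cot_prompt = f"Q: {question}\nLet's think step by step.\n"
    reasoning = generate(cot_prompt, max_tokens=256)

    # 2. Those tokens now occupy the context window; the final answer is
    #    sampled conditioned on the question *and* the reasoning.
    final_prompt = cot_prompt + reasoning + "\nTherefore, the answer is:"
    return generate(final_prompt, max_tokens=32)
```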
This “thinking” process is not the rigid deduction of a symbolic rule system; it is more like an associative walk through a high-dimensional manifold shaped by training. The walk is partly stochastic (it depends on temperature, sampling strategy, and similar factors), yet remarkably robust.
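The “partly stochastic” part is easy to pin down: at each step the model emits logits over the vocabulary, and the sampler divides them by a temperature before taking a weighted random draw. A bare-bones version (plain softmax sampling, no top-k or top-p truncation) looks like this:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Draw one token id from raw logits.

    Lower temperature sharpens the distribution (T -> 0 approaches greedy
    decoding); higher temperature flattens it and makes the walk more random.
    """
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.default_rng().choice(len(probs), p=probs))
```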
Even when you manually introduce logical errors into a chain-of-thought trace, the model’s overall accuracy usually remains better than if it had produced no reasoning tokens at all. Unlike a strict forward- or backward-chaining proof system, an LLM relies on statistical association rather than brittle rule-following. In a way, that fuzziness is its strength, because it generalizes instead of collapsing under contradiction.
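That claim comes from perturbation-style evaluations. A rough sketch of the setup, with `ask_model` and the corruption rule both hypothetical placeholders:

```python
# Sketch of a CoT-perturbation evaluation. `ask_model(question, reasoning=...)`
# is a hypothetical wrapper around whatever model is being tested; the
# corruption rule below is deliberately crude.

def corrupt(trace: str) -> str:
    # Inject one logical error by negating the first "is" in the trace.
    return trace.replace(" is ", " is not ", 1)

def evaluate(ask_model, dataset):
    """dataset: list of (question, gold_answer, cot_trace) triples."""
    totals = {"no_cot": 0, "intact_cot": 0, "corrupted_cot": 0}
    for question, gold, trace in dataset:
        totals["no_cot"]        += ask_model(question) == gold
        totals["intact_cot"]    += ask_model(question, reasoning=trace) == gold
        totals["corrupted_cot"] += ask_model(question, reasoning=corrupt(trace)) == gold
    n = len(dataset)
    # Typical finding: corrupted_cot < intact_cot, but corrupted_cot > no_cot.
    return {name: count / n for name, count in totals.items()}
```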
Well put, and yet if it doesn’t notice or collapse under introduced contradictions, that’s evidence it’s not the kind of reasoning we were hoping for. The “real thing” is actually brittle when you do it right.
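For contrast, here is a toy strict forward-chainer (illustrative rules only, not any particular system): one contradictory premise and the whole derivation is rejected, which is exactly the brittleness being described.

```python
# Toy forward-chaining deduction over Horn-style rules (premises -> conclusion).
# Strict deduction has no graceful degradation: a single P / "not P" pair in
# the derived facts invalidates everything.

def forward_chain(facts: set[str], rules: list[tuple[set[str], str]]) -> set[str]:
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
        for fact in list(derived):
            if "not " + fact in derived:
                raise ValueError(f"contradiction: {fact!r} vs {'not ' + fact!r}")
    return derived

rules = [({"rainy"}, "wet ground"), ({"wet ground"}, "slippery")]
print(forward_chain({"rainy"}, rules))            # {'rainy', 'wet ground', 'slippery'}
# forward_chain({"rainy", "not rainy"}, rules)    # raises ValueError: contradiction
```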