Why shouldn’t I assume that the “thinking” is just the usual LLM regurgitation of “how would a human coming up with a joke explain their reasoning?” or something like that, with zero actual “thinking”?
You shouldn't assume that for the same reason you shouldn't assume the opposite: neither assumption is warranted.
The LLM uses the chain of thought, as it's built out token by token like a garden path, to explore different completion possibilities. What it completes afterwards isn't necessarily logically entailed by that chain, but it's definitely influenced by it.
The search may not be valuable as an artifact in itself, and it's likely to be logically unsound, especially in parts. But the end result can still be useful.
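To make "influenced, not logically entailed" concrete: in a single autoregressive pass the model emits the chain of thought and then the answer, so every answer token is sampled conditioned on whatever the chain turned out to be. A minimal sketch using Hugging Face transformers (the model name is a placeholder assumption; any model trained to emit <think> blocks behaves the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: assume any chat model trained to emit <think>...</think> spans.
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Tell me a pun about chess engines."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# One pass: the <think>...</think> span comes out first, then the answer.
# Each answer token is drawn from p(token | prompt + everything generated
# so far), so the answer is conditioned on the chain of thought even when
# that chain is logically shaky.
out = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Resample and you'll typically get a different chain of thought and, with it, a different answer; that's the conditioning at work.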
Given how it was specifically trained (they didn't encourage it to think, they allowed it to), there was a lot of emergent behavior as it trained.
Sort of like chess engines rediscovering classic (named) chess openings. See section 2.2.3 for the training template (it's a single paragraph I can't reproduce here because I'm on my phone).
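From memory, though, the gist of it is roughly this (an approximate paraphrase, not the paper's verbatim wording):

```python
# Rough paraphrase of the kind of template described in section 2.2.3
# (approximate wording, NOT the paper's verbatim text). Note what's absent:
# no instructions on HOW to reason -- only that reasoning goes in <think>
# tags and the final answer in <answer> tags.
TEMPLATE = (
    "A conversation between User and Assistant. The User asks a question "
    "and the Assistant solves it. The Assistant first thinks about the "
    "reasoning process and then provides the answer. The reasoning process "
    "and answer are enclosed within <think> </think> and <answer> </answer> "
    "tags, respectively. User: {question} Assistant:"
)

print(TEMPLATE.format(question="Tell me a pun about chess openings."))
```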
Example of emergent behavior (section 2.2.4, page 8): the model learns to solve more complex problems by spending more time reasoning. It also naturally develops reflection ("what have I tried so far?") and exploration strategies.
Fundamentally, you should think of this as a neural network that learned to solve real problems by reasoning about them in written language.
(My favorite part: it defaulted to reasoning in a mix of languages. They constrained it to reason in only a single language, and this negatively impacted performance! The hypothesis is that the constraint improves interpretability for the humans reading the traces.)