The Percepta stuff would seem to demonstrate a mechanism for implementing "thinking". I don't understand how foundation models implement "thinking", but my intuition is that models are specifically trained to match and follow procedural patterns. A task in a given domain can be performed through an associated, encoded procedure. The model holds, as weights, all the linkages that allow a procedure to be conditionally and incrementally generated and performed. Does anyone have insights into how LLM "thinking" is trained and encoded?
Basically just madlibs: the model generates intermediate tokens that help it predict a better answer, based on its training (RLHF and otherwise). Those tokens tend to look like "reasoning" because they correlated with accepted answers during training.
Extended thinking passes are just more of the same: the whole methodology exists to provide additional context for the autoregressive process. There is no traditional computation occurring.
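The "just more tokens" view above can be sketched with a toy loop. This is not a real LLM: the lookup table stands in for learned weights, and the `<think>`/`</think>` token names are made up for illustration. The point is only that "thinking" tokens are ordinary generated tokens that get appended to the context and condition the next prediction.

```python
# Toy sketch of autoregressive decoding with "thinking" tokens.
# The table below is a hypothetical stand-in for a trained network.

def toy_model(context):
    """Return the next token given the context so far."""
    table = {
        ("Q: 2+3?",): "<think>",
        ("Q: 2+3?", "<think>"): "2+3=5",
        ("Q: 2+3?", "<think>", "2+3=5"): "</think>",
        ("Q: 2+3?", "<think>", "2+3=5", "</think>"): "5",
    }
    return table.get(tuple(context), "<eos>")

def generate(prompt, max_tokens=8):
    context = [prompt]
    while len(context) < max_tokens:
        nxt = toy_model(context)
        if nxt == "<eos>":
            break
        # Reasoning tokens and answer tokens alike feed back into
        # the context; there is no separate "thinking" machinery.
        context.append(nxt)
    return context

print(generate("Q: 2+3?"))
```

An "extended thinking" pass in this picture is just a longer run of the same loop before the final answer token appears.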