This is how early forms of "reasoning" in LLMs worked: just literally inserting words like "Wait...", "Hmm...", "Let me reconsider...", "But is it really..." into the token stream.
Is this not how current forms of reasoning work? It seems like the open models still output things like that, and the closed ones all just summarize their thinking instead to avoid distillation, but probably do the same thing internally.
Is this not how current forms of reasoning work? It seems like the open models still output things like that, and the closed ones all just summarize their thinking instead to avoid distillation, but probably do the same thing internally.