They repeat about a million times on their Hugging Face page that the thinking output should be included in the conversation history for multi-turn use. That makes me wonder: is this generally needed for LLMs? If so, it implies they only really work well on typical multi-turn flows.

I'm experimenting with a completely different approach. There is still a main message stream in the context, but the agent can use structured means to exchange messages and to interact with terminals and the file system in a stateful manner. The full state is re-rendered into the context on every cycle, and the message history is just one "panel" within it. I'm still in the middle of trying this out, so I can't say yet whether it will work, but I hope the models are flexible enough for it.
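To make the panel idea concrete, here is a rough Python sketch of what I mean. Every name in it is my own hypothetical choice for illustration, not any library's API: `Panel`, `AgentState`, `run_cycle`, and the `call_model` placeholder that stands in for whatever inference client you actually use. It only shows the "rebuild the context from state each cycle" part. It doesn't deal with the thinking output at all.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Panel:
    """A named region of agent state that is re-rendered into the prompt each cycle."""
    title: str
    lines: list[str] = field(default_factory=list)

    def render(self, max_lines: int = 20) -> str:
        # Show only the most recent lines so one panel cannot crowd out the others.
        body = "\n".join(self.lines[-max_lines:])
        return f"=== {self.title} ===\n{body}"


@dataclass
class AgentState:
    messages: Panel = field(default_factory=lambda: Panel("messages"))
    files: Panel = field(default_factory=lambda: Panel("files"))
    terminals: dict[str, Panel] = field(default_factory=dict)

    def render_context(self) -> str:
        # Rebuild the whole context from current state instead of appending turns;
        # the message history is just one panel among several.
        panels = [self.messages, self.files, *self.terminals.values()]
        return "\n\n".join(p.render() for p in panels)


def run_cycle(state: AgentState, call_model: Callable[[str], str]) -> str:
    # call_model is a hypothetical placeholder for a real inference call.
    prompt = state.render_context()
    reply = call_model(prompt)
    state.messages.lines.append(f"agent: {reply}")
    return reply


# Example: one cycle with a stub "model" that just reports the prompt length.
state = AgentState()
state.messages.lines.append("user: list the repo")
state.terminals["sh1"] = Panel("terminal sh1", ["$ ls", "README.md  src"])
reply = run_cycle(state, lambda prompt: f"saw {len(prompt)} chars")
```

The open question is exactly the one above: the model never sees a growing chat transcript, only a fresh snapshot each cycle, so anything a model expects to find from its own previous turns (like its earlier reasoning) has to live in some panel or it's gone.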
I'd heard someone mention feeding the thinking back in when talking about gpt-oss-120; at the time, that was the only evidence I'd seen that this is a thing.