Yeah, I don't know if someone thought this was bad or something, but it seems like valid reasoning. We may need to give these models a sense of which things call for more or less detailed reasoning, and better knowledge of what they actually know internally, but as for the "research work" the model claims to be doing, it seems to be doing a good job of it.
The poster also shared in a comment https://preview.redd.it/u8vs29hq5w2e1.png?width=2704&format=... which did get the intended laugh out of me, but even that seems fair enough. I'm currently traveling in a country where most people speak a language I don't know well. You better believe I've been thinking through even trivial greetings, considering the setting, formality, appropriate follow-ups, etc.
> You better believe I've been thinking through even trivial greetings
Even after thinking through what to say, I used the wrong greeting in a shop half an hour ago and the person working there called me on it.
OP here. I liked Marco-o1, but as you pointed out, we need to teach these models to spend their system 2 thinking more economically.
Looking at that example, I too feel it's a thought process I might go through once or twice. I think this highlights an important difference between humans and LLMs: a human can think such things through explicitly, in their head or on paper (I do it in a text editor more often than I'd care to admit), once or twice, and then it sticks; it quickly becomes more of a "system 1" thing.

With LLMs, the closest equivalent outside training/fine-tuning would probably be prompt caching. It would be great if we could figure out some kind of online learning scheme so the model could internalize its own thoughts and persist them between conversations, but in the latent space, not prepended as token input.
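To make that concrete, here's a minimal sketch of the token-level workaround I mean, assuming a generic chat-completion setup: distill what the model worked out into short notes after a conversation, then prepend them next time, so the "learning" lives in the prompt rather than in the weights or the latent space. The call_model stub and the notes file are hypothetical stand-ins, not any particular provider's API.

    import json
    from pathlib import Path

    NOTES_PATH = Path("reasoning_notes.json")  # hypothetical local store

    def load_notes() -> list[str]:
        """Distilled takeaways from earlier conversations, if any."""
        if NOTES_PATH.exists():
            return json.loads(NOTES_PATH.read_text())
        return []

    def save_note(note: str) -> None:
        """Persist a one-line takeaway so future prompts can reuse it."""
        notes = load_notes()
        notes.append(note)
        NOTES_PATH.write_text(json.dumps(notes, indent=2))

    def build_messages(user_msg: str) -> list[dict]:
        """Prepend prior takeaways as a system message: token-level 'memory'."""
        system = "Things you already worked out previously:\n" + "\n".join(
            f"- {n}" for n in load_notes()
        )
        return [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
        ]

    def call_model(messages: list[dict]) -> str:
        """Hypothetical stand-in for whatever chat API you actually use."""
        raise NotImplementedError("wire up your chat API of choice here")

    if __name__ == "__main__":
        # After a conversation, store the distilled reasoning...
        save_note("In this region the casual greeting is fine with shopkeepers; "
                  "the formal one reads as stiff.")
        # ...and next time it rides along in the prompt instead of being re-derived.
        print(build_messages("How should I greet the person at the bakery?"))

Of course this only "sticks" as far as the context window and the model's attention to the notes allow, which is exactly the gap a real latent-space mechanism would close.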