Seems like the industry is moving further toward low-latency, high-speed models for direct interaction, and slow, long-thinking models for longer tasks and deeper reasoning.
Quick/instant LLMs for human use (think UI); slow, deep-thinking LLMs for autonomous agents.
Are they really thinking, or are they just sprinkled with Sleep(x) calls?
You always want faster feedback. If it's not a human leveraging the fast cycles, it's another automated system (e.g., CI).
Slow, deep tasks are mostly for flashy one-shot demos with little to no practical use in the real world.