SkyPuncher • today at 3:07 AM • 0 replies • view on HN

> What's your basis for thinking that codex is best for planning, but opus is best for implementing?

I for one work on an agentic product where we use all 3 of the major frontier models. The models absolutely have preferences and "personality" that lead to different characteristics.

In my eyes:

* Gemini - consistently the best at pure reasoning and tunability. Flash models are particularly good at latency-sensitive, small-scale reasoning. The tradeoff is that they struggle with some basic behavior, like tool calling.

* Claude - consistently good at long-running sessions. Opus may or may not be the best model, but it was the first model that crossed the "holy shit" threshold. I understand its quirks/nuances and it's consistently solid. It's the best for me because I've learned how to be incredibly effective with it.

* ChatGPT - Probably really good, but probably not worth switching from Claude. Last time I used their frontier model, it was a bit random. It would have moments of brilliance immediately followed by falling flat on its face.
