Seeing these benchmarks makes me so happy.
Not because I love Anthropic (I do like them) but because it's putting off the day I have to change my coding agent.
This world is changing fast, and keeping up with the state of the art, along with the FOMO that comes with it, is exhausting.
I've been holding onto Claude Code for the last little while, since I've built up a robust set of habits, slash commands, and subagents that help me squeeze as much out of the platform as possible.
But with the last few releases of Gemini and Codex I've been getting closer and closer to throwing it all out to start fresh in a new ecosystem.
Thankfully, Anthropic has come out swinging today, and my own SOPs can remain intact a little while longer.
I think we are at the point where you can reliably ignore the hype and not get left behind. Until the next breakthrough at least.
I've been using Claude Code with Sonnet since August, and there hasn't been a single case where I thought about checking other models to see if they were any better. Things just worked. Yes, it requires effort to steer correctly, but all of them do, each with their own quirks. Then 4.5 came, and things got better automatically. Now with Opus, another step forward.
I've just ignored all the people pushing Codex for the last few weeks.
Don't fall into that trap and you'll be much more productive.
I personally jumped ship from Claude to OpenAI due to the rate limiting in Claude, and have no intention of coming back unless I'm convinced that the new limits are at least double what they were when I left.
Even if the code generated by Claude is slightly better, with GPT I can send as many requests as I want with no fear of running into any limit, so I feel free to experiment and screw up if necessary.
Same boat and same thoughts here! Hope it holds its own against the competition, I've become a bit of a fan of Anthropic and their focus on devs.
Don't throw away what's working for you just because some other company (temporarily) leapfrogs Anthropic a few percent on a benchmark. There's a lot to be said for what you're good at.
I also really want Anthropic to succeed because they are without question the most ethical of the frontier AI labs.
I tried Codex for the same reasons you list. The grass is not greener on the other side. I usually only opt for Codex when I hit my Claude Code rate limit.
With Cursor or Copilot+VSCode, you get all the models and can switch any time. When a new model is announced, it's available the same day.
I threw a few hours at Codex the other day and was incredibly disappointed with the outcome…
I’m a heavy Claude Code user, and similar workloads just didn’t work out well for me on Codex.
One of the areas I think is going to make a big difference to any model soon is speed. We can build error-correcting systems into the tools, but the base models need more speed (and, with that, lower costs).
With Codex you need far less of that robust set of habits, commands, and subagent-style complexity. Not only because it lacks some of these features, but because it also doesn't need them as much.
The benefit you get from juggling different tools is marginal at best. In terms of actually getting work done, Sonnet and GPT-5.1-Codex are both pretty effective. It looks like Opus will be another meaningful but incremental change, which I am excited about but which probably won’t dramatically change how much these tools impact our work.