As a scientist (computational physicist, so plenty of math, but also plenty of code, from Python PoCs to explicit SIMD and GPU code, mostly various subsets of C/C++), I can confirm: Codex is qualitatively better for my use cases than Claude. I keep retesting them (not on benchmarks; I simply use both in parallel for my work and see what happens) after every version update, and ever since 5.2 Codex seems further and further ahead. The token limits are also far more generous (and it matters; I found it fairly easy to hit the 5h limit on max-tier Claude), but mostly it's about quality: the probability that the model will give me something useful I can iterate on, as opposed to something I discard immediately, is much higher with Codex.
The few times I've used both models side by side on more typical tasks (not so much web stuff, which I don't do much of, but more conventional Python scripts, CLI utilities in C, some OpenGL), they seemed much more evenly matched. I haven't found a case where Claude was markedly superior since Codex 5.2 came out, but I'm sure there are plenty. In my view, benchmarks are completely irrelevant at this point; just use the models side by side on representative bits of your real work and stick with what works best for you. My software engineer friends often react with disbelief when I say I much prefer Codex, but in my experience it is not a close comparison.
>As a scientist (computational physicist,
Is there one that you prefer for, I dunno, physics?
I've tried both on similar work and haven't found such a clear-cut difference. I still find that neither is able to correctly implement, given the same inputs, a complex algorithm I worked on in the past. I'm not sharing the exact benchmark I use, but think of something for improving the performance of the N^2 operations that are common in physics and you can probably guess the train of thought.