5.3 codex | alt Hacker News

gizmodo59 • yesterday at 6:14 PM • 4 replies

5.3 codex (https://openai.com/index/introducing-gpt-5-3-codex/) crushes it with 77.3% on Terminal Bench. That makes for the shortest-lived lead: it lasted less than 35 minutes. What a time to be alive!

Replies

wasmainiac • yesterday at 7:10 PM

Dumb question: can these benchmarks be trusted when model performance tends to vary with the time of day and the load on OpenAI's servers? How do I know I'm not getting a severe penalty for chatting at the wrong time? Or even, are the models at their best right after launch, then slowly dialed down to more economical settings once the hype wears off?

purplerabbit • yesterday at 6:29 PM

The lack of broad benchmark reports in this makes me curious: has OpenAI reverted to benchmaxxing? Looking forward to hearing opinions once we all try both of these out.

nharada • yesterday at 6:23 PM

That's a massive jump. I'm curious whether it feels materially different in use, or whether we're starting to reach the point of benchmark saturation. If the benchmark is good, then 10 points should be a big improvement in capability...

jkelleyrtp • yesterday at 6:27 PM

Claude's SWE-bench score is 80.8 and Codex's is 56.8.

Seems like 4.6 is still all-around better?
