They did compare it to other models:

Tiberium • yesterday at 6:35 PM • 3 replies

They did compare it to other models: https://x.com/OpenAI/status/1999182104362668275

https://i.imgur.com/e0iB8KC.png

Replies

This looks cherry-picked. For example, Claude Opus had a higher score on SWE-Bench Verified, so they conveniently left it out. Also, GDPval is literally a benchmark made by OpenAI.

sergdigon • today at 7:20 AM

The fact that the post compares their reasoning model against Gemini 3 Pro (the "non-reasoning" model) and not Gemini 3 Pro Deep Think (the reasoning one) is quite nasty. If you compare GPT-5.2 Thinking to Gemini 3 Pro Deep Think, the scores are quite similar (sometimes one is better, sometimes the other).

whimsicalism • yesterday at 10:27 PM

uh oh, where did SWE bench go :D
