twtw99 • yesterday at 6:24 PM • 7 replies

If you don't want to click in, here's an easy comparison with the other two frontier models: https://x.com/OpenAI/status/2029620619743219811?s=20

Replies

bicx • yesterday at 7:47 PM

That last benchmark seemed like an impressive leg up against Opus until I saw the sneaky footnote that it was actually a Sonnet result. Why even include it then, other than hoping people don't notice?

chabes • yesterday at 6:26 PM

Definitely don’t want to click in on X either.

Aboutplants • yesterday at 6:32 PM

It seems that all frontier models are roughly even at this point. One may be slightly better for certain things, but in general I think we are approaching a real level playing field in terms of ability.

swingboy • yesterday at 6:35 PM

Why do so many people in the comments want 4o so bad?

MarcFrame • yesterday at 7:20 PM

How does 5.4-thinking have a lower FrontierMath score than 5.4-pro?

karmasimida • yesterday at 6:29 PM

It is a bigger model, confirmed

dom96 • yesterday at 7:08 PM

Why do none of the benchmarks test for hallucinations?
