HarHarVeryFunny • yesterday at 5:57 PM • 5 replies

I'm skeptical about how much of a coding advance it will be.

Seems odd that their announcement has zero coding benchmarks, with the closest related thing being Terminal-Bench.

Replies

Tracking model performance on Artificial Analysis makes me think these models are constantly optimized/tuned in one way or another. GPT 5.5 was scoring in the mid-60s when it was first released; now it's almost 10 points higher.

jdw64 • yesterday at 6:05 PM

Maybe I'll know once I try it? Honestly, for small functions or methods, I don't think there's a huge difference between models. But the larger the code gets, the more noticeable the difference seems to be.

Personally, I think this kind of coding experience varies from person to person.

vanuatu • yesterday at 6:08 PM

Sadly, with all the labs benchmaxxing, I feel like you just have to try the model for a while to really evaluate how good it is, especially for each individual use case.

MangoCoffee • yesterday at 7:46 PM

>zero coding benchmarks

"What gets measured gets managed"

artursapek • yesterday at 6:02 PM

They claim extreme performance on ExploitBench, which Mythos was touted as being incredible at. https://x.com/OpenAI/status/2070555278576439306
