pugio • today at 5:00 AM • 1 reply

Opus looks like a big jump from the previous leader (GPT 5.1), but when you switch from "50%" to "80%", GPT 5.1 still leads by a good margin. I'm not sure how much you can take from this - perhaps "5.1 is more reliable at slightly shorter tasks, choose Opus if you're trying to push the frontier in task length".

Replies

gizmodo59 • today at 7:00 AM

Yeah. Throwing away expensive tokens and limits 50% of the time is not ideal. But I bet by this time next year OSS models will be at that capability!
