Hacker News

qsort today at 10:34 AM

Yeah, GPT 5.5 + Fable beating either one individually is believable, but 2x Opus > Fable is what makes me a bit dubious about the whole thing. They might be measuring skills that are too specific, or that benefit a lot from more tokens being thrown at them. Also, Claude Code (the harness) is not the best at the moment; maybe that's part of it as well?


Replies

andai today at 2:18 PM

What throws me off is DeepSeek beating both Opus 4.8 and GPT 5.5.

That definitely doesn't sound right.