qsort • today at 10:34 AM • 1 reply

Yeah, GPT 5.5 + Fable beating either individually is believable, but 2x Opus > Fable is what makes me a bit dubious about the whole thing. They might be measuring skills that are too specific or that benefit a lot from more tokens being thrown at them. Also, Claude Code (the harness) is not the best at the moment, so that might be part of it as well?

Replies

andai • today at 2:18 PM

What throws me off is DeepSeek beating both Opus 4.8 and GPT 5.5.

That definitely doesn't sound right.

alt Hacker News