sosodev • yesterday at 11:45 PM

I don’t know why you’re getting downvoted. It’s true. Averaged across a wide variety of benchmarks, Fable is the only Anthropic model that performs better than GPT 5.5 xhigh.

Replies

Eridrus • yesterday at 11:58 PM

The problem is that there are a bunch of benchmarks, the model providers often don't even use the same benchmarks, a bunch of them have known problems, and it's expensive to do your own benchmarks.

I am a GPT 5.x booster since to me it just feels smarter, and I generally felt like the benchmarks backed me up, but it's not every benchmark, so sadly we're mostly arguing about vibes.

SWEBench-Pro was a big one, though apparently Claude was reading solutions out of the .git folder it wasn't meant to have access to, among other problems.
