antirez • today at 5:40 PM • 2 replies

Anthropic made a big strategic error. Normally they compare their models with their own older models. Today, instead, now that everybody knows how strong GPT 5.5 is at coding, they put it in the mix, basically showing all their customers that the benchmarks can't be trusted.

Replies

fastball • today at 7:53 PM

Not sure I follow. Anthropic included benchmarks where GPT 5.5 outperforms Claude 4.8. Sure, maybe that is a strategic error, but that doesn't seem to indicate benchmarks can't be trusted (I personally don't trust them, but not because of this).

aspenmartin • today at 5:45 PM

Sorry, how does their addition of GPT 5.5 to their blog post invalidate benchmarks? Also, whether or not the marketing department decided to put it in a table, benchmarks are an easy thing to measure independently.
