input_sh • yesterday at 6:39 PM • 1 reply

It's better on a benchmark I've never heard of!? That is groundbreaking, I'm switching immediately!

Replies

I also wasn't that familiar with it, but the Opus 4.6 announcement leaned pretty heavily on the TerminalBench 2.0 score to quantify how much of an improvement it was for coding, so it looks pretty bad for Anthropic that OpenAI beat them on that specific benchmark so soundly.

Looking at the Opus model card I see that they also have by far the highest score for a single model on ARC-AGI-2. I wonder why they didn't advertise that.
