Hacker News

input_sh · yesterday at 6:39 PM

It's better on a benchmark I've never heard of!? That is groundbreaking, I'm switching immediately!


Replies

modeless · yesterday at 6:42 PM

I also wasn't that familiar with it, but the Opus 4.6 announcement leaned pretty heavily on the TerminalBench 2.0 score to quantify how much Opus 4.6 improved at coding. So it looks pretty bad for Anthropic that OpenAI beat them so soundly on that specific benchmark.

Looking at the Opus model card, I see that they also have by far the highest score for a single model on ARC-AGI-2. I wonder why they didn't advertise that.
