elahieh • today at 12:51 AM • 0 replies

One reason might be that Claude Opus 4.7 (thinking) scores higher on the Arena coding leaderboard at https://arena.ai/leaderboard/text/coding. Hopefully that leaderboard effectively assesses correctness, but it doesn't account for reliability.