Some of the benchmarks went down. Has that happened before?
Probably deprioritizing other areas to focus on SWE capabilities, since I reckon most of their revenue is from enterprise coding usage.
Constantly. Minor revisions can easily "wobble" on benchmarks that the training didn't explicitly push them for.
Whether it's genuine loss of capability or just measurement noise is typically unclear.
Looking at the system card for Opus 4.7, the MCRC benchmark used for long-context tasks dropped significantly, from 78% to 32%.
I wonder what caused such a large regression in this benchmark.
If you mean for Anthropic in particular, I don't think so. But it's not the first time a major AI lab has published an incremental update of a model that is worse on some benchmarks. I remember that a particular update of Gemini 2.5 Pro improved results on LiveCodeBench but scored lower on most other benchmarks.
https://news.ycombinator.com/item?id=43906555