So Opus 4.7 is measurably worse at long-context retrieval compared to Opus 4.6. Opus 4.6 scores 91.9% and Opus 4.7 scores 59.2%. At least they're transparent about the model degradation. They traded long-context retrieval for better software engineering and math scores.
Agreed, I appreciate the transparency (and Anthropic isn't normally very transparent). It's also useful to know in practice: I'll change how I approach long contexts now that I know the model struggles more with them.
Be brief. No one wants AI boyfriend users who drone on & on about their day.
At what point along the 1M-token window does a context become "long" enough for this degradation to kick in?
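One way to probe that empirically is a needle-in-a-haystack sweep: bury a known fact at a fixed relative position inside progressively larger distractor documents and check at what size the model stops retrieving it. A minimal sketch below; `ask_model` is a placeholder for whatever chat API you're evaluating, not a real function:

```python
def build_haystack(n_filler: int, needle: str, position: float) -> str:
    """Build a long distractor document with a 'needle' fact inserted
    at a relative position (0.0 = start, 1.0 = end)."""
    filler = ["The quick brown fox jumps over the lazy dog."] * n_filler
    idx = int(position * len(filler))
    filler.insert(idx, needle)
    return "\n".join(filler)

needle = "The secret code word is PLUM."
question = "What is the secret code word?"

# Sweep context sizes; the size at which the model stops answering
# "PLUM" is roughly where retrieval starts to degrade.
for n_filler in (100, 1_000, 10_000):
    prompt = build_haystack(n_filler, needle, position=0.5)
    # answer = ask_model(prompt, question)  # placeholder API call
    # score = "PLUM" in answer
```

You'd also want to vary `position`, since models often degrade unevenly (the middle of the window tends to be worst).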
A year ago it felt like SoTA model developers were not improving so much as moving the dirt around. Maybe we’re in another such rut.
To be honest, I think it's just a more accurate score of what Opus 4.6 actually was. Once contexts get sufficiently large, Opus develops pretty bad short-term memory loss.