gAI • today at 5:15 PM • 7 replies

4.7 was the first time I had to resort to using the previous version (4.6) for most use cases. Hoping 4.8 rectifies this.

Replies

ishurand4 • today at 7:25 PM

They just showed the benchmarks it improved on, but it regressed on so much more, such as the MRCR benchmark: "On multi-round coreference/context recall tests (often cited as MRCR or long-text retrieval benchmarks), Opus 4.7 reportedly dropped from roughly 78.3% down to 32.2% compared to Opus 4.6."

ruairidhwm • today at 10:03 PM

I managed to find that Haiku outperformed Sonnet on some tasks...don't want to blog spam but if anyone is interested: https://www.ruairidh.dev/blog/sonnet-4-6-drops-format-rule-o...

merlindru • today at 5:19 PM

Same. 4.7 felt like a definite regression

rhubarbtree • today at 5:22 PM

Same. So happy when I found that option.

petterroea • today at 6:01 PM

Same. 4.7 has done some incredibly stupid things.

tanepiper • today at 8:40 PM

Yep, until 1st June 4.6 is still x1 on Copilot, but it will jump up quite a bit in cost. 4.7 was already highly priced, and the output was frankly terrible.

It still seems that trying to build general models is mostly cost-prohibitive. The frontier model providers and resellers are repricing in such a way that the return on investment is dropping as developers and users become more cautious about burning through their limits.

I'm still of the opinion that models like 4.6 don't need to be improved on. Rather, they need to be better integrated with more domain-specific models in agentic flows.

dezsirazvan • today at 8:12 PM

same!
