Hacker News

gAI today at 5:15 PM, 7 replies

4.7 was the first time I had to resort to using the previous version (4.6) for most use cases. Hoping 4.8 rectifies this.


Replies

ishurand4 today at 7:25 PM

They just showed the benchmarks it improved on, but it regressed on so much more, such as the MRCR benchmark: "On multi-round coreference/context recall tests (often cited as MRCR or long-text retrieval benchmarks), Opus 4.7 reportedly dropped from roughly 78.3% down to 32.2% compared to Opus 4.6."

ruairidhwm today at 10:03 PM

I found that Haiku outperformed Sonnet on some tasks. I don't want to blog-spam, but if anyone is interested: https://www.ruairidh.dev/blog/sonnet-4-6-drops-format-rule-o...

merlindru today at 5:19 PM

Same. 4.7 felt like a definite regression.

rhubarbtree today at 5:22 PM

Same. So happy when I found that option.

petterroea today at 6:01 PM

Same. 4.7 has done some incredibly stupid things.

tanepiper today at 8:40 PM

Yep, until 1st June 4.6 is still x1 on Copilot, but it will jump up quite a bit in cost. 4.7 was already highly priced, and the output was frankly terrible.

It still seems that trying to build general models is mostly cost-prohibitive: the frontier model providers and resellers are repricing in such a way that the return on investment is dropping as developers and users become more cautious about burning through their limits.

I'm still of the opinion that models like 4.6 don't need to be improved on; rather, they need to be better integrated with more domain-specific models in agentic flows.

dezsirazvan today at 8:12 PM

same!