thot_experiment • yesterday at 8:04 PM • 1 reply

It's really hard for me to take these benchmarks seriously at all, especially that first one where Sonnet 4.5 is better at software engineering than Opus 4.1.

It is emphatically not, and it never has been. I have used both models extensively and have never encountered a single situation where Sonnet did a better job than Opus. Any coding benchmark that has Sonnet above Opus is broken, or at the very least measuring things that are totally irrelevant to my use cases.

This in particular isn't my "oh, the teachers lie to you" moment that makes you distrust everything they say, but it really hammers the point home. I'm glad there's a cost drop, but at this point my assumption is that there's also going to be a quality drop until I can prove otherwise in real-world testing.

Replies

mirsadm • yesterday at 8:18 PM

These announcements and "upgrades" are becoming increasingly pointless. No one is going to notice this. The improvements are questionable and inconsistent. They could swap it out for an older model and no one would notice.
