It is super strange that in all of the last (3?) releases they keep comparing against older models such as Opus-4.6.

bratao • today at 12:14 PM • 3 replies

Replies

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor 5.5 from oAI is listed on the benchmarks. So there’s also just the time it takes for others to get benchmarking done and published.

dyauspitr • today at 2:27 PM

Because these can’t compete with the SoTA but they’re close.

varispeed • today at 1:22 PM

Opus-4.6 was probably the best model so far before it got nerfed. 4.7 is nowhere near the experience I had with 4.6. In fact, I stopped using it completely because more often than not its output is just dumber than that of local models.
