Hacker News

bratao today at 12:14 PM

It is super strange that in all of the last (3?) releases they keep comparing against older models such as Opus-4.6.


Replies

vessenes today at 12:23 PM

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor 5.5 from oAI is listed on the benchmarks. So there’s also just the time it takes others to get benchmarking done and published.

dyauspitr today at 2:27 PM

Because these can’t compete with the SoTA but they’re close.

varispeed today at 1:22 PM

Opus-4.6 was probably the best model so far before it got nerfed. 4.7 is nowhere near the experience I had. In fact, I stopped using it completely because more often than not its output is just dumber than that of local models.
