Hacker News

chipgap98 today at 6:03 PM

Interesting that tasks on extra high cost almost the same as on Opus 4.8, with slightly worse performance.


Replies

bredren today at 6:05 PM

This is on the browsercomp graph, right?

In that, it seems sonnet 5 on high costs more than opus 4.8 at a lower pass rate. Am I reading this correctly?

Edit: It looks like the key value proposition of the updated model is that it is much better than Sonnet 4.6.

Whereas Sonnet 5 delivers great value (by browsercomp benchmarks and compared to Opus) when running on low and medium.

So: Sonnet 4.6 should ~never have been run for low, medium or high when Opus 4.8 has been available. Whoops, I think I have some skills that delegate easy stuff to Sonnet.

---

I remember Anthropic pivoting everyone's default model to Opus but had not seen it put so starkly before.

I am a bit confused by the subscription `/usage` screen. It splits out Sonnet usage, and I'd presumed that meant Sonnet usage consumed less of the subscription quota.

But if this is correct, Sonnet usage was basically like smoking unfiltered cigarettes.

mcbuilder today at 6:07 PM

LRMs are plateauing for sure. Not that there won't be gains to be had in the future, but it's no longer the era of rapid progress that the past year was.
