doctoboggan • today at 6:11 PM • 14 replies

The cost per task chart is telling me that I should _never_ use Sonnet 5 above medium effort level - Opus always performs better for a given cost. So I guess the takeaway is that if Sonnet 5 medium isn't good enough for you, switch models, not effort levels.

Replies

AquinasCoder • today at 6:45 PM

While I appreciate that they publish this information, it's increasingly hard to keep track of it all. I've lost the mental model of how different models at different effort levels perform and what tasks they are good at.

In practice, I tend to just use the default in Claude Code, which works well enough. But I wonder to what degree other users really play around with these settings to optimize for their project.

2001zhaozhao • today at 6:53 PM

There are two wrinkles to this:

- For Claude.ai subscriptions I think Sonnet is much cheaper than Opus. This is why there was a "Sonnet only" usage bar for Max tier for the longest time.

- For some tasks, such as multimodal computer use, the sheer volume of raw input tokens is what matters most. You can't make them any more efficient on Opus by turning down the reasoning, so a cheaper model like Sonnet is useful for them.

Torkel • today at 6:20 PM

Yeah, I was looking at the same chart and was very surprised at where the curve is relative to Opus... Feels like Sonnet 5 is "what if Opus had an extra-low effort level"?

energy123 • today at 6:40 PM

The arguable caveat is that Sonnet would run faster, so you can potentially get more done in a synchronous, iterative workflow.

I don't really believe this, however, because so much time is spent fixing up after models that a slower but more intelligent model is a net time saver in my experience.

johnfn • today at 6:38 PM

That's just one benchmark, though. Tab to the next one and Sonnet 5 performs better as effort goes up just as you'd expect. I imagine the suggestion is that performance vs effort tradeoff is task dependent.

lucamark • today at 7:26 PM

You're referring to the agentic search chart, but if you look at agentic computer use, the cost is basically halved.

However, I am also confused about the market positioning. It's too expensive for daily tasks (open-source models are much cheaper) and it's not a frontier model that can address complex real-world problems.

I've rarely used Sonnet, btw.

booi • today at 7:26 PM

I actually use Sonnet exclusively at the low effort level. It's too slow otherwise, and at higher effort levels it is strictly worse than Opus.

seiru • today at 7:14 PM

Worth noting that the default chart there is for "agentic search performance", not coding. I didn't see an effort comparison for coding specifically.

manojlds • today at 7:13 PM

Opus 4.8 at high effort does better, and is cheaper, than Sonnet 5 at xhigh.

intellijdd • today at 6:43 PM

I noticed that as well but with the introductory pricing, I wonder how true that is.

It would be great to see these charts with the promotional pricing, since it's here for about two whole months.

I guess I could get Sonnet 5 to do it.

Natelinathan • today at 8:06 PM

I just rewrote the /code-review skill Anthropic ships so that it uses Sonnet 4.6 for some tasks, as it was using Opus for simple git diff commands and similarly mechanical tasks (it launched 100+ agents for one of my diffs, c'mon). I wonder how Sonnet 5 will impact my usage.

Does anyone else have any token-saving measures for reviews?

al_borland • today at 7:16 PM

What is a "task" in real-world terms? If it will be $15/million output tokens, and high/xhigh is somewhere in the $7.50/task range, does that mean a single task is using 500k tokens? That seems like it would start to add up fast.
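
For anyone who wants to check the back-of-envelope math, here is a rough sketch. It assumes the whole per-task cost is billed as output tokens at a flat rate, which is a simplification: input tokens are billed too, usually at a different rate, so the real output count per task would be lower. The function name is made up for illustration.

```python
def implied_output_tokens(cost_per_task_usd: float, price_per_million_usd: float) -> float:
    """Number of tokens a per-task cost would buy at a flat per-million-token price.

    Assumption: the entire cost is attributed to output tokens.
    """
    if price_per_million_usd <= 0:
        raise ValueError("price per million tokens must be positive")
    return cost_per_task_usd / price_per_million_usd * 1_000_000


# Figures quoted in the comment above: $7.50 per task at $15 per million output tokens.
tokens = implied_output_tokens(7.50, 15.0)
print(f"{tokens:,.0f} output tokens per task")  # prints "500,000 output tokens per task"
```

So the 500k figure holds only as an upper bound on output tokens under that assumption.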

nicce • today at 7:27 PM

> Opus always performs better for a given cost.

Expect it to get deprecated sooner rather than later.

ZeWaka • today at 6:42 PM

It's very interesting. Why even release a new product that underperforms at the same price level? Why not just lock it?

I guess it's probably a lot cheaper for them to run, and it cuts costs for them. Seems disingenuous, though.
