Claude Sonnet 5

431 points • by marinesebastian • today at 5:59 PM • 221 comments

Comments

Claude Sonnet 5 is built to be the most agentic Sonnet model yet. It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.

I have been using Sonnet 4.6 more than Opus, because I'm mostly doing agent-assisted development and not fully agent-driven development. This announcement does not make me optimistic. I have found that the more models are optimized for fully agentic development, the worse they get at assisted development; they often start doing too much despite very strict, specific instructions.

I have been moving more and more to K2.7 Code and GLM-5.2 the last few weeks. They are often good enough for assistance, very fast, and cheap.

Jcampuzano2 • today at 7:24 PM

I'm struggling to understand why I'd ever use this instead of just using a lower effort level for Opus, given that on many of the benchmarks listed the cost per task rises above Opus at anything higher than medium effort.

The only thing I can think of is when someone is out of Opus credits. Of course there are API billing use cases, but I'd probably still just use Opus on low.

doctoboggan • today at 6:11 PM

The cost per task chart is telling me that I should _never_ use Sonnet 5 above medium effort level - Opus always performs better for a given cost. So I guess the takeaway is that if Sonnet 5 medium isn't good enough for you, switch models, not effort levels.

phillipcarter • today at 6:06 PM

Seems to be another great incremental update to the workhorse, nice!

I've been using Sonnet instead of Opus for almost all coding tasks for a while now. A little elbow grease to break down tasks and you can spend a lot less money for just about the same output quality.

conradkay • today at 6:09 PM

Wow, seems worse even on price/performance than GLM 5.2, which is only 744b parameters.

From the system card: "On CyberGym vulnerability discovery, Claude Sonnet 5 is less capable than Sonnet 4.6, and far less capable than Opus 4.8 and Mythos 5.

As with the other evaluations in this section, these results were achieved with all safeguards turned off. When run with our default mitigations, Sonnet 5 scored a 0 on CyberGym"

satvikpendem • today at 6:10 PM

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.

And Opus 4.8 is still cheaper for a higher pass rate (to say nothing of open-weight models like GLM 5.2), so I'm not sure why I'd use Sonnet except on the low effort level, I suppose for trivial tasks where I want it to work only 50% of the time, judging by the graph. The pricing doesn't really make any sense.

Sol- • today at 6:16 PM

Wonder if the whole cyber paranoia leads to their models ultimately generating less secure code. After all, if it has the ability to generate safe code, it would imply that it knows something about cybersecurity, which could surely be used to hack all the banks in the world.

m3h • today at 7:14 PM

Important to note: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer that changes how the model processes text to improve performance (this is similar to the tokenizer change we introduced with Claude Opus 4.7). The tradeoff is that the same input can map to more tokens: roughly 1.0–1.35× depending on the content type. The introductory pricing is set so that the transition to Sonnet 5 is roughly cost-neutral."
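The "roughly cost-neutral" claim in that quote comes down to simple arithmetic: if the same input maps to more tokens, the per-token price has to fall by the inverse factor for the bill to stay the same. Here is a minimal sketch of that calculation. The function names are hypothetical and only the 1.0–1.35× range comes from the quote; no actual prices are assumed.

```python
def break_even_price_ratio(token_multiplier: float) -> float:
    """Return the new per-token price, as a fraction of the old price,
    at which the total bill is unchanged.

    token_multiplier is how many new-tokenizer tokens the same input
    produces per old-tokenizer token (1.0 to 1.35 per the quoted note).
    """
    if token_multiplier <= 0:
        raise ValueError("token_multiplier must be positive")
    return 1.0 / token_multiplier


def relative_cost(token_multiplier: float, price_ratio: float) -> float:
    """Return the new bill as a fraction of the old bill.

    price_ratio is new per-token price divided by old per-token price.
    A result above 1.0 means the same input got more expensive.
    """
    return token_multiplier * price_ratio


if __name__ == "__main__":
    # Walk both ends of the quoted range.
    for multiplier in (1.0, 1.35):
        ratio = break_even_price_ratio(multiplier)
        print(f"multiplier {multiplier:.2f}: break-even price ratio {ratio:.3f}")
```

At the top of the quoted range (1.35×), the per-token price would need to be about 26% lower for the same input to cost the same; at 1.0× no discount is needed. Whether the introductory pricing actually achieves that depends on the content mix, which the quote does not specify.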

mag7269 • today at 6:02 PM

When can we get a new Haiku? 4.5 came out nearly a year ago, and it's showing its age.

theLiminator • today at 6:11 PM

Seems like the way to go for any smaller models is to only use the low reasoning levels, and for anything where you'd want it to reason harder, to just use a larger model.

In effect, high reasoning only makes sense when you're using the frontier model and need extra performance (higher levels of reasoning are never pareto optimal unless you're at the largest model size).

johnfahey • today at 6:13 PM

Judging from those cost-performance graphs, Sonnet doesn't make sense to run at anything higher than a medium reasoning level, since Opus 4.8 low reasoning outclasses it for the price.

This line as a selling point is also pretty funny:

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

wolttam • today at 6:07 PM

I didn't think they'd actually release a model that was worse than the open-weight frontier and at a higher price-point. Wow.

DonsDiscountGas • today at 6:49 PM

I'd love it if they would include speed (though I know there are difficulties involved). At this point the quality of Opus 4.8 is no longer my limiting factor, it's the speed, so a faster model would be great.

mchusma • today at 6:09 PM

This is a much more interesting model at $2/$10 (their launch pricing) than at full price. There are many competing models at around this level of performance.

I also like that the difference between low, medium, high, xhigh seems more spread, which is actually a good thing for people trying to tune applications. Running Sonnet 5 on low with the launch pricing makes this potentially a better fit than Haiku or open source models for some tasks. I don't think it will make sense at full price.

alvis • today at 6:14 PM

Ironically, the key message of today's release is that Sonnet 5 is far less capable than Opus 4.8 and Mythos 5. It's a funny development in the past few weeks.

garo-pro • today at 6:12 PM

Seems like the cyber detection is even on Sonnet now. https://support.claude.com/en/articles/14604842-real-time-cy...

tokengod • today at 6:05 PM

That’s nice, but we want Fable

babelfish • today at 6:03 PM

System Card: https://www-cdn.anthropic.com/d9bb04416ffe1352af84721476c1fa...

theplumber • today at 7:01 PM

Is there any reason to use Sonnet instead of GLM?

rw2 • today at 7:04 PM

The "cheaper models" from big AI companies are next to useless, as they don't even score as well as the open, super-cheap Chinese models. Only the frontier big models like Fable and Opus have value.

Cu3PO42 • today at 6:57 PM

Sonnet 5 is not currently available in the EU region on Bedrock, whereas previous models were and still are. I wonder if this is only due to early stages of the rollout or if this is due to recent US restrictions.

Unfortunately that means I won't be using it at work for now.

andai • today at 6:06 PM

Opus 4.8 beats Sonnet 5 on the Pareto frontier in several of their graphs (Agentic Search, Agentic Computer Use).

In other words, for certain tasks, Opus 4.8 is cheaper than Sonnet 5, and does better than Sonnet 5.

I've noticed this pattern on a lot of benchmarks. You can try to emulate a bigger model by ramping up the test time compute (max reasoning, more turns, model fusion etc.), but you can't reach the same quality level, and you often exceed the cost you would have paid by just using a bigger model.

tldr: if you're doing something hard, just use a bigger model.
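The "Pareto frontier" argument running through this thread can be made concrete with a small sketch. A configuration is on the frontier if no other configuration is both at least as cheap and at least as good. The numbers below are invented purely for illustration and are not taken from the published charts; the labels and the `Run` type are hypothetical too.

```python
from typing import NamedTuple


class Run(NamedTuple):
    label: str
    cost: float   # cost per task, arbitrary units
    score: float  # benchmark score, higher is better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run beats on both cost and score.

    Runs are scanned from cheapest to most expensive; a run is kept only
    if it scores strictly higher than everything cheaper (or equally
    priced) that was already kept. Exact duplicates are kept once.
    """
    frontier: list[Run] = []
    best_score = float("-inf")
    # Sort by cost ascending; at equal cost, the higher score comes first.
    for run in sorted(runs, key=lambda r: (r.cost, -r.score)):
        if run.score > best_score:
            frontier.append(run)
            best_score = run.score
    return frontier


# Made-up data: a smaller model at three effort levels, a larger one at two.
runs = [
    Run("small-low", 1.0, 50.0),
    Run("small-medium", 2.0, 60.0),
    Run("small-high", 4.0, 64.0),
    Run("large-low", 3.0, 66.0),
    Run("large-medium", 5.0, 72.0),
]

if __name__ == "__main__":
    for run in pareto_frontier(runs):
        print(run.label, run.cost, run.score)
```

With these invented numbers, "small-high" drops off the frontier because "large-low" is both cheaper and higher scoring, which is the shape of the pattern several commenters describe. Whether the real charts look like this is for the reader to check against the source graphs.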

chipgap98 • today at 6:03 PM

Interesting that tasks on extra-high effort cost almost the same as on Opus 4.8, with slightly worse performance.

docheinestages • today at 6:21 PM

But does it burn tokens just like Opus? That's the feeling I have nowadays. Regardless of what model I choose, the 5-hour limit gets exhausted in the first hour or so.

arendtio • today at 6:59 PM

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

It seems being incompetent is a feature now...

primaprashant • today at 6:56 PM

Based on both performance vs price charts, it seems using Opus 4.8 with med effort is almost a better choice than using Sonnet 5 at xhigh effort

alvis • today at 6:06 PM

What I'm starting to hate is that each model's effort level can mean completely different power.

Today Sonnet 5's medium effort level is equivalent to Sonnet 4.6's low effort level :/

docproof • today at 7:04 PM

The jump in reasoning quality is noticeable. What's interesting is how it handles ambiguous instructions now — it seems to ask fewer clarifying questions and just makes a reasonable judgment call. That's a double-edged sword depending on your use case.

m3h • today at 7:00 PM

Why is Claude Sonnet 5 allowed to be released but OpenAI Terra not? Are they not the same class of models?

kingjimmy • today at 6:25 PM

interesting footnotes: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer... can map to more tokens: roughly 1.0–1.35× depending on the content type." AKA expect higher costs on Sonnet 5 vs Sonnet 4.6 for the same tasks.

PeterStuer • today at 7:20 PM

Anyone else feel like Opus 4.8 got significantly dumber over the last 2 weeks?

scottfits • today at 6:32 PM

> the computer use evaluation OSWorld-Verified. Sonnet 5 (orange line) is a strict improvement over Sonnet 4.6

cool to see, still waiting for models to get better at computer use.

johnhamlin • today at 6:46 PM

Kind of hilarious how much they’re touting that it sucks at cybersecurity like it’s a feature

SoKamil • today at 6:13 PM

I believe that’s gonna be meta for agentic coding this year for enterprises. Cost optimized models approaching SOTA capabilities on software engineering but without cybersec training.

jerrygoyal • today at 6:28 PM

It's actually a huge update for building products, given most tasks are sub-agent driven where Sonnet is used, steered by Opus.

beernet • today at 6:09 PM

Anthropic's run on the model and product side of things is highly impressive. They got Sam A. punching the air consistently, which is well-deserved and self-inflicted above all.

benjiro29 • today at 6:21 PM

Anybody notice that they did not include Sonnet 5 Max in the "Agentic Search results", when comparing to Opus 4.8 ...

Based upon the "Agentic Computer usage", Sonnet 5 Max was going to be off the "Agentic Search results" chart. lol ...

In short, Sonnet 5 Low/Medium is more cost-efficient if it's a task below Opus 4.8 Medium. For the rest it's expensive and you're better off using Opus 4.8.

Why even release this model?

mellosty • today at 6:58 PM

Sonnet seems to be really expensive

mellosty • today at 6:27 PM

It does not pass the "I want to wash my car, should I drive or walk" test.

baalimago • today at 6:49 PM

Not looking great for an upcoming IPO

tripleee • today at 6:26 PM

interesting how much worse the sentiment around Anthropic is getting

smallerfish • today at 6:20 PM

Ah that's why Opus has been so slow for the last couple of days.

_pdp_ • today at 7:05 PM

Too expensive?

tensegrist • today at 6:01 PM

there was a vibecoded prediction market–style page that was put up yesterday (?) that got the date exactly right i think

docheinestages • today at 6:23 PM

Is it just me or is there a huge difference between how much one can accomplish in a 5-hour window with GPT 5.5 on xhigh versus any Claude model?

gverrilla • today at 6:45 PM

Is this the default model for non-paying users? If so, that could be an interesting move in the competition for this segment.

Getchowned • today at 6:51 PM

Fable soon please.

ekjhgkejhgk • today at 6:31 PM

In effective terms they're lowering prices.

Scroll_Swe • today at 6:10 PM

I don't pay so I'm glad for the upgrade. I usually use Gemini, Mistral Le Chat (Vibe...) or Deepseek as they have way more generous free limits and I can basically spam forever.
