
Claude Sonnet 5

431 points by marinesebastian today at 5:59 PM | 221 comments

Comments

microtonal today at 6:12 PM

Claude Sonnet 5 is built to be the most agentic Sonnet model yet. It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.

I have been using Sonnet 4.6 more than Opus, because I'm mostly doing agent-assisted development and not fully agent-driven development. This announcement does not make me optimistic. I have found that the more models are optimized for fully agentic development, the worse they get at assisted development, and they often start doing too much despite very strict, specific instructions.

I have been moving more and more to K2.7 Code and GLM-5.2 over the last few weeks. They are often good enough for assistance, very fast, and cheap.

Jcampuzano2 today at 7:24 PM

I'm struggling to understand why I'd ever use this instead of just using a lower effort level for Opus, given that on many of the listed benchmarks the cost per task rises above Opus at anything higher than medium effort.

The only thing I can think of is when someone is out of Opus credits. Of course there are API billing use cases, but I'd probably still just use Opus on low.

doctoboggan today at 6:11 PM

The cost per task chart is telling me that I should _never_ use Sonnet 5 above medium effort level - Opus always performs better for a given cost. So I guess the takeaway is that if Sonnet 5 medium isn't good enough for you, switch models, not effort levels.
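
One way to read that takeaway, as a minimal sketch: for a given quality target, pick the cheapest (model, effort) pair that meets it. The names `small` and `large` and every cost and pass-rate value below are hypothetical placeholders, not numbers read off the published chart.

```python
from typing import Optional

# (model, effort) -> (cost per task, pass rate).
# All values are hypothetical placeholders, not published figures.
CONFIGS = {
    ("small", "low"): (1.0, 0.50),
    ("small", "medium"): (2.0, 0.62),
    ("small", "high"): (4.0, 0.66),
    ("large", "low"): (3.5, 0.70),
    ("large", "medium"): (6.0, 0.78),
}


def cheapest_meeting(target: float) -> Optional[tuple[str, str]]:
    """Return the cheapest (model, effort) whose pass rate is >= target, or None."""
    candidates = [
        (cost, key) for key, (cost, score) in CONFIGS.items() if score >= target
    ]
    # min() compares cost first, so the cheapest qualifying config wins.
    return min(candidates)[1] if candidates else None


print(cheapest_meeting(0.60))  # small model at medium effort is enough
print(cheapest_meeting(0.65))  # the larger model at low effort is cheaper than small/high
```

With these made-up numbers, raising the target past what the small model reaches at medium effort switches the answer to the larger model rather than to a higher effort level, which is the shape of the argument above.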

phillipcarter today at 6:06 PM

Seems to be another great incremental update to the workhorse, nice!

I've been using Sonnet instead of Opus for almost all coding tasks for a while now. A little elbow grease to break down tasks and you can spend a lot less money for just about the same output quality.

conradkay today at 6:09 PM

Wow, seems worse even on price/performance than GLM 5.2, which is only 744B parameters.

From the system card: "On CyberGym vulnerability discovery, Claude Sonnet 5 is less capable than Sonnet 4.6, and far less capable than Opus 4.8 and Mythos 5

As with the other evaluations in this section, these results were achieved with all safeguards turned off. When run with our default mitigations, Sonnet 5 scored a 0 on CyberGym"

satvikpendem today at 6:10 PM

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.

And Opus 4.8 is still cheaper for a higher pass rate (to say nothing of open-weight models like GLM 5.2), so I'm not sure why I'd use Sonnet except at the low effort level for, I suppose, trivial tasks where I want it to work only 50% of the time, judging by the graph. The pricing doesn't really make any sense.

Sol- today at 6:16 PM

Wonder if the whole cyber paranoia leads to their models ultimately generating less secure code. After all, if it has the ability to generate safe code, it would imply that it knows something about cybersecurity, which could surely be used to hack all the banks in the world.

m3h today at 7:14 PM

Important to note: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer that changes how the model processes text to improve performance (this is similar to the tokenizer change we introduced with Claude Opus 4.7). The tradeoff is that the same input can map to more tokens: roughly 1.0–1.35× depending on the content type. The introductory pricing is set so that the transition to Sonnet 5 is roughly cost-neutral."
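
The arithmetic behind "cost-neutral" can be sketched as follows. The only figure taken from the quote is the 1.0–1.35× token range; the function names and the idea of expressing the new per-token price as a ratio of the old one are assumptions made for illustration.

```python
def relative_cost(token_ratio: float, price_ratio: float) -> float:
    """Cost of the same input on the new model relative to the old one.

    token_ratio: new token count / old token count (quoted range: 1.0 to 1.35)
    price_ratio: new per-token price / old per-token price (hypothetical input)
    """
    return token_ratio * price_ratio


def break_even_price_ratio(token_ratio: float) -> float:
    """Per-token price ratio at which the same input costs exactly the same."""
    return 1.0 / token_ratio


for ratio in (1.0, 1.35):
    print(ratio, round(break_even_price_ratio(ratio), 4))
```

At the top of the quoted range (1.35×), the per-token price would have to be about 0.74× the old price for the same input to cost the same. At an unchanged per-token price, that same input would cost 1.35× as much.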

mag7269 today at 6:02 PM

When can we get a new Haiku? 4.5 came out nearly a year ago, and it's showing its age.

theLiminator today at 6:11 PM

Seems like the way to go for any smaller models is to only use the low reasoning levels, and for anything where you'd want it to reason harder, to just use a larger model.

In effect, high reasoning only makes sense when you're using the frontier model and need extra performance (higher levels of reasoning are never Pareto-optimal unless you're at the largest model size).

johnfahey today at 6:13 PM

Judging from those cost-performance graphs, Sonnet doesn't make sense to run at anything higher than a medium reasoning level, since Opus 4.8 low reasoning outclasses it for the price.

This line as a selling point is also pretty funny:

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

wolttam today at 6:07 PM

I didn't think they'd actually release a model that was worse than the open-weight frontier and at a higher price-point. Wow.

DonsDiscountGas today at 6:49 PM

I'd love it if they would include speed (though I know there are difficulties involved). At this point the quality of Opus 4.8 is no longer my limiting factor, it's the speed, so a faster model would be great.

mchusma today at 6:09 PM

This is a much more interesting model at $2/$10 (their launch pricing) than at full price. There are many competing models at around this level of performance.

I also like that the difference between low, medium, high, xhigh seems more spread, which is actually a good thing for people trying to tune applications. Running Sonnet 5 on low with the launch pricing makes this potentially a better fit than Haiku or open source models for some tasks. I don't think it will make sense at full price.

alvis today at 6:14 PM

Ironically, the key message of today's release is that Sonnet 5 is far less capable than Opus 4.8 and Mythos 5. It's a funny development in the past few weeks.

garo-pro today at 6:12 PM

Seems like the cyber detection is even on Sonnet now. https://support.claude.com/en/articles/14604842-real-time-cy...

tokengod today at 6:05 PM

That’s nice, but we want Fable

theplumber today at 7:01 PM

Is there any reason to use Sonnet instead of GLM?

rw2 today at 7:04 PM

The "cheaper models" from big AI companies are next to useless, as they don't even score as well as the open/super cheap Chinese models. Only the frontier big models like Fable and Opus have value.

Cu3PO42 today at 6:57 PM

Sonnet 5 is not currently available in the EU region on Bedrock, whereas previous models were and still are. I wonder if this is only due to early stages of the rollout or if this is due to recent US restrictions.

Unfortunately that means I won't be using it at work for now.

andai today at 6:06 PM

Opus 4.8 beats Sonnet 5 on the Pareto frontier in several of their graphs (Agentic Search, Agentic Computer Use).

In other words, for certain tasks, Opus 4.8 is cheaper than Sonnet 5, and does better than Sonnet 5.

I've noticed this pattern on a lot of benchmarks. You can try to emulate a bigger model by ramping up the test-time compute (max reasoning, more turns, model fusion etc.), but you can't reach the same quality level, and you often exceed the cost you would have paid by just using a bigger model.

tldr: if you're doing something hard, just use a bigger model.
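
For anyone who wants to check this kind of claim against their own numbers, here is a minimal sketch of a Pareto-frontier filter over (cost, score) points. The run names and all values are hypothetical placeholders, not figures from the announcement's graphs.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    cost: float   # cost per task, arbitrary units (hypothetical)
    score: float  # benchmark score in [0, 1] (hypothetical)


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    return (
        a.cost <= b.cost
        and a.score >= b.score
        and (a.cost < b.cost or a.score > b.score)
    )


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep only runs that no other run dominates, sorted by cost."""
    kept = [r for r in runs if not any(dominates(o, r) for o in runs)]
    return sorted(kept, key=lambda r: r.cost)


runs = [
    Run("small-low", 1.0, 0.50),
    Run("small-medium", 2.0, 0.62),
    Run("small-high", 4.0, 0.66),
    Run("small-xhigh", 7.0, 0.68),
    Run("large-low", 3.5, 0.70),
    Run("large-medium", 6.0, 0.78),
]

frontier = pareto_frontier(runs)
print([r.name for r in frontier])
```

In this made-up data the two highest effort levels of the smaller model fall off the frontier, because the larger model at low effort is both cheaper and better.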

chipgap98 today at 6:03 PM

Interesting that tasks at extra-high effort cost almost the same as with Opus 4.8, with slightly worse performance.

docheinestages today at 6:21 PM

But does it burn tokens just like Opus? That's the feeling I have nowadays. Regardless of what model I choose, the 5-hour limit gets exhausted in the first hour or so.

arendtio today at 6:59 PM

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

It seems being incompetent is a feature now...

primaprashant today at 6:56 PM

Based on both performance vs price charts, it seems using Opus 4.8 with med effort is almost a better choice than using Sonnet 5 at xhigh effort

alvis today at 6:06 PM

What I'm starting to hate is that each model's effort level can mean a completely different amount of power.

Today Sonnet 5's medium effort level is equivalent to Sonnet 4.6's low effort level :/

docproof today at 7:04 PM

The jump in reasoning quality is noticeable. What's interesting is how it handles ambiguous instructions now — it seems to ask fewer clarifying questions and just makes a reasonable judgment call. That's a double-edged sword depending on your use case.

m3h today at 7:00 PM

Why is Claude Sonnet 5 allowed to be released but OpenAI Terra not? Are they not the same class of models?

kingjimmy today at 6:25 PM

Interesting footnote: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer... can map to more tokens: roughly 1.0–1.35× depending on the content type." AKA expect higher costs on Sonnet 5 vs Sonnet 4.6 for the same tasks.

PeterStuer today at 7:20 PM

Anyone else feel like Opus 4.8 got significantly dumber over the last 2 weeks?

scottfits today at 6:32 PM

> the computer use evaluation OSWorld-Verified. Sonnet 5 (orange line) is a strict improvement over Sonnet 4.6

cool to see, still waiting for models to get better at computer use.

johnhamlin today at 6:46 PM

Kind of hilarious how much they’re touting that it sucks at cybersecurity like it’s a feature

SoKamil today at 6:13 PM

I believe that’s gonna be meta for agentic coding this year for enterprises. Cost optimized models approaching SOTA capabilities on software engineering but without cybersec training.

jerrygoyal today at 6:28 PM

It's actually a huge update for building products, given most tasks are sub-agent driven where Sonnet is used, steered by Opus.

beernet today at 6:09 PM

Anthropic's run on the model and product side of things is highly impressive. They got Sam A. punching the air consistently, which is well-deserved and self-inflicted above all.

benjiro29 today at 6:21 PM

Anybody notice that they did not include Sonnet 5 Max in the "Agentic Search results", when comparing to Opus 4.8 ...

Based upon the "Agentic Computer usage", Sonnet 5 Max was going to be off "Agentic Search results" chart. lol ...

In short, Sonnet 5 Low/Medium is more cost-efficient if it's a task below Opus 4.8 Medium. For the rest it's expensive and you're better off using Opus 4.8.

Why even release this model?

mellosty today at 6:58 PM

Sonnet seems to be really expensive

mellosty today at 6:27 PM

It does not pass the "I want to wash my car, should I drive or walk" test.

baalimago today at 6:49 PM

Not looking great for an upcoming IPO

tripleee today at 6:26 PM

interesting how much worse the sentiment around Anthropic is getting

smallerfish today at 6:20 PM

Ah that's why Opus has been so slow for the last couple of days.

_pdp_ today at 7:05 PM

Too expensive?

tensegrist today at 6:01 PM

there was a vibecoded prediction market–style page that was put up yesterday (?) that got the date exactly right i think

docheinestages today at 6:23 PM

Is it just me or is there a huge difference between how much one can accomplish in a 5-hour window with GPT 5.5 on xhigh versus any Claude model?

gverrilla today at 6:45 PM

Is this the default model for non-paying users? If so, that could be an interesting move in the competition for this segment.

Getchowned today at 6:51 PM

Fable soon please.

ekjhgkejhgk today at 6:31 PM

In effective terms they're lowering prices.

Scroll_Swe today at 6:10 PM

I don't pay so I'm glad for the upgrade. I usually use Gemini, Mistral Le Chat (Vibe...) or Deepseek as they have way more generous free limits and I can basically spam forever.
