logoalt Hacker News

llamasushiyesterday at 7:17 PM11 repliesview on HN

The burying of the lede here is insane. $5/$25 per MTok is a 3x price drop from Opus 4. At that price point, Opus stops being "the model you use for important things" and becomes actually viable for production workloads.

Also notable: they're claiming SOTA prompt injection resistance. The industry has largely given up on solving this problem through training alone, so if the numbers in the system card hold up under adversarial testing, that's legitimately significant for anyone deploying agents with tool access.

The "most aligned model" framing is doing a lot of heavy lifting though. Would love to see third-party red team results.


Replies

tekacsyesterday at 7:32 PM

This is also super relevant for everyone who had ditched Claude Code due to limits:

> For Claude and Claude Code users with access to Opus 4.5, we’ve removed Opus-specific caps. For Max and Team Premium users, we’ve increased overall usage limits, meaning you’ll have roughly the same number of Opus tokens as you previously had with Sonnet. We’re updating usage limits to make sure you’re able to use Opus 4.5 for daily work.

show 5 replies
sqsyesterday at 8:59 PM

What's super interesting is that Opus is cheaper all-in than Sonnet for many usage patterns.

Here are some early rough numbers from our own internal usage on the Amp team (avg cost $ per thread):

- Sonnet 4.5: $1.83

- Opus 4.5: $1.30 (earlier checkpoint last week was $1.55)

- Gemini 3 Pro: $1.21

Cost per token is not the right way to look at this. A bit more intelligence means mistakes (and wasted tokens) avoided.

show 2 replies
sharkjacobsyesterday at 8:16 PM

3x price drop almost certainly means Opus 4.5 is a different and smaller base model than Opus 4.1, with more fine tuning to target the benchmarks.

I'll be curious to see how performance compares to Opus 4.1 on the kind of tasks and metrics they're not explicitly targeting, e.g. eqbench.com

show 3 replies
losvediryesterday at 7:56 PM

I almost scrolled past the "Safety" section, because in the past it always seemed sort of silly sci-fi scaremongering (IMO) or things that I would classify as "sharp tool dangerous in the wrong hands". But I'm glad I stopped, because it actually talked about real, practical issues like the prompt injections that you mention. I wonder if the industry term "safety" is pivoting to refer to other things now.

show 1 reply
wolttamyesterday at 7:22 PM

It's 1/3 the old price ($15/$75)

show 2 replies
Scene_Cast2yesterday at 7:44 PM

Still way pricier (>2x) than Gemini 3 and Grok 4. I've noticed that the latter two also perform better than Opus 4, so I've stopped using Opus.

show 1 reply
burgeroneyesterday at 9:38 PM

Using AI in production is no doubt an enormous security risk...

irthomasthomasyesterday at 9:29 PM

It's about double the speed of 4.1, too. ~60t/s vs ~30t/s. I wish it where openweights so we could discuss the architectural changes.

cmrdporcupineyesterday at 9:13 PM

Note the comment when you start claude code:

"To give you room to try out our new model, we've updated usage limits for Claude Code users."

That really implies non-permanence.

AtNightWeCodeyesterday at 10:06 PM

The cost of tokens in the docs is pretty much a worthless metric for these models. Only way to go is to plug it in and test it. My experience is that Claude is an expert at wasting tokens on nonsense. Easily 5x up on output tokens comparing to ChatGPT and then consider that Claude waste about 2-3x of tokens more by default.

zwnowyesterday at 9:57 PM

Why do all these comments sound like a sales pitch? Everytime some new bullshit model is released there are hundreds of comments like this one, pointing out 2 features talking about how huge all of this is. It isn't.