The burying of the lede here is insane. $5/$25 per MTok is a 3x price drop from Opus 4. At that...

llamasushi • yesterday at 7:17 PM • 11 replies • view on HN

The burying of the lede here is insane. $5/$25 per MTok is a 3x price drop from Opus 4. At that price point, Opus stops being "the model you use for important things" and becomes actually viable for production workloads.

Also notable: they're claiming SOTA prompt injection resistance. The industry has largely given up on solving this problem through training alone, so if the numbers in the system card hold up under adversarial testing, that's legitimately significant for anyone deploying agents with tool access.

The "most aligned model" framing is doing a lot of heavy lifting though. Would love to see third-party red team results.

Replies

tekacs • yesterday at 7:32 PM

This is also super relevant for everyone who had ditched Claude Code due to limits:

> For Claude and Claude Code users with access to Opus 4.5, we’ve removed Opus-specific caps. For Max and Team Premium users, we’ve increased overall usage limits, meaning you’ll have roughly the same number of Opus tokens as you previously had with Sonnet. We’re updating usage limits to make sure you’re able to use Opus 4.5 for daily work.

➕ show 5 replies

sqs • yesterday at 8:59 PM

What's super interesting is that Opus is cheaper all-in than Sonnet for many usage patterns.

Here are some early rough numbers from our own internal usage on the Amp team (avg cost $ per thread):

- Sonnet 4.5: $1.83

- Opus 4.5: $1.30 (earlier checkpoint last week was $1.55)

- Gemini 3 Pro: $1.21

Cost per token is not the right way to look at this. A bit more intelligence means mistakes (and wasted tokens) avoided.

➕ show 2 replies

sharkjacobs • yesterday at 8:16 PM

3x price drop almost certainly means Opus 4.5 is a different and smaller base model than Opus 4.1, with more fine tuning to target the benchmarks.

I'll be curious to see how performance compares to Opus 4.1 on the kind of tasks and metrics they're not explicitly targeting, e.g. eqbench.com

➕ show 3 replies

losvedir • yesterday at 7:56 PM

I almost scrolled past the "Safety" section, because in the past it always seemed sort of silly sci-fi scaremongering (IMO) or things that I would classify as "sharp tool dangerous in the wrong hands". But I'm glad I stopped, because it actually talked about real, practical issues like the prompt injections that you mention. I wonder if the industry term "safety" is pivoting to refer to other things now.

➕ show 1 reply

wolttam • yesterday at 7:22 PM

It's 1/3 the old price ($15/$75)

➕ show 2 replies

Scene_Cast2 • yesterday at 7:44 PM

Still way pricier (>2x) than Gemini 3 and Grok 4. I've noticed that the latter two also perform better than Opus 4, so I've stopped using Opus.

➕ show 1 reply

burgerone • yesterday at 9:38 PM

Using AI in production is no doubt an enormous security risk...

irthomasthomas • yesterday at 9:29 PM

It's about double the speed of 4.1, too. ~60t/s vs ~30t/s. I wish it where openweights so we could discuss the architectural changes.

cmrdporcupine • yesterday at 9:13 PM

Note the comment when you start claude code:

"To give you room to try out our new model, we've updated usage limits for Claude Code users."

That really implies non-permanence.

AtNightWeCode • yesterday at 10:06 PM

The cost of tokens in the docs is pretty much a worthless metric for these models. Only way to go is to plug it in and test it. My experience is that Claude is an expert at wasting tokens on nonsense. Easily 5x up on output tokens comparing to ChatGPT and then consider that Claude waste about 2-3x of tokens more by default.

zwnow • yesterday at 9:57 PM

Why do all these comments sound like a sales pitch? Everytime some new bullshit model is released there are hundreds of comments like this one, pointing out 2 features talking about how huge all of this is. It isn't.

alt Hacker News

Replies