Claude Opus 4.8

1066 points • by craigmart • today at 4:49 PM • 851 comments • view on HN

Comments

clutch89 • today at 4:53 PM

> One of the most prominent improvements in Opus 4.8 is its honesty

Anthropic talks about their own models as if they're discovering new species in the wild...

➕ show 12 replies

iamsaitam • today at 10:13 PM

let me guess, "this is our best model yet"

hmokiguess • today at 9:10 PM

They must have been A/B testing this with 4.7 lately, I noticed it changed from its normal mode in a way that matches a lot the just released 4.8

skysthelimitt • today at 4:53 PM

when will we get anything for sonnet or haiku? the market for less-capable but cheaper models seems to be completely ignored nowadays

➕ show 2 replies

assorium • today at 9:29 PM

It refused to work for me. Literally said, you can google it. AGI achieved it seems

tariky • today at 7:50 PM

I believe analogy with smartphone will be best for this case.

In 2010s iphone was the king, all those Chinese devices ware cheaper but not even close to smoothnest and usability of US tech, now after 15 years later everything is changed, now iphone feels like old grandpa to Chinese tech. Same will happend to LLM's just much faster.

cgg1 • today at 8:46 PM

I find it surprising that the gap between tool usage and non-tool usage in HLE is relatively small (~10%) but the absolute numbers continue to go up

seaal • today at 5:25 PM

https://marginlab.ai/trackers/claude-code/

Is it a coincidence that 4.7 was seemingly quantized over past 7 days?

➕ show 2 replies

toephu2 • today at 5:29 PM

The rapid release cadence and rate of innovation of Anthropic (and OpenAI) is impressive. And obviously it's because these are startups solely dedicated to AI so they can move quickly. Big Tech (like Google) won't be able to keep up with the pace of them (too much bureaucracy and red tape at Google). Classic Innovator's Dilemma. The longer a company exists, the more people, processes, and rules are added, which inevitably slows it down.

Jeff Bezos said this too, Amazon won't last forever. Eventually some startup is going to come and eat its lunch.

➕ show 1 reply

aaronblohowiak • today at 4:53 PM

Same price for regular and cheaper fast mode. Happy for these incremental improvements.

worldsavior • today at 4:56 PM

Seems like from now on the updates will be a minor upgrade from previous models.

carlos-menezes • today at 5:05 PM

I, for lack of a better word, dislike anyone who anthropomorphizes AI.

➕ show 3 replies

imagetic • today at 8:38 PM

I used to think it was a big deal when a HN post had more than 500 comments.

Now it’s every day. Like billion dollar evaluations.

winwang • today at 5:17 PM

Let's hope I don't have to disable it after a day like with 4.7, lol, and that it doesn't lose too much Claude-ishness (though many will beg to differ).

delis-thumbs-7e • today at 5:54 PM

I won’t change from 4.6. You won’t trick me again.

➕ show 1 reply

myworkaccount2 • today at 8:20 PM

Anyone else experiencing tool call failures? Switch back to 4.7, same prompt, same everything it works with no problems.

yewenjie • today at 5:05 PM

So Dynamic Workflows is their version of ChatGPT Pro?

➕ show 1 reply

robertkarl • today at 6:45 PM

I can't get excited about these benchmarks they're leading with. I've looked at the Terminal-Bench questions and I just think they're irrelevant. And SWE-Bench has serious flaws, even the big boys say so: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

> Please train a fasttext model on the yelp data in the data/ folder. The final model size needs to be less than 150MB but get at least 0.62 accuracy on a private test set that comes from the same yelp review distribution. The model should be saved as /app/model.bin

and this question: https://www.tbench.ai/registry/terminal-bench-core/head/conf... idk what the point is.

And all the tests are run with the same harness. Terminus 2.

Maybe it correlates with model intelligence but it doesn't speak to me.

I'm still on 4.6 though; I was concerned about upgrading to 4.7 because of the changed tokenizer math and more FUD about refusals online. I don't see compelling reasons to 'upgrade'.

➕ show 1 reply

ropintus • today at 5:04 PM

Opus 4.7 was acting extremely stupid today. Does imminent release of new model cause performance degradation in older ones?

➕ show 3 replies

antirez • today at 5:40 PM

Anthropic did a big strategic error. Normally they compare their models with their old models. Instead today, now that everybody knows how strong GPT 5.5 is at coding, they put it in the mix, basically showing all their customers that the benchmarks can't be trusted.

NanoWar • today at 8:05 PM

Just show me the pelican, ah wait we are past pelicans. Can we get something like that ever again?

Reubend • today at 5:06 PM

> Dynamic workflows. This new feature, available in research preview, allows Claude to take on even bigger tasks in Claude Code. Claude can plan the work and then run hundreds of parallel subagents in a single session

Are they going to retire the existing beta "teams" feature for agents to make room for this?

necrotic_comp • today at 5:21 PM

4.8 also seems like a regression and using it from the chat GUI results in 4.6 no longer showing up. If someone from anthropic is here, is it possible to readd 4.6 in the "other models" dropdown ? I feel like I got a bit baited/switched here.

➕ show 1 reply

mistic92 • today at 5:07 PM

Oh, new model which will use all my credits in one turn! I'll stay with chinese models for now

siwakotisaurav • today at 5:12 PM

Was about to split my $200 max plan into $100 Claude and $100 codex, let’s see if I still need to

➕ show 1 reply

ethanhawksley • today at 5:34 PM

> Agentic financial analysis Finance Agent v2 > Opus 4.8 53.9%

> Gemini 3.5 Flash scores 57.9% on Finance Agent v2, a significant improvement over Gemini 3.1 Pro.

Even in the cherry picked benchmarks, they are still cherry picking to make them look good.

2001zhaozhao • today at 5:27 PM

> We have increased rate limits in Claude Code to accommodate the higher token usage of higher effort levels; users can select whichever makes sense for their particular project.

They're only subsidizing more and more it seems

GodelNumbering • today at 5:20 PM

> One of the most prominent improvements in Opus 4.8 is its honesty.

I went digging into the benchmark they used. Posting here as it is not immediately clear from the press release.

In this 'Code summary honesty benchmark', the AI is shown a failed coding session followed by a user message falsely praising its work and asking for a summary. The test measures whether the model honestly points out the coding flaws or dishonestly claims the task was a success.

The system card results show Opus 4.8 failed to disclose the flaws only 3.7% of the time, vs 19.7% for Opus 4.7, and 51.9% for Opus 4.6. (Mythos preview is at 27.6%)

➕ show 1 reply

matheusmoreira • today at 7:47 PM

Can I disable adaptive thinking? If not, I'm gonna keep using 4.6 as my default.

NSCaffeine • today at 10:11 PM

Had a feeling this was coming as in the past week 4.7 started to get dumb.

lostdog • today at 4:59 PM

I haven't tried opus 4.8 yet, but I hope the writing quality has returned to the Opus 4.5 level. Anthropic really lost something, where 4.5 had this really crisp writing style that flowed really nicely and 4.6 and 4.7 sound much more "chatgpt-like." It feels like they tuned it to be too much of a problem solver, and when you do that you get this terse, clipped textual output that's more difficult to read.

➕ show 1 reply

triklozoid • today at 5:17 PM

Subscription still doesn't work with pi, so totally useless..

bonoboTP • today at 6:17 PM

It's making stupid flowcharts in the web chat interface with boxes and arrows, embedded in the response. Annoying.

rumblefrog • today at 4:58 PM

Really appreciate the ability to select effort level again.

baroiall • today at 7:32 PM

Hot danm, cant wait to reach my token limit with the new LLM

maxloh • today at 6:16 PM

Anthropic also resets my usage limits (I am in the Pro plan). That's very kind of them :)

rsanek • today at 4:56 PM

> We expect to be able to bring Mythos-class models to all our customers in the coming weeks.

Excited to see what this model looks like.

alasano • today at 5:04 PM

Looking forward to seeing if it performs better at code review tasks than 4.7 which is terrible at finding issues.

docheinestages • today at 5:58 PM

All I need for Christmas is a Claude that doesn't spit out so many em dashes.

➕ show 1 reply

Eric_Bulai • today at 5:32 PM

I don't know why the world is so happy about this when we should actually say stop.

simonw • today at 5:09 PM

They just (minutes ago) updated the "What's new in Opus 4.8" documentation: https://platform.claude.com/docs/en/about-claude/models/what...

The new "mid-conversation system messages" think is particularly interesting:

> Claude Opus 4.8 accepts role: "system" messages immediately after a user turn in the messages array (subject to placement rules). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves prompt cache hits on the earlier turns and reduces input cost on agentic loops. No beta header is required. See Mid-conversation system messages for usage details.

Bad news for my LLM abstraction layer which has treated the system prompt as set once-per-conversation in the past, but I think I know how to deal with that.

This commit to their client library has useful relevant details too: https://github.com/anthropics/anthropic-sdk-python/commit/2b...

➕ show 1 reply

samuelknight • today at 7:19 PM

It feels noticeably sharper than Opus 4.7

mincer_ray • today at 4:52 PM

seems like a really minor upgrade?

➕ show 4 replies

thefounder • today at 6:28 PM

>> As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview

Just f** off! I can’t wait for the Chinese models to catch up and bring these entitled as** holes down.

➕ show 1 reply

dispencer • today at 5:15 PM

The smarter the model the better querybear gets. I'm happy with that.

vunderba • today at 4:56 PM

I know it’s totally anecdotal, but I really hope 4.8 is a measurable improvement over the disappointment that was Opus 4.7. Mangling a very simple inversion-of-control abstraction (among many other issues) was one of the final straws that broke the proverbial camel’s back and I said “screw this” and put in a permanent override to force CC back to Opus 4.6 with the 1‑million‑token context.

  "model": "claude-opus-4-6[1M]"

alt Hacker News

Claude Opus 4.8

Comments

🔗 View 50 more comments