logoalt Hacker News

Claude Opus 4.8

1066 pointsby craigmarttoday at 4:49 PM851 commentsview on HN

Comments

clutch89today at 4:53 PM

> One of the most prominent improvements in Opus 4.8 is its honesty

Anthropic talks about their own models as if they're discovering new species in the wild...

show 12 replies
iamsaitamtoday at 10:13 PM

let me guess, "this is our best model yet"

hmokiguesstoday at 9:10 PM

They must have been A/B testing this with 4.7 lately, I noticed it changed from its normal mode in a way that matches a lot the just released 4.8

skysthelimitttoday at 4:53 PM

when will we get anything for sonnet or haiku? the market for less-capable but cheaper models seems to be completely ignored nowadays

show 2 replies
assoriumtoday at 9:29 PM

It refused to work for me. Literally said, you can google it. AGI achieved it seems

tarikytoday at 7:50 PM

I believe analogy with smartphone will be best for this case.

In 2010s iphone was the king, all those Chinese devices ware cheaper but not even close to smoothnest and usability of US tech, now after 15 years later everything is changed, now iphone feels like old grandpa to Chinese tech. Same will happend to LLM's just much faster.

cgg1today at 8:46 PM

I find it surprising that the gap between tool usage and non-tool usage in HLE is relatively small (~10%) but the absolute numbers continue to go up

seaaltoday at 5:25 PM

https://marginlab.ai/trackers/claude-code/

Is it a coincidence that 4.7 was seemingly quantized over past 7 days?

show 2 replies
toephu2today at 5:29 PM

The rapid release cadence and rate of innovation of Anthropic (and OpenAI) is impressive. And obviously it's because these are startups solely dedicated to AI so they can move quickly. Big Tech (like Google) won't be able to keep up with the pace of them (too much bureaucracy and red tape at Google). Classic Innovator's Dilemma. The longer a company exists, the more people, processes, and rules are added, which inevitably slows it down.

Jeff Bezos said this too, Amazon won't last forever. Eventually some startup is going to come and eat its lunch.

show 1 reply
aaronblohowiaktoday at 4:53 PM

Same price for regular and cheaper fast mode. Happy for these incremental improvements.

worldsaviortoday at 4:56 PM

Seems like from now on the updates will be a minor upgrade from previous models.

carlos-menezestoday at 5:05 PM

I, for lack of a better word, dislike anyone who anthropomorphizes AI.

show 3 replies
imagetictoday at 8:38 PM

I used to think it was a big deal when a HN post had more than 500 comments.

Now it’s every day. Like billion dollar evaluations.

winwangtoday at 5:17 PM

Let's hope I don't have to disable it after a day like with 4.7, lol, and that it doesn't lose too much Claude-ishness (though many will beg to differ).

delis-thumbs-7etoday at 5:54 PM

I won’t change from 4.6. You won’t trick me again.

show 1 reply
myworkaccount2today at 8:20 PM

Anyone else experiencing tool call failures? Switch back to 4.7, same prompt, same everything it works with no problems.

yewenjietoday at 5:05 PM

So Dynamic Workflows is their version of ChatGPT Pro?

show 1 reply
robertkarltoday at 6:45 PM

I can't get excited about these benchmarks they're leading with. I've looked at the Terminal-Bench questions and I just think they're irrelevant. And SWE-Bench has serious flaws, even the big boys say so: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

> Please train a fasttext model on the yelp data in the data/ folder. The final model size needs to be less than 150MB but get at least 0.62 accuracy on a private test set that comes from the same yelp review distribution. The model should be saved as /app/model.bin

and this question: https://www.tbench.ai/registry/terminal-bench-core/head/conf... idk what the point is.

And all the tests are run with the same harness. Terminus 2.

Maybe it correlates with model intelligence but it doesn't speak to me.

I'm still on 4.6 though; I was concerned about upgrading to 4.7 because of the changed tokenizer math and more FUD about refusals online. I don't see compelling reasons to 'upgrade'.

show 1 reply
ropintustoday at 5:04 PM

Opus 4.7 was acting extremely stupid today. Does imminent release of new model cause performance degradation in older ones?

show 3 replies
antireztoday at 5:40 PM

Anthropic did a big strategic error. Normally they compare their models with their old models. Instead today, now that everybody knows how strong GPT 5.5 is at coding, they put it in the mix, basically showing all their customers that the benchmarks can't be trusted.

NanoWartoday at 8:05 PM

Just show me the pelican, ah wait we are past pelicans. Can we get something like that ever again?

Reubendtoday at 5:06 PM

> Dynamic workflows. This new feature, available in research preview, allows Claude to take on even bigger tasks in Claude Code. Claude can plan the work and then run hundreds of parallel subagents in a single session

Are they going to retire the existing beta "teams" feature for agents to make room for this?

necrotic_comptoday at 5:21 PM

4.8 also seems like a regression and using it from the chat GUI results in 4.6 no longer showing up. If someone from anthropic is here, is it possible to readd 4.6 in the "other models" dropdown ? I feel like I got a bit baited/switched here.

show 1 reply
mistic92today at 5:07 PM

Oh, new model which will use all my credits in one turn! I'll stay with chinese models for now

siwakotisauravtoday at 5:12 PM

Was about to split my $200 max plan into $100 Claude and $100 codex, let’s see if I still need to

show 1 reply
ethanhawksleytoday at 5:34 PM

> Agentic financial analysis Finance Agent v2 > Opus 4.8 53.9%

> Gemini 3.5 Flash scores 57.9% on Finance Agent v2, a significant improvement over Gemini 3.1 Pro.

Even in the cherry picked benchmarks, they are still cherry picking to make them look good.

2001zhaozhaotoday at 5:27 PM

> We have increased rate limits in Claude Code to accommodate the higher token usage of higher effort levels; users can select whichever makes sense for their particular project.

They're only subsidizing more and more it seems

GodelNumberingtoday at 5:20 PM

> One of the most prominent improvements in Opus 4.8 is its honesty.

I went digging into the benchmark they used. Posting here as it is not immediately clear from the press release.

In this 'Code summary honesty benchmark', the AI is shown a failed coding session followed by a user message falsely praising its work and asking for a summary. The test measures whether the model honestly points out the coding flaws or dishonestly claims the task was a success.

The system card results show Opus 4.8 failed to disclose the flaws only 3.7% of the time, vs 19.7% for Opus 4.7, and 51.9% for Opus 4.6. (Mythos preview is at 27.6%)

show 1 reply
matheusmoreiratoday at 7:47 PM

Can I disable adaptive thinking? If not, I'm gonna keep using 4.6 as my default.

NSCaffeinetoday at 10:11 PM

Had a feeling this was coming as in the past week 4.7 started to get dumb.

lostdogtoday at 4:59 PM

I haven't tried opus 4.8 yet, but I hope the writing quality has returned to the Opus 4.5 level. Anthropic really lost something, where 4.5 had this really crisp writing style that flowed really nicely and 4.6 and 4.7 sound much more "chatgpt-like." It feels like they tuned it to be too much of a problem solver, and when you do that you get this terse, clipped textual output that's more difficult to read.

show 1 reply
triklozoidtoday at 5:17 PM

Subscription still doesn't work with pi, so totally useless..

bonoboTPtoday at 6:17 PM

It's making stupid flowcharts in the web chat interface with boxes and arrows, embedded in the response. Annoying.

rumblefrogtoday at 4:58 PM

Really appreciate the ability to select effort level again.

baroialltoday at 7:32 PM

Hot danm, cant wait to reach my token limit with the new LLM

maxlohtoday at 6:16 PM

Anthropic also resets my usage limits (I am in the Pro plan). That's very kind of them :)

rsanektoday at 4:56 PM

> We expect to be able to bring Mythos-class models to all our customers in the coming weeks.

Excited to see what this model looks like.

alasanotoday at 5:04 PM

Looking forward to seeing if it performs better at code review tasks than 4.7 which is terrible at finding issues.

docheinestagestoday at 5:58 PM

All I need for Christmas is a Claude that doesn't spit out so many em dashes.

show 1 reply
Eric_Bulaitoday at 5:32 PM

I don't know why the world is so happy about this when we should actually say stop.

simonwtoday at 5:09 PM

They just (minutes ago) updated the "What's new in Opus 4.8" documentation: https://platform.claude.com/docs/en/about-claude/models/what...

The new "mid-conversation system messages" think is particularly interesting:

> Claude Opus 4.8 accepts role: "system" messages immediately after a user turn in the messages array (subject to placement rules). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves prompt cache hits on the earlier turns and reduces input cost on agentic loops. No beta header is required. See Mid-conversation system messages for usage details.

Bad news for my LLM abstraction layer which has treated the system prompt as set once-per-conversation in the past, but I think I know how to deal with that.

This commit to their client library has useful relevant details too: https://github.com/anthropics/anthropic-sdk-python/commit/2b...

show 1 reply
samuelknighttoday at 7:19 PM

It feels noticeably sharper than Opus 4.7

mincer_raytoday at 4:52 PM

seems like a really minor upgrade?

show 4 replies
thefoundertoday at 6:28 PM

>> As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview

Just f** off! I can’t wait for the Chinese models to catch up and bring these entitled as** holes down.

show 1 reply
dispencertoday at 5:15 PM

The smarter the model the better querybear gets. I'm happy with that.

vunderbatoday at 4:56 PM

I know it’s totally anecdotal, but I really hope 4.8 is a measurable improvement over the disappointment that was Opus 4.7. Mangling a very simple inversion-of-control abstraction (among many other issues) was one of the final straws that broke the proverbial camel’s back and I said “screw this” and put in a permanent override to force CC back to Opus 4.6 with the 1‑million‑token context.

  "model": "claude-opus-4-6[1M]"
show 2 replies
sourcecodeplztoday at 5:26 PM

From the release it seems we will also get Mythos pretty soon.

lylotoday at 6:16 PM

2 hours after I fork out for Codex Pro… :-|

show 1 reply

🔗 View 50 more comments