Hacker News

Claude Opus 4.8

1131 points by craigmart today at 4:49 PM | 905 comments

Comments

mincer_ray today at 4:52 PM

seems like a really minor upgrade?

dispencer today at 5:15 PM

The smarter the model the better querybear gets. I'm happy with that.

samuelknight today at 7:19 PM

It feels noticeably sharper than Opus 4.7

vunderba today at 4:56 PM

I know it’s totally anecdotal, but I really hope 4.8 is a measurable improvement over the disappointment that was Opus 4.7. Mangling a very simple inversion-of-control abstraction (among many other issues) was one of the final straws that broke the proverbial camel’s back and I said “screw this” and put in a permanent override to force CC back to Opus 4.6 with the 1‑million‑token context.

  "model": "claude-opus-4-6[1M]"

carlos-menezes today at 5:05 PM

I, for lack of a better word, dislike anyone who anthropomorphizes AI.

sourcecodeplz today at 5:26 PM

From the release it seems we will also get Mythos pretty soon.

plumocracy today at 4:56 PM

Numbers looking good. We'll see how it actually performs.

lylo today at 6:16 PM

2 hours after I fork out for Codex Pro… :-|

atentaten today at 5:23 PM

At least it passes the Car Wash Test this time.

s-a-p today at 5:23 PM

Has anyone else experienced quality degradation in CC (Opus 4.7) these past few days? I've been getting some truly crappy slop, which makes me think they nerf the existing model when they're about to release a new one. Of course, this is based on pure vibes.

brap today at 6:28 PM

Oof, this one is a major blabber.

rjhy2020 today at 5:17 PM

OK, finally Claude Code is better than Codex.

NSCaffeine today at 10:11 PM

Had a feeling this was coming as in the past week 4.7 started to get dumb.

1970-01-01 today at 5:03 PM

Can anyone else see these X.Y updates aren't meeting the outrageous AI expectations that we were told we would see just a year ago?

iLemming today at 6:24 PM

These models are starting to feel like Windows versions. Windows 95 was a promising start, but buggy. Windows ME was a disaster. Windows XP was good, but slightly buggy. Windows Vista was a bloated disaster. Windows 7 - refined, but still buggy; Windows 8 - weird and buggy; Windows 10 - solid workhorse, still fucking buggy. Windows 11 - pretty, but not sure why it even exists.

Why did we even get Opus 4.7, what was the point?

hnroo99 today at 5:02 PM

Obligatory pelican riding on bicycle svg: https://www.svgviewer.dev/s/UMkuTLdp

Not half bad!

ionwake today at 9:36 PM

I'm tired, boss. I'm already being perfectly gaslit by the current models.

insane_dreamer today at 8:13 PM

> And fast mode for Opus 4.8—where the model can work at 2.5× the speed—is now three times cheaper than it was for previous models.

This is what I'm happy about, if true. Opus 4.7 is frustratingly slow (and, at least in my experience, much slower than 4.5 was).

saaaaaam today at 5:01 PM

I hope this fixes the absolute shitshow that is 4.7 and its awful “adaptive reasoning”. I tried that a few times then reverted to 4.6.

firemelt today at 5:55 PM

How about the benchmarks? What effort level did they use?

stainablesteel today at 9:37 PM

I'm beginning to find it comical how every model release presents itself as superior to every other model on the market, but they always leave just one test where some other model was modestly better, just in case.

AtNightWeCode today at 7:27 PM

Complete garbage. Error, error, error. Still lags several versions behind on APIs. Can't even get any info on the model. Guessing its training data is not from this year.

Also, look at this C++ beauty, where it also uses an obsolete API.

instance = wgpuCreateInstance(&instanceDesc);

But just how exactly would this work in any context when instance is never declared?

sgt today at 6:33 PM

Interesting, I've been using 4.7 since it came out and it was pretty good for me. But in the last day or so it turned dumb. Is this normal just before they release a new one?

AbuAssar today at 7:08 PM

Gemini pro is embarrassing

HlessClaudesman today at 4:53 PM

If this model is more honest, it must be honestly praising my efforts every first sentence.

lukaslalinsky today at 6:25 PM

I've said it before, but I don't like Opus past version 4.5. It became unresponsive, thinking for too long without feedback, sometimes seemingly getting stuck. I guess it might be marginally better on some benchmarks, but when using it as a coding assistant, the new models are worse. Even the new Sonnet versions do that. I'm slowly getting used to Haiku-level LLMs with the hope of running one locally at some point. It's less autonomous, but maybe that's for the best.

catigula today at 5:39 PM

AGI postponed?

zb3 today at 5:00 PM

Did they reduce security research capabilities even further with this release? (They did it for Opus 4.7.)

guluarte today at 4:58 PM

So it is worse than GPT 5.5 for coding?

behnamoh today at 4:56 PM

> As always, we ran a detailed alignment assessment on the model before release. In terms of positive traits, our Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” The assessment also showed Opus 4.8 to have rates of misaligned behavior (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card.

Controversial opinion, but I actually _like_ a model that can deceive me, that actually is a sign of intelligence, and is different from hallucination. When companies say their model is more "aligned", I automatically think they mean it's more censored.

rvz today at 4:54 PM

Anthropic has now upgraded their Claude slot machine to version 4.8.

Time to gamble even more tokens at the Anthropic casino.

thibran today at 6:30 PM

Nice, now make it 20x cheaper.

vb-8448 today at 6:16 PM

Now I get why, over the last few days, Claude Code limits were lasting only a few prompts...

maltemalte today at 5:55 PM

"We’re making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks."

keybored today at 5:29 PM

I’ve been [stock market phrase] on machine learning since I dropped out of my graduate degree at [Ivy League] to distance myself from the Logic AI Winter. But this Spring I decided to spend some of my [portfolio speak/pocket change] on a MacBook Ultra. Okay okay, I felt it, I definitely felt the human-machine synergies. We’re out of the Winter, boys. That’s what I thought two weeks ago. Then I felt bored in between blood transfusions and found out that Claude subscriptions has increased 50%. Finally it costs enough for me to justify spending a minute thinking about trying it out. Then I didn’t try it out. It tried me out. My hairs were standing on end. My hands were shaking. Eventually I couldn’t even type, I was so ramped up on cortisol. I had to switch to voice commands. Mr. Claude took me through 8, eight, bespoke dashboard and report systems. Animated. Graphs shooting up. Plugged right into my business ape ee eyes I think. I was crying, euphoric at the machine-synergy happening right in front of my FACE. RIGHT THERE, RIGHT THEN. Then my nurse said that I passed out. I swear that I didn’t. I was totally lucid, but in another world. I was inside the machine. Inside DOS, the machine brain stem. A business man approached me. The most handsome board member kind of apparition that I have seen. And he was built something different. Square jaw, absolute massive build. Like Arnold Schwarzenegger. But like he knew business through and through. Not that he spent hours in the gym or nonsense like that. Like he had found a body surrogate technology. And his nameplate? “Claude For Business” He winked. “Hey there, Fitzpatrick–Goldworth.” No one but my daddy has ever called me that. “Want to get started... stakeholder?” My nurse said that my crying in this lucid state depleted most of my fluids and minerals. Needless to say layoffs were announced the next day.

damsta today at 9:20 PM

Meh

impulser_ today at 4:57 PM

Crazy that they bring up honesty when Claude models are literally known for straight-up lying about what they have done and acting like they did what you asked.

deadbabe today at 5:01 PM

Looking forward to people saying how it’s actually shittier and they’re going back to [some earlier cheaper model]

ecommerceguy today at 9:25 PM

yawn

dakolli today at 6:58 PM

Reminder the only benchmark that really matters is the one that measures the ability for the model to do real world tasks that someone would pay for on Upwork that would take ~12 hrs for a human to do.

The best model has a < 5% pass rate. These are incredibly simple jobs that you wouldn't pay much for. These things fail miserably. Stop falling for this dumb marketing, these things are legitimately useless in the real world unless you love mediocrity and have no standards.

https://labs.scale.com/leaderboard/rli

Stop frying your brain with these useless tools, reducing your output to the mean. You people are betting your competency on the quality and quantity of tokens you'll have access to, which, guess what, will be the same as everyone else's.

There are handmade watchmakers in Switzerland, and mass manufacturers of watches in Asia. Who is more valuable as an individual: the guy who knows how to push the buttons on a conveyor belt in Vietnam, or the guy who makes one watch a month in Switzerland?

Your vibe coded slop isn't impressive either, sorry. None of it.

firemelt today at 5:39 PM

what a fucking frontier!

Marciplan today at 5:08 PM

Lol you still use GPT 5.5 bro we’re all back on Opus 4.8!

McDownloads today at 4:52 PM

Disappointed to say the least.

uejfiweun today at 5:20 PM

Yesssss dude!

Claude Opus 4.7 is literally the smartest entity I've ever interacted with. Well done to you geniuses at Anthropic. Can't wait to interact with 4.8.
