Hacker News

Claude Opus 4.7 Model Card

145 points by adocomplete today at 2:32 PM | 71 comments

Comments

bachittle today at 3:41 PM

So Opus 4.7 is measurably worse at long-context retrieval compared to Opus 4.6. Opus 4.6 scores 91.9% and Opus 4.7 scores 59.2%. At least they're transparent about the model degradation. They traded long-context retrieval for better software engineering and math scores.

vessenes today at 6:27 PM

This is an interesting document, in that it reads like a Claude Mythos model card that was hastily edited to be an Opus 4.7 model card.

I surmise that someone at the top put the Mythos release on hold, and the product team was told "ship this other interim step model instead. quickly."

I wonder if 4.7 will be seen as a net step-up in quality; there are some regressions noted in the document, and it's clearly substantially worse than Mythos, at least according to its own model card. Should be an interesting few months -- if I were at oAI I'd be rushing to get something out that's clearly better, and pressing for weakness here.

koehr today at 2:50 PM

This reads more like an advertisement for Mythos, at first glance.

kube-system today at 4:48 PM

> Chemical and biological weapons threat model 2 (CB-2): Novel chemical/biological weapons production capabilities. A model has CB-2 capabilities if it has the ability to significantly help threat actors (for example, moderately resourced expert-backed teams) create/obtain and deploy chemical and/or biological weapons with potential for catastrophic damages far beyond those of past catastrophes such as COVID-19.

That's an interesting choice of benchmark for measuring the risk of "Chemical and biological weapons"

Symmetry today at 3:34 PM

> The technical error that caused accidental chain-of-thought supervision in some prior models (including Mythos Preview) was also present during the training of Claude Opus 4.7, affecting 7.8% of episodes.

>_>

msla today at 5:31 PM

PDF, because it isn't marked.

100ms today at 2:55 PM

    $ pbpaste | wc -w 
    62508
    $ pbpaste | grep -oi mythos|wc -w
    331
    $ pbpaste | grep -oi opus|wc -w
    809
aliljet today at 3:01 PM

Have they effectively communicated what a 20x or 10x Claude subscription actually means? And with Claude 4.7 increasing usage by 1.35x, does that mean a 20x plan is now really a 13x plan (no token increase on the subscription) or a 27x plan (more tokens given to compensate for more compute cost) relative to Claude Opus 4.6?
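
A rough sketch of the arithmetic behind those two readings, assuming "1.35x" means the same task consumes 1.35 times as many tokens. The function name and parameters are hypothetical and do not reflect how Anthropic actually meters plans. Under that assumption, dividing 20 by 1.35 gives about 14.8, somewhat above the 13x figure quoted above.

```python
def effective_plan_multiplier(plan_multiplier: float,
                              usage_factor: float,
                              compensated: bool) -> float:
    """Express a plan's size relative to the older model's token usage.

    Assumption (not from the model card): usage_factor is the ratio of
    tokens the new model spends on a task versus the old model.
    """
    if usage_factor <= 0:
        raise ValueError("usage_factor must be positive")
    if compensated:
        # Token budget is scaled up to cover the extra usage, so the raw
        # budget is plan_multiplier * usage_factor in old-model tokens.
        return plan_multiplier * usage_factor
    # Token budget is unchanged, so it buys proportionally less work.
    return plan_multiplier / usage_factor


# No extra tokens on the subscription: the plan covers less work.
print(f"{effective_plan_multiplier(20, 1.35, compensated=False):.1f}x")
# Token budget raised to compensate for the higher usage.
print(f"{effective_plan_multiplier(20, 1.35, compensated=True):.1f}x")
```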

Rekindle8090 today at 7:12 PM

Can someone please explain the point of these incremental upgrades? Just release one model. Then maybe do a .5. Then do the next version.

What is the justification for .4.5.6.7.8.9 when the difference isn't measurable and it destroys productivity because they test the next increment on the previous one without customer consent?

bicepjai today at 2:38 PM

This card is a 272-page report. So now we are redefining names :)

STRiDEX today at 3:01 PM

Dumb question, but why are chemical weapons always addressed as a risk with LLMs? Is the idea that they contain instructions for making chemical weapons, or that they would guide someone through the process?

Would there not already be websites that contain that information? How is an LLM different, I guess, from some sort of Anarchist Cookbook thing?

joeumn today at 3:04 PM

I'm actually surprised at how it performed compared to 4.6 and also compared to Mythos. Will be fun to use.

il-b today at 3:29 PM

Ironically, the website is down

jmward01 today at 2:45 PM

Haiku not getting an update is becoming telling. I suspect we are reaching a point where the low-end models are cannibalizing the high end, and that isn't going to stop. How will these companies make money in a few years when even the smallest models are amazing?

nothinkjustai today at 4:19 PM

How much do you want to bet this is Mythos, and Anthropic released it as Opus to avoid embarrassment after all the hype they whipped up…

NickNaraghi today at 3:58 PM

232 pages is bullshit. Longer than the Mythos system card? What are you hiding?

nullc today at 6:34 PM

The model card doesn't mention whether this revision will continue to make up and fan vicious conspiracy theories like the prior one does.

I've been getting a small but steady stream of harassment from mentally ill people who get spun up on crazy conspiracy theories, and Claude is all too willing to tell them they are ABSOLUTELY RIGHT, encourage them to TAKE ACTION, and tell them that people who disagree are IN ON IT.

The other major AI LLM services will deflect to be less crazy or shut down the conversation entirely, but it seems Claude doesn't. Anthropic is probably the worst about prattling on about safety, but it seems like their concern is mostly centered on insane movie-plot threats and less on things with more potential for real harm.

I've complained to Anthropic with no response.

pukaworks today at 5:01 PM

[dead]

gignico today at 6:55 PM

So LLMs are destroying the economy and the environment but at least “catastrophic risk” is still low. Ok then…

deflator today at 4:25 PM

Model Welfare? Are they serious about this? Or is it just more hype? I really don't trust anything this company says anymore. "We have a model that is too dangerous to release" is like me saying that I have a billion dollars in gold that nobody is allowed to see but I expect to be able to borrow against it.