Hacker News

satvikpendem today at 6:10 PM (20 replies)

> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.

Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.

And Opus 4.8 is still cheaper for a higher pass rate (let alone open-weight models like GLM 5.2), so I'm not sure why I'd use Sonnet, except perhaps at the low effort level for trivial tasks where, judging by the graph, I'm happy for it to work only 50% of the time. The pricing doesn't really make any sense.


Replies

secretslol today at 6:27 PM

"Lower ability to perform cybersecurity-related tasks" makes me super concerned it will leave my codebase like Swiss cheese for any American granny with access to Fable 5, when we non-American Brits, or rest-of-worlders, don't have access to it to clean our codebases.

zlurker today at 6:15 PM

They spent months hyping up Mythos and ended up with it banned. I'd assume they want both to differentiate their products and to appeal to regulators here.

MostlyStable today at 6:23 PM

Why do you think they are bragging? Anthropic has long been the company to give us by far the most in-depth information about their models, both positive and negative. I read this as them just stating a fact about this model that users would want to know.

bluepeter today at 6:41 PM

Flowers for Algernon. And, sadly, expect this from now on. You saw it with OpenAI releasing Sol/Terra/Luna with a chart showing how they weren't quite as good as Mythos. It's all messaging to the USG to try to avoid/minimize arbitrary review from multiple agencies. 'Hey, it's smart, but look how stupid it is at "cyber."'

kristianc today at 6:25 PM

There are two classes of models now: the cybersecurity ones that none of us are getting, and the 'safe' models released for general consumption. This is letting us know which side of the divide it sits on.

pseudosavant today at 8:23 PM

Probably so that the current US administration doesn't block broad use of Sonnet 5. They'd have to collect your ID and approve you if it were good at cybersecurity. Because such is the freedom in the U.S. right now.

dgacmu today at 6:35 PM

One of the best queries I've done with an LLM recently was: "Create a plan for improving the robustness and resilience of this code, particularly to untrusted inputs."

Gemini wouldn't do a security audit. But it came up with a great set of mitigations and identified an extant XSS flaw in the process of improving robustness.

There's an awful lot of good that can come from proactive, defensive use of LLMs. I realize there's also a lot of pain when the difficulty of exploit finding drops suddenly, but in the long term we may all benefit from the defensive side of this.

K0balt today at 6:22 PM

Restricting the models isn’t about restricting offensive capabilities. They were already very well aligned to reduce that risk.

This recent government interference is about trying to preserve US offensive cyberwarfare and cyberespionage capabilities. It's not about "bad actors". It's about defensive capabilities becoming pervasive and cheap, which would kneecap our cyberoffensive capability.

It’s like making seatbelts illegal so that police chases can be more effective.

lanthissa today at 6:20 PM

So it doesn't get blocked. Last time they said a model was great at cyber, it didn't turn out well.

nozzlegear today at 7:10 PM

It seems obvious to me that they put that in there in an effort to avoid another reaming out by the long, orange dick of the US government.

Philpax today at 6:12 PM

To avoid Lutnick getting on their case again.

johnfn today at 6:41 PM

> Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks yet knowingly deny them the ability.

What exactly do you want Anthropic to say here? "This model, the one we are about to give to the entire world for cheap, is really good at hacking"? Saying Sonnet is terrible at cybersecurity is the most reasonable thing they can say, out of a lot of bad options.

2001zhaozhao today at 6:48 PM

They are obviously trying to avoid getting Sonnet 5 blocked.

doctoboggan today at 6:12 PM

You have to pay more for that, and/or go through some USG vetting process.

WithinReason today at 6:31 PM

That part is likely directly addressed to the US government.

chvid today at 6:33 PM

Does it mean it generates code with random security holes?

jayd16 today at 6:34 PM

Market segmentation?

re-thc today at 6:34 PM

> And Opus 4.8 is still cheaper for a higher pass rate

Unless it spams as much as Opus, I doubt it. Opus 4.8 literally spams text like puke. On a longer run, especially if you get cache misses here and there, the bulk of the cost is all the extra context it adds.

drcongo today at 6:33 PM

What makes that a brag?