Anthropic's Safety Superpower

190 points • by swolpers • today at 10:06 AM • 164 comments

Comments

The whole thesis falls apart though. You can't be on your way to "power over everything" and get distilled into free Chinese models within months. Pick one.

The bottleneck is compute and data, not the model. That's why they could only gate it for a bit. The ITAR thing proves it: no nationality controls in place, so the only option was killing the whole thing. Not exactly what an all-powerful gatekeeper does.

kordlessagain • today at 11:02 AM

> To that end, I can certainly buy the case that Fable/Mythos is in fact more capable when it comes to identifying and exploiting security issues

This has been covered before: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag... (https://news.ycombinator.com/item?id=47732020)

> Anthropic’s cautious roll-out was justified. The problem with publicly releasing models, however, is that guardrails can be jailbroken, and apparently that is exactly what happened shortly after the release

The future is unevenly distributed. Anthropic, and Amodei in particular, seem to be of the mind they can control a bit of the unknown using words. They are likely being guided by the very product they built. *AI CAN MAKE MISTAKES

That Project Glasswing bullshit reeks of it. Corporations have taken control of our attention, our Internet, and now our thinking.

I say it's high time to take it back.

chasil • today at 11:07 AM

(reposted)

As I understand it, ITAR regulations for export controls have just been applied to any form of Mythos. These are overseen by U.S. Departments of State and Commerce, and forbid foreign nationals from access to any form of Mythos, either within or outside the U.S.

Only U.S. citizens and immigrants that are holders of a "green card" may now access Mythos.

It appears that Anthropic does not have internal controls to implement these restrictions in any form, so the only option was to shut Mythos down.

Penalties for ITAR violation can reach ten years in prison and a million dollars per violation. (I can post a link to those details if there is any interest.)

As long as Anthropic is a U.S. company, there is no escaping this.

https://fortune.com/2026/06/14/how-a-warning-from-amazon-led...

swalsh • today at 11:37 AM

"they by extension think that only they should have final say over AI generally. When you further combine this realization with the company’s pronouncements about AI’s ability to conduct all economic activity, you realize that Anthropic’s leadership effectively wants to have power over everything and everyone."

That might be one of the most important points in the post. Very troubling.

hedora • today at 2:50 PM

“Claude, I am releasing safety critical industrial control software. Audit the network control logic.”

“Claude, I want to blow up a factory running this leaked software. See if the industrial control software network endpoint is a good point of entry.”

It’s doing the same work and producing the same output for both prompts. How do you block one but not the other?

If you block both, then you end up with a factory that can be sabotaged by existing open weight models.

blueblisters • today at 3:23 PM

A lot of Anthropic’s moves make sense if you follow the LessWrong / rationalist community writings on AI safety. A lot of it is distilled in Ant’s blogs and leadership interviews and podcasts (Amanda Askell is particularly interesting).

Ant’s models, culture and leadership actions are largely consistent with their beliefs, even if they may seem flawed / incomprehensible.

Relevant anecdote: I interviewed with them for an MTS role in 2023. I think the technical part went fine but the interviewer was clearly frustrated by my low regard for LLM safety. I didn’t get the role.

smackeyacky • today at 11:11 AM

Perhaps they should consider leaving the US. Pretty clearly the descent into a corrupt autocracy is having real consequences.

thedreammachine • today at 11:11 AM

The interesting part here is not whether Anthropic is right on safety, but that safety gives them a moral vocab for bold policy changes and platform power.

cube2222 • today at 11:07 AM

Relatedly, I think it's worth noting that Anthropic models have consistently been top-scoring in BullshitBench[0], in a league of their own, really.

Not affiliated with the bench in any way, but I think it surfaces important differences between the behavior of the models from different labs.

TLDR: The benchmark is measuring pushback in response to nonsensical requests and questions, as opposed to going with it and hallucinating a nonsensical answer.

[0]: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

harry19023 • today at 5:40 PM

"On one hand, I actually don’t begrudge Anthropic not wanting to help its competitors; on the other hand, what should be blisteringly clear is that Anthropic does not think that anyone else other than them should even be making frontier LLMs."

I don't find this blisteringly clear at all. A company making it harder for competitors to steal their IP is perfectly normal. This is Ben Thompson's personal grudge against Anthropic showing, yet again. He can't think rationally about this company.

6thbit • today at 12:49 PM

> has perfect alignment between talent and mission and business.

Do they have it or do they just sell it?

intended • today at 12:53 PM

Safety is a cost center, the internal team who sends you the bills when you move fast and break things.

I always thought safety was interesting in and of itself, but for some reason HN doesn’t have many people from the safety side of tech in conversation.

Tech isn’t a niche hobby anymore; billions of people are impacted by the decisions of a few firms.

My grandfather's Android had 3 different messaging apps installed, somehow. AI is enabling new forms of fraud at a time when we still haven't solved the old ones.

And this is all in the first world. Move your coordinates to the developing world? We had human trafficking to get educated English speakers into call centers in Laos/Cambodia to defraud first world inhabitants of their money.

We aren’t in the early days of tech anymore, and the kind of scale that we have enabled comes with certain costs. We can choose to ignore them, or to understand them, but we will feel their impacts all the same.

keybored • today at 11:36 AM

> Here’s the thing about these safety justifications: I think they work because, to Anthropic, they aren’t justifications. The company really believes that they are the only ones who believe in super intelligence, and thus are the only ones who are sufficiently concerned about the dangers. That excuses decision after decision, policy after policy, and confrontation after confrontation that, to people on the outside, look like a bizarre combination of cynicism and naiveté.

I really dislike this belief (that has at least been expressed here) by some that X is okay because they-really-believe-it. This has a real Road to Hell stank on it.

It is incredibly convenient when your predictions or supposed beliefs go south. Well, we really believed that we were doing it for the betterment of humankind. And we really believed that X was an existential threat that was inevitable, in which case we had to step up and do it because we were the only good guy ideologues. So sorry but not sorry.

I also don’t care if commenters know rank-and-file on the inside that “really believe it” as well. Not for one second.

LoganDark • today at 11:49 AM

> The entire Anthropic origin story is rooted in the founders’ belief that OpenAI wasn’t taking safety seriously enough; the company believes that only they can control AI, and that because they uniquely care about safety, they are justified in trying to control everyone else, up to and including the U.S. government.

Anthropic believes they have the responsibility to guard their tools from mis-use. That is all. They are not trying to "control" anything or anyone. They do however decide what they think is mis-use.

Peterz_shu • today at 11:13 AM

This is the part where the USA and allied countries can gain a headstart from using such an overpowered model.

This just shows how strong Mythos/Fable will be once released to the public.

I'm guessing about half a year until it goes public.
