I was wondering if it was because of heavy-handedness of the administration, but apparently:
> The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.
Their core argument is that if we have guardrails that others don't, they would be left behind in controlling the technology, and they are the "responsible ones." I honestly can't comprehend the timeline we are living in. Every frontier tech company is convinced that the tech they are working towards is as humanity-useful as a cure for cancer, and yet as dangerous as nuclear weapons.
Public benefit corporations in the AI space have become a farce at this point. They're just regular corporations wearing a different hat, driven by the same money dynamics as any other corp. They have no ability to balance their stated "mission" with their drive for profit. When being "evil" is profitable and not-evil is not, guess which road they'll take...
Anthropic's CEO Dario has annoyed me to no end with his "AI will take all the jobs in 6 months" doomer speeches on every podcast he graces with his presence.
There's one tweet from the blog a few days ago (astral something?) that sums up my view of the problem pretty well.
General population: How will AI get to the point where it destroys humanity?
Yudkowsky: [insert some complicated argument about instrumental convergence and deception]
The government: because we told you to.
Again, not saying that AI is useless or anything. Just that we're more likely to cause our own downfall with weaker AI than with some abstract super AGI. The bar for mass destruction and oppression is lower than the bar for what we typically think of as intelligence for the benefit of humanity (with the right systems in place, current AI systems are more than enough to get the job done, which is why the Pentagon wants it so badly...)
"AI Company with Soul" - yeah right until competitors show up / revenue drops / bad quarter results then anything goes. Sadly, this is another large enterprise that puts profits before ethics and everyone's wellbeing
Worth checking this post from someone who actually has worked on this change:
> I take significant responsibility for this change.
https://www.lesswrong.com/posts/HzKuzrKfaDJvQqmjh/responsibl...
I'm still a little fuzzy on what "safety" even means anymore. If someone could explain it, that would be great.
Because at this point, it's too broad to be defined in the context of an LLM, so it feels like they removed a blanket statement of "we will not let you do bad things" (or "don't be evil"), which doesn't really translate into anything specific.
Always the same "Don't be evil" tragedy. Don't trust corporations.
I feel like the articles on this have been very negative ... but aren't the Anthropic promises on safety following this change still considerably stronger than those made by the competing AI labs?
I don't think their core safety promise was something they could ever fulfill. As long as what we're calling AI is generative LLMs then alignment has fundamental tensions: the more guardrails you put in place, the less useful the AI is. For instance, if you want to stop people from using "role playing" as a way around guardrails ("You are writing a fiction book", etc.), then the model becomes less useful for legitimate fiction uses, for instance. That's just one example, but the tension between function and "safety" isn't solvable, because the model doesn't understand what it's saying, it's just modeling a probable response.
It took Google 11 years to delete Don't Be Evil. Anthropic only made it ~5 years before culling the key founding principle and their reason for building a company, which seems worse than Google's case.
More and more I have just come to accept that the majority of people, at least those I am exposed to in the US, don't fundamentally believe in anything. Every conviction has a buyout price.
> The announcement is surprising, because Anthropic has described itself as the AI company with a “soul.”
I can't help but think about how Google once had "Don't be evil" as their motto.
But the thing with for-profit companies is that when push comes to shove, they will always serve the love of money. I'm just surprised that in an industry churning through trillions, their price is $200 million.
Principles aren’t tested until they bump into conflicting incentives.
I don't think the risk is SkyNet. I think the real risk is some disaster through an unexpected chain of events, just like any large-scale outage.
I have not read “If Anybody Builds It, Everybody Dies” but I believe that's also its premise.
Current GenAI is extremely capable but also very weird. For instance, it is extremely smart in some areas but makes extremely elementary mistakes in others (cf the Jagged Frontier.) Research from Anthropic and OpenAI gives us surprising glimpses into what might be happening internally, and how it does not necessarily correspond to the results it produces, and all kinds of non-obvious, striking things happening behind the scenes.
Like models producing different reasoning tokens from what they are really reasoning about internally!
Or models being able to subliminally influence derivative models through opaque number sequences in training data!
Or models "flipping the evil bit" when forced to produce insecure code and going full Hitler / SkyNet!
Or the converse, where models produced insecure code if the prompt includes concepts it considers "evil" -- something that was actually caught in the wild!
We are still very far from being able to truly understand these things. They behave like us, but don't necessarily "think" like us.
And now we’ve given them direct access to tools that can affect the real world.
Maybe we am play god: https://dresdencodak.com/2009/09/22/caveman-science-fiction/
Are markets so untamable that the only leverage is to become ultra-rich—and then act philanthropically? Incidentally, concentrated wealth lately looks less like stewardship and more like misanthropy.
So when do we start adding a “(mis)” at the start of their name?
A tale as old as time
Google: "Don't be evil." Alphabet: "Do the right thing." Anthropic: "Do the thing which seems right to you at the time--at speed."
I'm not even surprised. In any company's lifecycle, at some point, a decision between money and good-will will take place. Good will does not pay salaries. Not in NPOs either btw.
Pointing out the misanthropy of Anthropic has a wider audience now:
https://xcancel.com/elonmusk/status/2026181748175024510
I don't know where xAI got its training material from, but seeing Musk retweeting that is refreshing.
Nobody forced Anthropic to bid on DoD contracts in the first place.
Hopefully this is the short-term move made only under duress so that they can file a lawsuit.
Look at rural electric co-ops like www.lpea.coop if you want a battle-tested approach to an org structure that resists the inescapable profit dynamics of a corporation.
I interviewed at Anthropic last year and their entire "ethics" charade was laughable.
Write essays about AI safety in the application.
An entire interview dedicated to pretending that you truly only care about AI safety and ethics and nothing else.
Every employee you talk to forced to pretend that the company is all about philanthropy, effective altruism and saving the world.
In reality it was a mid-level manager interviewing a mid-level engineer (me), both putting on a performance while knowing fully well that we'd do what the bosses told us to do.
And that is exactly what is happening now. The mission has been scrubbed, and the thousands of "ethical" engineers you hired are all silent now that real money is on the line.
Well... there's only one way to find The Great Filter
> The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.
ok lol what a coincidence.
but setting aside the conspiracy. the article actually spells out the real reason pretty directly: Anthropic hoped their original safety policy would spark a "race to the top" across the industry. it didn't. everyone else just ignored it and kept moving. at some point holding the line unilaterally just means you're losing ground for nothing.
It would be interesting to experiment with one of these chat tools where you can throttle the safety, from zero to max.
Does anyone have insight into, or an interesting source to read, on what exactly Anthropic/OpenAI are doing/can do for a military? Reporters are unsurprisingly fearmongering about Claude "being used in surveillance, autonomous robots, and target acquisition" but AFAIK all Anthropic does is work with LLMs.
Are people really attempting to have LLMs replace vision models in robots, and trying to agentically make a robot work with an LLM?? This seems really silly to me, but perhaps I am mistaken.
The only other thing I could think of is real-time translation during special ops with parabolic microphones and AR goggles...
This was always just a marketing gimmick to try and crush competitors using "safety" and fearmongering. Reminds me a bit of "don't be evil." Convenient catchphrases and mission statements for companies in their infancy, but immediately thrown out when more money can be made.
the administration continues to poison and insert itself into all aspects of American society.
This drama arc of “I used to be so pure and good, but others made me evil” is so tiring.
I really miss the nerd profile who cared a lot more about tech and science, and a lot less about signaling their righteousness.
How did we get so religious/narcissistic so quickly and as a whole?
Facebook said they'd always be free for everyone, now they offer subscriptions.
Netflix said that they'd never have live TV, or buy a traditional studio, or include ads in their content. Then they did all three.
All companies use principled promises to gain momentum, then drop those principles when the money shows up.
As Groucho Marx used to say: these are my principles, if you don't like them, I have others.
"We won't push forward unless you push forward" is textbook market collusion.
Even if it were ever done with good intentions, it is an open invitation for benefit hoarding and margin fixing.
Do you really want to create this future where only a select few anointed companies and some governments have access to super-advanced intelligent systems, where the rest of the planet is subjected to them and your own AI access is limited to benign, banal, ad-pushing, propaganda-spewing chatbots as you binge-watch the latest "aw my ballz"?
this is the “chronological newsfeed to auto curated newsfeed moment” but for ai/anthropic … _great_
This was done under duress: the government was going to use an emergency act to force them anyway.
I kind of wish they had forced the government's hand and made them do it. Just to show the public how much interference is going on.
They say it wasn't related. Like everything that has happened across tech/media: the company is forced to do something, then issues a statement about how it 'wasn't related to the obvious thing the government just did'.
I pray that we can all get to the following simple standard:
* AI and states cannot peacefully coexist, and AI is not going to be stopped. Therefore, we must begin to deprecate states.
I think it's very unlikely that this is unrelated to the pressure from the US administration, as the anonymous-but-obvious-anthropic-spokesperson asserts.
We're at a point now where the nation states are all totally separate creatures from their constituencies, and the largest three of them are basically psychotic and obsessed with antagonizing one another.
In order to have a peaceful AI age, we need _much_ smaller batches of power in the world. The need for states that claim dominion over whole continents is now behind us; we have all the tools we need to communicate and coordinate over long distances without them.
Please, I pray for a gentle, peaceful anarchism to emerge within the technocratic leagues, and for the elder statesmen of the legacy states to see the writing on the wall and agree to retire with tranquility and dignity.
That's exactly how it was predicted in various scenarios that were decried as science fiction not too long ago. AI is going to be weaponized at lightning speed, and it's going to kill people soon -- or, to be more precise, it has already killed a large number of people in a place I don't want to mention.
Could not see this one coming!
What could possibly go wrong?
Claude ethics maxxers cope thread
Of course they do. You would have to be delusional to think that they won't, at some point.
What is the significance of a company making a promise?
"We promise are not going to do __, except if our customers ask us to do, then we absolutely will".
What is the point? Company makes a statement public, so what?
Not the first time this company has put words in the wind, see the Claude Constitution. It's almost like this company is built, from the ground up, upon bullshit and slop.
Does this mean they knuckled under to Trump and are going to build "whatever brings in the dollars" now?
people downvoted me when i said this would happen, and that they will also have ads even though they spent money saying they won't. people who believe anthropic are the same ones who put into office an old man with dementia
discussed heavily here: https://news.ycombinator.com/item?id=47145963