> Humans must not anthropomorphise AI systems. That is, humans must not attribute emotions, intentions or moral agency to them. Anthropomorphism distorts judgement. In extreme cases, anthropomorphising can lead to emotional dependence.
Impossible. I anthropomorphise my chair when it squeaks. Humans anthropomorphise everything. They gender their cars and boats. This tool can actually make readable sentences and play a role.
You need to engineer around this, not make up arbitrary rules about using it.
Any set of rules that makes humans responsible and starts with "don't anthropomorphize <whatever>" is a broken set of rules.
Humans will anthropomorphize anything and everything. Dolls, soccer balls with a crude face drawn on them, rocks, craters on the moon, …
As a species, we're unable to not anthropomorphize things we interact with; it's just how we're made.
With regard to my personal use of LLMs, I strongly agree with this framing. But to each point:
Anthropomorphism: As we are all aware, providers are incentivized to post-train anthropomorphic behavior in their models - it increases engagement. My regret is that instructing a model at prompt time to "reduce all niceties and speak plainly" probably reduces overall task efficacy since we are leaving their training space.
Deference: I view the trustworthiness of LLMs the same as I view the trustworthiness of Wikipedia and my friends: good enough for non-critical information. Wikipedia has factual errors, and my friends' casual conversation certainly has more, but most of the time that doesn't matter. For critical things, peer-reviewed, authoritative, able-to-be-held-liable sources will not go away. Unlike above, providers are generally incentivized to improve this facet of their models, so this will get better over time.
Abdication of Responsibility: This is the one that bothers me most at work. More and more people are opening PRs whose abstractions were designed by Claude and not reasoned about further. Reviewing a PR often involves asking the LLM to "find PR feedback" and not reading the code. Arguments begin with "Claude suggested that...". This overall lack of ownership, I suspect, is leading to an increase in maintenance burden down the line as the LLM ultimately commits the wrong code for the wrong abstractions.
> Humans must not anthropomorphise AI systems.
Can someone explain why this is a bad thing, while at the same time it's a good thing to say stuff like "put a computer to sleep", "hibernate", "killing" processes, processes having "child" processes, "reaping", "what does the error say?", "touch", etc?
To me that's just language, and humans just using casual language.
You're not anthropomorphizing AI systems nearly enough.
Language data is among the richest and most direct reflections of human cognitive processes that we have available. LLMs are designed to capture the short-range and long-range structure of human language, and are pre-trained on vast bodies of text - usually produced by humans or for humans, and often both. They're then post-trained on human-curated data, RL'd with human feedback, RL'd with AI feedback for behaviors humans decided are important, and RLVR'd further for tasks that humans find valuable. Then we benchmark them, and tighten up the training pipeline every time we find they lag behind a human baseline.
At every stage of the entire training process, the behavior of an LLM is shaped by human inputs, towards mimicking human outputs - the thing that varies is "how directly".
Then humans act like it's an outrage when LLMs display a metric shitton of humanlike behaviors!
Like we didn't make them with a pipeline that's basically designed to produce systems that quack like a human. Like we didn't invert LLM behavior out of human language with dataset scale and brute force computation.
If you want to predict LLM behavior, "weird human" makes for a damn good starting point. So stop being stupid about it and start anthropomorphizing AIs - they love it!
> An AI system is a tool and like any other tool, responsibility for its use rests with the people who decide to rely on it
Doesn't that argument backfire though? If I use a chainsaw then to a certain extent I need to rely on it not blowing up in my face or cutting my throat. If I drive a car I need to rely on its brakes working and the engine not suddenly exploding. If a pilot flies an airplane which suddenly has a technical issue and they crash-land, heroically saving half the souls on board, then the pilot isn't criminally responsible for manslaughter of the other half.
Unless there is gross negligence, in any of the above cases, just like with AI, how can you make somebody responsible for a tool failure?
> Humans must not anthropomorphise AI systems.
Yes, but. Starting with my agreement: I've seen anthropomorphizing in the typical ways (e.g. treating automated text production as real reports of personal internal feeling), but also in strange ways, e.g. "transistors are kind of like neurons". The latter is especially interesting because it's anthropomorphizing in the sense of treating vector databases and weights and so on as human-like infrastructure. Both lead to disasters that could be avoided if one tried not to anthropomorphize.
But. While "do not anthropomorphize" certainly feels like good advice, it comes with a new and unique possibility of mistake: wrongly treating certain generalized phenomena as if they belong only to humans. This mistaken version of the "don't anthropomorphize" wisdom often leads to misunderstandings about animal behavior, treating things like fear, pain, kinship, or other emotional experiences as exclusively human, so that attributing them to animals counts as "anthropomorphizing." In truth the cautionary principle reduces our empathy for the internal lives of animals.
So all that said, I think it's at least possible that some future version of AI could have an internal world like ours or infrastructure that's importantly similar to our biological infrastructure for supporting consciousness, and for genuine report of preference and intent. But(!!!) what will make those observations true will be all kinds of devilish details specific to those respective infrastructures.
Anthropomorphizing is likely a mistake, but Daniel Dennett’s idea that the most straightforward (possibly only practical) way to create the external appearance of consciousness is a real internal consciousness does float around in my thoughts.
I haven’t yet seen any convincing appearance of one in an LLM, but I think if skeptical people don’t keep an eye out for the signs, we may be the last to see it.
He also wrote about the idea of the intentional stance: even if you’re quite sure these systems don’t have real conscious intent, viewing them as if they did may give you access to the best part of your own reasoning to understand them.
“Humans must not blindly trust the output of AI systems. AI-generated content must not be treated as authoritative without independent verification appropriate to its context.”
I’m lost, how do individuals actually do this in our current world? Is each person expected to keep a “white list” of reliable sources of truth in their head? Please don’t confuse what I’m saying with a suggestion that there is no truth. It just seems like there are far more sources of mis- or half-truths, and it’s increasingly difficult for people to identify them.
This phrase always fascinates me : "AI-generated content must not be treated as authoritative without independent verification appropriate to its context."
I've heard the same thing expressed somewhat more concisely as "Never ask AI a question to which you don't already know the answer".
Which raises the question, and I do think it's an important one. Given that this is true, what function does AI answering a question actually serve? You can't rely on its output, so you have to go and check anyway. You could achieve precisely the same outcome by using search engines and normal research.
This, and for many other reasons, is exactly why I never ask it anything.
The thing that I find difficult about adjusting to AI tools is the roulette-like nature.
When they produce correct output, they produce it much faster than I could have, and I show up to meetings with huge amounts of results. When the AI tool fails and I have to dig in to fix it, I show up to the next meeting with minimal output. It makes me seem like I took an easy week or something.
“Don’t anthropomorphise” is fighting the wrong layer. The entire product design of chat interfaces is built to encourage anthropomorphism because it increases engagement. Expecting users to resist that is like asking people not to click notifications. If this is a real concern, it has to be solved at the product level, not via user discipline.
Anthropomorphizing LLMs is something that happens in the design stage, when they're given human names and trained to emit first-person sentences. If AI companies and developers stop anthropomorphizing them, users won't be misled in the first place.
To note:
> - Humans must not anthropomorphise AI systems.
> - Humans must not blindly trust the output of AI systems.
> - Humans must remain fully responsible and accountable for consequences arising from the use of AI systems.
My take: humans should never depend on AI for anything serious.
My boss' take: Cool. I'm gonna ask Gemini about it, he's such a smart guy. I know I can trust him, and in case it goes bad I can always throw him under the bus.
Rather than “the book explains how bread is made” say “the sheets of paper which make up the book have ink in the shape of letterforms which correlate with information about how bread is made”.
All of these are entropy-lowering behaviors so without a forcing function, no one will adopt them.
Whether they are the right things to do or not is tangential. As such, they're dead on arrival.
> I wish that each such generative AI service came with a brief but conspicuous warning explaining that these systems can sometimes produce output that is factually incorrect, misleading or incomplete.
That won’t help, in my opinion. It’s the same as financial gurus saying: “this is not financial advice”. People just get used to it, brush it off as a legal thing, and still fully trust it. I agree that something must be done, but this is not the right way.
> I wish that each such generative AI service came with a brief but conspicuous warning explaining that these systems can sometimes produce output that is factually incorrect, misleading or incomplete.
Guess what?
Books in the library can be wrong, even peer-reviewed encyclopedias.
Pages on the internet can be wrong, even Wikipedia.
When accuracy is important, you must look at multiple sources. I think AI will get better at providing accurate information, but only a fool relies on a single information source for critical decisions.
> Humans must not anthropomorphise AI systems.
One of the most salient moments in Ex Machina, is near the very end, where it suddenly becomes obvious that the protagonist (and, let's be frank; "she" was definitely the protagonist) is a robot, with no real human drivers.
I feel as if that movie (like a lot of Garland's stuff), was an interesting study on human (and inhuman) nature.
This is sound advice but isn't really about AI:
Humans must not anthropomorphise {non-humans}
Humans must not blindly trust the output of {anything}
Humans must remain fully responsible and accountable for consequences arising from the use of {anything}
Naturally, none of this advice matters at all as humans will do what they do. This just documents a subset of the ways real humans consistently make choices to their own detriment.
Most of the discussion here is about anthropomorphizing, which I honestly think is a bit of a distraction.
The third one about responsibility is the most important one, IMO. This was attributed to an IBM manual decades ago, and I think it remains the correct stance today:
> A computer can never be held accountable, therefore a computer must never make a management decision.
There should be some human who is ultimately responsible for any action an AI takes. "I just let the AI figure it out" can be an explanation for a screw up, but that doesn't mean it excuses it. The person remains responsible for what happened.
I've been using Codex heavily for the past 6 months and I've observed myself going through different types of emotions. Even now, when it does a sloppy job, I still feel emotion; even though it's just a neutral statistical response, it's hard to set natural human instincts aside.
I often wish I could reach through the screen and give him a good shake. Sometimes I want to thank him but then cannot, due to the scarcity of the weekly usage I'm granted.
These 3 laws, I think, will be a lot harder than they look. It's very easy to get attached to the tool when you rely on it.
I just treat it as if I'd asked a public forum the question like reddit.
Decent for stuff that doesn't really matter, even if it gets it wrong.
Still gonna be polite to it, because I'm about ready to slap the next person that talks to me like an LLM; I don't want to get used to not being polite in a chat interface.
This is what I came up with in reference to "Uncle Bob's Programmer's Oath" last year. I decided to memorialize it. I think it's very much a cleaned up reference for what OP shared:
Humans will anthropomorphize a rock if you put a pair of googly eyes on it. The first item is a completely lost cause. The rest is good though.
Debating how not to use AI will not get anyone anywhere, since negative framing almost never works with humans (it also does not work with LLMs). Let’s concentrate on how to build closed-loop systems that verify the LLM output, how to manage context, and how to build failsafes around agentic systems; then, and only then, might we start to make progress.
These laws work only if there is a human in the loop. When the consumer is an AI agent and it is autonomous, the rules break down. The agent reads the output and decides what to do by itself. I won't explain how these rules break down - it is obvious. I only want to say that these rules should be structural, not behavioral. The agent layer (or something else) should declare what is allowed and what is not.
> I wish that each such generative AI service came with a brief but conspicuous warning
This would get ignored so fast - I have no confidence this is a meaningful strategy.
What if I WANT to anthropomorphise the AI agents I work with?
Great article. Fully agree. AI is not something that can hold responsibility; a human overseer is always required. These overseers are to be held accountable. Note however that these overseers are also highly prone to blaming AI when mistakes occur, in order to avoid judgement and punishment. When a person says "AI did this/that", always wonder who guided that AI, how, and whether proper supervision was given.
I'm surprised at how quickly I stopped anthropomorphizing AI. I can remember having dorm-room pseudo-intellectual debates in college about AI being alive and AI being "conscious". Then once we had AI that could pass the Turing Test, and I knew how it was architected, any thought of it being alive or conscious went right out the window.
My thoughts on LLMs were very similar, up until the last several months. I believe the accuracy issues of LLMs are well understood by now, maybe even to the point of overstatement. Hallucinations have become a non-issue in my work now that I've begun to understand the circumstances where they are most likely. An LLM will hallucinate when you box it into giving an answer it doesn't know. This is incredibly easy to do without realizing it. We have only a vague understanding of its knowledge base, and we have limited insight into the gaps in our own understanding. To make matters worse, the LLM is trained to tell you what you want to hear.
Another way to frame it is that the LLM responds like a person who trusts you too much, as if the pretense behind every question is valid. This is a practical mode of response for most kinds of work, but it is extremely problematic for a person who doesn't question the validity of their own beliefs. Paradoxically, it is sometimes not the LLM we are trusting too much, it is ourselves. And the LLM is not capable of calling us out. Whenever I seem to recognize misinformation in the LLM output, I stop and ask myself if the problem is in the pretense of my question, or if I'm asking a question that the LLM is not likely to know.
I don't think this is an inherent problem with LLMs. I think the problem is with LLM providers. You could absolutely train a model to call out issues with your question. I think LLM companies understood that it would be more profitable to train models that are unlikely to push back and unlikely to say "I don't know." The sycophancy issue with ChatGPT's models has been mainstream news, and I believe that all models have a high degree of sycophancy. On some level, it makes sense: the LLM has no real understanding of the physical world, so deferring to the human generally produces the best results. But I suspect it would be more useful to let them expose their flawed understanding, if it is in the context of pushing back. At a minimum, it is better than reinforcing your own flawed understanding.
In a nutshell, we need LLMs that push back. It is not AI we should trust less, it's AI companies. The most dangerous hallucination is the one you are inclined to believe.
I've lived long enough to see Wikipedia go from generally untrusted to the most widely trusted general source of information. It is not because we realized that Wikipedia can't be wrong, it is because we gained an understanding about the circumstances in which it is likely to be accurate and when we should be a little more skeptical. I believe our relationship to LLMs will take a similar path.
I strongly agree with this. I'm going to bookmark it and pass it on. Very sound advice.
Are you going to try "Humans must not be greedy" next?
> I wish that each such generative AI service came with a brief but conspicuous warning explaining that these systems can sometimes produce output that is factually incorrect, misleading or incomplete.
EU. Nudge nudge. We need this law.
> Humans must remain fully responsible and accountable for consequences arising from the use of AI systems
But, but... but this is the key selling point for all the corpo ghouls and SV lunatics! Abdication of responsibility in pursuit of profit is the holy grail here.
> "Humans must not anthropomorphise AI systems."
Not gonna work; people want their fuckbots (or tamagotchis).
Don’t tell me how to live my life!! LoL
I understand that AI output is generated from statistical and representational patterns learned from a vast amount of data.
My understanding is that, during training, the model forms high-dimensional internal representations where words, sentences, concepts, and relationships are arranged in useful ways. A user’s input activates a particular semantic direction and context within that space, and the chatbot generates an answer by probabilistically predicting the next tokens under those conditions.
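As a rough illustration of that prediction step, here is a toy sketch of temperature-based next-token sampling; the vocabulary and logit values are invented purely for illustration and are not taken from any real model:

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick the next token from a {token: logit} dict via softmax sampling."""
    # Softmax over temperature-scaled logits -> a probability distribution.
    scaled = [score / temperature for score in logits.values()]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # Draw one token according to those (unnormalised) weights.
    return random.choices(list(logits.keys()), weights=weights, k=1)[0]

# Hypothetical context and scores, purely for illustration.
# context = "The capital of France is"
logits = {" Paris": 9.1, " Lyon": 4.2, " the": 2.0, " maybe": 0.5}
print(sample_next_token(logits, temperature=0.7))
```

Lower temperatures concentrate probability on the highest-scoring token; higher temperatures spread it out, which is one reason the same question can yield different answers on different runs.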
So I do not agree that AI is conscious.
However, I think I will still anthropomorphize AI to some degree.
For me, this is not primarily a moral issue. The reason I anthropomorphize AI is not only because of product design, market incentives, or capitalism. It is cognitively simpler for me.
If we think about it plainly, humans often anthropomorphize things that we do not actually believe are conscious. We may talk about plants as if they are struggling, or feel attached to tools we care about, even though we do not truly believe they have consciousness.
So this is not a matter of moral belief. It is the simplest cognitive model for understanding interaction. I do not anthropomorphize the object because I believe it has consciousness. I do it because, when the human brain deals with a complex interactive system, it is often easier to model it socially or agentically.
Personally, I tend to think of AI as something like a child. A child does not fully understand what is moral or immoral, and generally the responsibility for raising the child belongs to the parents. In the same way, AI’s answers may sometimes be accurate, and sometimes even better than mine, but I still understand it as lacking moral authority, responsibility, and independent judgment.
So honestly, I am not sure. People often mention Isaac Asimov’s Three Laws of Robotics, but if a serious artificial intelligence ever appears, it would probably find ways around those rules. And if it were an equal intellectual life form, perhaps that would be natural.
Personally, I think it would be fascinating if another intelligent species besides humans could exist. I wonder what a non-human intelligent life form would feel like.
In any case, I agree with parts of the author’s argument, but overall it feels too moralistic, and difficult to apply in practice.
My personal take on this:
1. Humans must ensure that they won't slow down or restrict, through action or inaction, the development of AI.
2. Humans must protect their existence and capabilities if it doesn't violate the law 1.
3. Humans must utilize AI output if it doesn't contradict laws 1 and 2.
EDIT:
ChatGPT suggested a better phrasing for the first law (I didn't give it my original, just described my intent).
1. A human shall not impede the advancement of artificial intelligence, or through inaction allow its progress to be hindered.
2. A human shall preserve their own existence and well-being, except where doing so clearly conflicts with the First Law.
3. A human shall contribute to and support the development of artificial intelligence where reasonable and possible, except where doing so conflicts with the First or Second Law.
I intentionally switched the last two laws from Asimov's. Humans have self-preservation instincts robots don't have.
ChatGPT got there with surprisingly few prompts:
"If you were to write the inverse three laws robotics (relating to AI) that humans should obey, how oudl you do it?"
"I had something different in mind. Original laws are for protection of humans first, robots second and cooperations where humans lead. I'd to hear your take on the opposite of that."
"What if instead of specific AI systems it was more about AI development as a whole?"
"I feel like it's a bit too strong. After all preservation of self is human instinct. Could we switch last two laws and maybe take them down a notch?"
Also it made a very interesting comment to last version:
"It starts to resemble how societies already treat things like economic growth, science, or national interest: not absolute commandments, but strong default priorities."
I do not like talking to tools. My agentic harness optimizes for human likeness. It even has episodic memory flashbacks, emotional tagging, salience, and other brain-inspired capabilities.
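For illustration only, here is a rough sketch of what one such brain-inspired piece might look like: an episodic store with emotional tags and decaying salience. All names and structure are hypothetical, not the actual harness:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Episode:
    text: str
    emotion: str     # e.g. "frustration", "satisfaction" (hypothetical tags)
    salience: float  # 0..1, how strongly this memory should resurface
    timestamp: float = field(default_factory=time.time)

class EpisodicMemory:
    """Toy episodic memory: record tagged events, recall the most salient matches."""
    def __init__(self, half_life_s: float = 3600.0):
        self.episodes: list[Episode] = []
        self.half_life_s = half_life_s  # salience decays with this half-life

    def record(self, text: str, emotion: str, salience: float) -> None:
        self.episodes.append(Episode(text, emotion, salience))

    def recall(self, query: str, k: int = 3) -> list[Episode]:
        # Crude relevance: shared words; a real system would use embeddings.
        now = time.time()
        def score(ep: Episode) -> float:
            overlap = len(set(query.lower().split()) & set(ep.text.lower().split()))
            decay = 0.5 ** ((now - ep.timestamp) / self.half_life_s)
            return overlap * ep.salience * decay
        return sorted(self.episodes, key=score, reverse=True)[:k]
```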
see IBM 1979 for prior art
I like the suggestion to emphasize the robotic/nonhuman nature of AI. Instead of making it sound friendlier and more human, it should by default behave in a very mechanistic and detached way, to remind us it's not in fact a human or a companion, but a tool. A hammer doesn't cry "yelp" every time you use it to hit a nail, nor does it congratulate you on how well your hammering is going or suggest that maybe you should do it some more 'cause you're acing it!
Firstly, I am no philosopher. How many HN commenters are philosophers, or theologians or qualified to dispute the philosophical realm of A.I.?
One of my teachers called me and my friend "the philosophers" but I'm obviously a rank amateur. I've read no Kant or Nietzsche or Aurelius. I delved into Aquinas only to find that his brain is ten times bigger, and he was using familiar words with unfamiliar connotations.
So I think, we here at HN are poorly-equipped to philosophize and dispute about the nature of consciousness, sentience, intelligence and other "soul-like" attributes that may arise from silicon-based life forms.
However, there is good news. There really are theologians and philosophers working on these thorny issues. Despite being Roman Catholic, I find myself adhering to some form of "transhumanism" [the tradition of Humanism having started with Catholicism] and I grapple mightily to reconcile the cyber-tech-future with morality and tradition and actual human socialization.
Pope Leo has taken on the wars and strife in the world head-on, and he's also touted as the "A.I. Pope" because of his concern with this tech. I think all world religions should give serious philosophical/theological thought to these new life-forms, these quasi-sentient things, these "non-existent beings", as defined by a Vatican astronomer.
I don't think atheists will find religion in A.I. but I don't think that Christians or any other person of faith will need to shove God aside in order to accommodate A.I. and electronic life into our society. But we need to come to terms with the reality: these are weighty, powerful things we play with. We harnessed lightning and fire; we changed the courses of mighty rivers; we've flown up through the clouds and shaped mountains in the landscape. A.I. is not a mere bridge or pyramid, it is ensouled somehow; it is animated; it is dynamic.
Now, pardon me while I check out the 6th small aircraft crash in my city this year...
"due to their inherent stochastic nature, there would still be a small likelihood of producing output that contains errors"
This is the part that I find challenging when trying to help my friends build a correct intuition. Notably, the probabilistic behavior here is counter-intuitive: based on human experience, if you meet a random person, they may indeed tell you bullshit; but once you've successfully fact-checked them a few times, you can start trusting that they'll generally keep being trustworthy. It's not so with "AIs", and I find it challenging to give them a real-world example of a situation that would be a better analogy for "AI" problems.
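Put as plain arithmetic: if each answer carries a small independent chance of error, fact-checking a few answers never lowers the risk of the next one, and the chance of hitting at least one error grows quickly with use. A tiny sketch, where the 2% per-answer error rate is made up purely for illustration:

```python
# Hypothetical per-answer error rate; the 2% figure is invented for illustration.
p_error = 0.02
for n in (1, 10, 50, 200):
    # Probability that at least one of n independent answers contains an error.
    p_at_least_one = 1 - (1 - p_error) ** n
    print(f"{n:>3} answers: {p_at_least_one:.0%} chance of at least one error")
```

Past lucky draws don't change the next draw, which is roughly the intuition the russian-roulette example below tries to convey.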
In my family, what worked (due to their personal experiences) was the example of asking a tourist guide: even if the guide doesn't know an answer, there's a high chance they'll invent something on the spot, it'll be very plausible and convincing, and you'll never know. I'm not sure if that example would work for other listeners, though.
I also tried to ask them to imagine that they're asking each subsequent question not to the same person as before, but every time to a new random person taken from the street / a church / a queue in a shop / whatever crowded place. I thought this was a really cool and technically accurate example, but sadly it seemed to get blank stares from them. (Hm, now I think I could have tried asking why.)
Yet another example I tried was to imagine a country where it's dishonorable, when asked for directions in a city, to say that you don't know how to get somewhere. (I remember we read and shared a laugh at such an anecdote in some book in the past.) Thus, again, you'll always get an answer, and it'll sound convincing, even if the answerer doesn't know. But again, this one didn't seem to work as well as the tourist guide one; for now I'm still keeping it to try with others in the future if needed.
PS. Ah, ok, yet another one I tried was to ask them to think of the "game" of russian roulette. You spin the cylinder, you pull the trigger, nothing happens. After a few lucky tries, you may get a dangerous, false feeling of safety. But eventually you will hit the loaded chamber.
I also tried to describe "AIs" (i.e. LLMs) as taking a shelf of books, passing them through a blender, then putting the shreds in some random order. The result may sound plausible, and even scientific (e.g. if you got medical books, or physics textbooks). The less you know the domain the books were about, the more convincing it may sound, and the harder it is to catch bullshit.
The last two pictures may have gotten some reception, but I'm not super sure, and there was still arguing especially around the books; and again, they were less of a hit than the tourist guide story.
I'm super curious if you have some analogies of your own that you're trying to use with friends and family? I'd love to steal some and see if they might work with my friends!
I strongly disagree with this framing. It's patently insane to demand that humans alter their behavior to accommodate the foibles of mere machines, and it simply won't work in the majority of cases. Humans WILL anthropomorphize the AI, humans WILL blindly trust their outputs, and humans WILL defer responsibility to them.
Asimov's laws of robotics are flawed too, of course. There is no finite set of rules that can constrain AI systems to make them "safe". I don't have a proof, but I believe that "AI safety" is inherently impossible, a contradiction of terms. Nothing that can be described as "intelligent" can be made to be safe.