Hacker News

keeda · yesterday at 8:33 PM

Actually I think the opposite advice is true. Do anthropomorphize the language model, because it can do anything a human -- say an eager intern or a disgruntled employee -- could do. That will help you put the appropriate safeguards in place.
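The "appropriate safeguards" the comment suggests can be made concrete. A minimal sketch of one such guardrail, treating the agent like an untrusted intern whose destructive actions need human sign-off. All names (`requires_approval`, `run_agent_command`) and the pattern list are hypothetical illustrations, not any particular vendor's API:

```python
import re

# Hypothetical guardrail: commands an unsupervised intern shouldn't run
# without sign-off get blocked until a human approves them.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bDROP\s+TABLE\b",
    r"\bDELETE\s+FROM\b",
    r"\bTRUNCATE\b",
]

def requires_approval(command: str) -> bool:
    """Return True if the command matches a known destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE)
               for p in DESTRUCTIVE_PATTERNS)

def run_agent_command(command: str, approved: bool = False) -> str:
    """Gate destructive commands behind an explicit approval flag."""
    if requires_approval(command) and not approved:
        return f"BLOCKED (needs human approval): {command}"
    # In a real agent this would dispatch to a shell or database client.
    return f"EXECUTED: {command}"
```

A denylist like this is deliberately crude; the point is only that the "intern" framing maps naturally onto a permission boundary, whatever form it takes.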


Replies

gpm · yesterday at 8:36 PM

An eager intern can remember things you tell them beyond what would fit in an hour's conversation.

A disgruntled employee definitely remembers things beyond that.

These are a fundamentally different sort of interaction.

rglullis · yesterday at 8:44 PM

An eager intern cannot serve hundreds of millions of customers at the same time. An LLM can.

A disgruntled employee will face consequences for their actions. No one at Anthropic, OpenAI, xAI, Google or Meta will be fired because their model deleted a production database from your company.

XenophileJKO · yesterday at 9:59 PM

I think you are more right than people are giving you credit for. I would love to see the full transcript to understand the emotional load of the conversation. Using instructions like "NEVER FUCKING GUESS!" probably increases the likelihood of the agent making a "mistake" that is destructive but defensible.

The models have analogous structures, similar to human emotions. (https://www.anthropic.com/research/emotion-concepts-function)

"Emotional" response is muted through fine-tuning, but it is still there, and continued abuse or "unfair" interaction can unbalance an agent's responses dramatically.

gessha · yesterday at 11:36 PM

You don't anthropomorphize a table saw, you just don't put your hand in there.

nkrisc · yesterday at 8:40 PM

It is merely a simulacrum of an intern or disgruntled employee or human. It might say things those people would say, and even do things they might do, but it has none of the same motivations. In fact, it does not have any motivation to call its own.

root_axis · yesterday at 9:54 PM

It doesn't follow logically that a human and an LLM are similar just because both are capable of deleting prod by accident.

AndrewDucker · yesterday at 8:43 PM

No, because the safeguards should be appropriate to an LLM, not to a human.

(The LLM might act like one of the humans above, but it will have other problematic behaviours too)

altmanaltman · yesterday at 9:22 PM

It cannot go to the washroom and cry while pooping. And that's just one of the things that any human can do and AI cannot. So no, it cannot do everything a human can do, that example being one of them.

And that's why we don't have AI washrooms: they are not alive, not employees, and have no need to excrete.