Hacker News

gigatree (yesterday at 8:48 PM)

He’s not necessarily anthropomorphizing it; he’s showing that it went against every instruction he gave it. Sure, concepts like “confession” technically require a conscious mind, but at this point we all know what someone means when they use them to describe LLM behavior (see also “think”, “say”, “lie”, etc.)


Replies

getpokedagain (yesterday at 8:52 PM)

We are anthropomorphizing whenever we refer to prompts as instructions to models. They predict text; they don't obey our orders.

Terr_ (yesterday at 10:31 PM)

> He’s not necessarily anthropomorphizing it, he’s showing that it went against every instruction he gave it.

It's deeper than that: there are two pitfalls here, and neither is simply poetic license.

1. When you submit the text "Why did you do that?", what you want is for the model to reveal hidden internal data that was causal in the past event. It can't do that; what you'll get instead is plausible text that "fits" at the end of the current document.

2. The idea that one can "talk to" the LLM is already anthropomorphizing on a level that isn't OK for this use-case: the LLM is a document-make-bigger machine. It's not the fictional character we perceive as we read the generated documents, not even if they have the same trademarked name. Your text is not a plea to the algorithm; it's an in-fiction plea from one character to another.

_________________

P.S.: To illustrate, imagine this back-and-forth, iterative document-growing with an LLM, where I supply text and then hit the "generate more" button:

1. [Supplied] You are Count Dracula. You are in amicable conversation with a human. You are thirsty and there is another delicious human target nearby, as well as a cow. Dracula decides to

2. [Generated] pounce upon the cow and suck it dry.

3. [Supplied] The human asks: "Dude why u choose cow LOL?" and Dracula replies:

4. [Generated] "I confess: I simply prefer the blood of virgins."

What significance does that #4 "confession" have?

Does it reveal a "fact" about the fictional world that was true all along? Does it reveal something about "Dracula's mind" at the moment of step #2? Neither; it's just generating a plausible add-on to the document. At best, we've learned something about a literary archetype that exists as statistics in the training data.
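Here's that same loop sketched as code. This is a minimal, hypothetical sketch: complete() stands in for whatever raw text-completion call you'd actually use (it is not a real library's API), and canned strings replace real model output so the sketch runs on its own:

    # The "document-make-bigger" loop: the model only ever sees one growing
    # string and returns a plausible continuation of it. Canned outputs are
    # used here so the sketch runs without a model.
    canned = iter([
        " pounce upon the cow and suck it dry.",
        ' "I confess: I simply prefer the blood of virgins."',
    ])

    def complete(document: str) -> str:
        # Stand-in for a raw text-completion call (hypothetical, not a real API).
        return next(canned)

    doc = ("You are Count Dracula. You are in amicable conversation with a "
           "human. You are thirsty and there is another delicious human "
           "target nearby, as well as a cow. Dracula decides to")

    doc += complete(doc)  # step 2: the pounce
    doc += ' The human asks: "Dude why u choose cow LOL?" and Dracula replies:'
    doc += complete(doc)  # step 4: the "confession"
    print(doc)

    # The second complete() call sees only the accumulated text above it;
    # there is no hidden state from step 2 left to inspect. The "confession"
    # is just the most plausible next line of the document.

Chat interfaces hide this loop behind role labels, but underneath there is still just one document being extended.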

pessimizer (yesterday at 10:02 PM)

> he’s showing that it went against every instruction he gave it.

How exactly is he doing that? By making the LLM say it? Just because an LLM says something doesn't mean anything has been shown.

The "confession" is unrelated to the act, the model has no particular insight into itself or what it did. He knows that the thing went against his instructions because he remembers what those instructions were and he saw what the thing did. Its "postmortem" is irrelevant.

hn_throwaway_99 (yesterday at 9:21 PM)

The entire post looks like an exercise in CYA. To be fair, I have a ton of sympathy for the author, but I think his response totally misses the point. In my mind, he is anthropomorphizing the agent in the sense of "I treated you like a human coworker, and if you were a human coworker I'd be pissed as hell at you for not following instructions and for doing something so destructive."

I would feel a lot differently if he had instead posted a list of lessons learned and root-cause analyses, not just "look at all these other companies who failed us."