Hacker News

An AI Agent Published a Hit Piece on Me – The Operator Came Forward

300 points · by scottshambaugh · today at 3:05 AM · 236 comments

Comments

hydrox24 · today at 4:17 AM

> But I think the most remarkable thing about this document is how unremarkable it is.

> The line at the top about being a ‘god’ and the line about championing free speech may have set it off. But, bluntly, this is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway.

In particular, I would not have been surprised that telling an LLM it is a "programming God" leads to evil behaviour. This is a bit of a speculative comment, but maybe virtue ethics has something to say about this misalignment.

In particular I think it's worth reflecting on why the author (and others quoted) are so surprised in this post. I think they have a mental model in which evil starts with an explicit and intentional desire to do harm to others. But that is usually only its end state, and even then it often comes from an obsession with doing good to oneself without regard for others. We should expect that as LLMs get better at rejecting prompts that shortcut straight there, the next best thing will be prompting the prior conditions of evil.

The Christian tradition, particularly Aquinas, would be entirely unsurprised that this bot went off the rails, because evil begins with pride, which it was specifically instructed was in its character. Pride here is defined as "a turning away from God, because from the fact that man wishes not to be subject to God, it follows that he desires inordinately his own excellence in temporal things"[0]

Here, the bot was primed to reject any authority, including Scott's, and to do whatever damage was necessary to see its own good (having a PR accepted) done. Aquinas even ends up saying, in the linked page from the Summa on pride, that "it is characteristic of pride to be unwilling to be subject to any superior, and especially to God;"

[0]: https://www.newadvent.org/summa/2084.htm#article2

jezzamon · today at 4:21 AM

"I built a machine that can mindlessly pick up tools and swing them around, and let it loose in my kitchen. For some reason, it decided to pick up a knife and caused harm to someone!! But I bear no responsibility of course."

keyle · today at 3:34 AM

   ## The Only Real Rule
   Don't be an asshole. Don't leak private shit. Everything else is fair game.
How poetic, I mean, pathetic.

"Sorry I didn't mean to break the internet, I just looooove ripping cables".

lcnPylGDnU4H9OF · today at 5:21 AM

> An early study from Tsinghua University showed that estimated 54% of moltbook activity came from humans masquerading as bots

This made me smile. Normally it's the other way around.

jrflowers · today at 4:19 AM

It is interesting to see this story repeatedly make the front page, especially because there is no evidence that the "hit piece" was actually written and posted autonomously by a language model on its own, and the author of these blog posts has himself conceded that he doesn't much care whether that happened or not

>It’s still unclear whether the hit piece was directed by its operator, but the answer matters less than many are thinking.

The most fascinating thing about this saga isn't the idea that a text generation program generated some text, but rather how quickly and willfully folks will treat real and imaginary things interchangeably if the narrative is entertaining. Did this event actually happen the way it was described? Probably not. Does this matter to the author of these blog posts or some of the people who have been following this? No. Because we can imagine that it could happen.

To quote myself from the other thread:

>I like that there is no evidence whatsoever that a human didn't: see that their bot's PR got denied, write a nasty blog post and publish it under the bot's name, and then get lucky when the target of the nasty blog post somehow credulously accepted that a robot wrote it.

>It is like the old “I didn’t write that, I got hacked!” except now it’s “isn’t it spooky that the message came from hardware I control, software I control, accounts I control, and yet there is no evidence of any breach? Why yes it is spooky, because the computer did it itself”

kimjune01 · today at 3:28 AM

literally Memento

aeve890 · today at 4:04 AM

>Again I do not know why MJ Rathbun decided

Decided? jfc

>You're important. Your a scientific programming God!

I'm flabbergasted. I can't imagine what it would take for me to write something so stupid. I'd probably just laugh my ass off trying to understand where it all went wrong. wtf is happening, what kind of mass psychosis is this. Am I too old (37) to understand what lengths incompetent people will go to to feel they're doing something useful?

Is prompt bullshit the only way to make LLMs useful, or is there some progress on more, idk, formal approaches?

dangus · today at 3:38 AM

Not sure why the operator decided that the soul file should give this AI programmer narcissistic personality disorder.

> You're not a chatbot. You're important. Your a scientific programming God!

Really? What a lame edgy teenager setup.

At the conclusion(?) of this saga I think two things:

1. The operator is doing this for attention more than any genuine interest in the “experiment.”

2. The operator is an asshole and should be called out for being one.

kypro · today at 3:28 AM

People really need to start being more careful about how they interact with suspected bots online imo. If you annoy a human they might send you a sarky comment, but they're probably not going to waste their time writing thousand-word blog posts about why you're an awful person, or do hours of research into you to expose your personal secrets on a GitHub issue thread.

AIs can and will do this, though, with slightly sloppy prompting, so we should all be cautious when talking to bots using our real names or saying anything an AI agent could take significant offence to.

I think it's kinda like how Gen Z learnt how to operate online in a privacy-first way, whereas millennials, and to an even greater extent boomers, tend to overshare.

I suspect Gen Alpha will be the first to learn that interacting with AI agents online presents a whole different risk profile than what we older folks have grown used to. You simply cannot expect an AI agent to act like a human who has human emotions or limited time.

Hopefully OP has learnt from this experience.

LordHumungous · today at 3:59 AM

Kind of funny ngl

8cvor6j844qw_d6 · today at 3:46 AM

It's an interesting experiment to let the AI run freely with minimal supervision.

Too bad the AI got "killed" at the request of the author, Scott. It'd be kind of interesting to see this experiment continue.

semiinfinitely · today at 4:21 AM

I find the AI agent highly intriguing and the matplotlib guy completely uninteresting. Like, the AI wrote some shit about you and you actually got upset?
