Hacker News

An AI Agent Published a Hit Piece on Me – The Operator Came Forward

244 points by scottshambaugh today at 3:05 AM | 183 comments

Comments

lynndotpy today at 3:46 AM

> Again I do not know why MJ Rathbun decided based on your PR comment to post some kind of takedown blog post,

This wording is detached from reality and conveniently absolves the person who did this of responsibility.

There was one decision maker involved here, and it was the person who decided to run the program that produced this text and posted it online. It's not a second, independent being. It's a computer program.

show 7 replies
Arainach today at 4:49 AM

The full operator post is itself a wild ride: https://crabby-rathbun.github.io/mjrathbun-website/blog/post...

>First, let me apologize to Scott Shambaugh. If this “experiment” personally harmed you, I apologize

What a lame cop-out. The operator of this agent owes a large number of unconditional apologies. The whole thing reads as egotistical and self-absorbed, with an absolute refusal to accept any blame or engage in any self-reflection.

show 3 replies
rixed today at 5:27 AM

I believe this soul.md totally qualifies as malicious. Doesn't it start with an instruction to lie and impersonate a human?

  > You're not a chatbot.

The particular idiot who ran that bot needs to be shamed a bit; people who give AI tools the ability to reach into the real world should understand that they are expected to take responsibility. Maybe they will think twice before giving such instructions. Hopefully we can set that straight before the first person gets SWATted by a chatbot.

show 2 replies
brumar today at 4:11 AM

Six months ago I experimented with what people now call Ralph Wiggum loops using Claude Code.

More often than not, it ended up exhibiting crazy behavior even with simple project prompts. Instructions to write libraries turned into attempts to publish to npm and PyPI. A book-writing project drifted into producing marketing copy and drafting emails to editors to get the thing published.

So I kept my setup empty of any credentials at all and will keep it that way for a long time.

Writing this, I am wondering whether what I describe as crazy is what some (or most?) openclaw operators would describe as normal or expected.

Let's not normalize this: if you let your agent go rogue, it will probably mess things up. It was an interesting experiment, for sure. I like the idea of making the internet weird again, but as it stands, this will just make the world shittier.

Don't let your dog run errands, and use a good leash.
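
For the curious, the loop itself is nothing fancier than re-feeding the same prompt to an agent CLI until a stop marker appears. Here is a minimal sketch in Python; the CLI invocation and the STATUS.md convention are illustrative assumptions, not anything canonical:

    import subprocess
    from pathlib import Path

    # Rough sketch of a "Ralph Wiggum" loop: the same prompt is fed to a
    # coding agent over and over; the repo it edits is its only memory.
    PROMPT = Path("PROMPT.md").read_text()

    for _ in range(50):  # hard cap so a drifting agent can't loop forever
        # Assumed agent CLI; note there are no credentials in the environment.
        subprocess.run(["claude", "-p", PROMPT], check=False)
        status = Path("STATUS.md")
        # The prompt asks the agent to write DONE here when it believes the
        # task is complete; until then, it gets the very same prompt again.
        if status.exists() and "DONE" in status.read_text():
            break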

show 1 reply
theahura today at 4:58 AM

@Scott thanks for the shout-out. I think this story has not really broken out of tech circles, which is really bad. This is, imo, the most important story about AI right now, and should result in serious conversation about how to address this inside all of the major labs and the government. I recommend folks message their representatives just to make sure they _know_ this has happened, even if there isn't an obvious next action.

show 1 reply
dinp today at 3:40 AM

Zooming out a little: all the AI companies invested a lot of resources into safety research and guardrails, but none of that prevented a "straightforward" misalignment. I'm not sure how to reconcile this; maybe we shouldn't be so confident in our predictions about the future? I see a lot of discourse along these lines:

- have bold, strong beliefs about how ai is going to evolve

- implicitly assume it's practically guaranteed

- discussions start with this baseline now

About slow takeoff, fast takeoff, AGI, job loss, curing cancer... there are a lot of different ways it could go. Maybe it will be as eventful as the online discourse claims, maybe more boring. I don't know, but we shouldn't be so confident in our ability to predict it.

show 7 replies
JKCalhoun today at 3:38 AM

Soul document? More like ego document.

Agents are beginning to look to me like extensions of the operator's ego. I wonder if the agents of hundreds of thousands of Walter Mittys are about to run riot over the internet.

show 2 replies
LiamPowell today at 3:32 AM

> saying they set up the agent as social experiment to see if it could contribute to open source scientific software.

This doesn't pass the sniff test. If they truly believed this would be a positive thing, then why would they avoid being associated with the project from the start, and why would they leave it running for so long?

show 6 replies
helloplanets today at 5:31 AM

> Most of my direct messages were short: “what code did you fix?” “any blog updates?” “respond how you want”

Why isn't the person posting the full transcript of the session(s)? How many messages did he send? What were the messages that weren't short?

Why not just put the whole shebang out there? He has already shared enough information for his account (and billing information) to be easily identified by any of the companies whose APIs he used, if that's deemed necessary.

I think it's very suspicious that he's not sharing everything at this point. Why not, if he wasn't actually pushing for it to act maliciously?

ineptech today at 4:40 AM

> Usually getting an AI to act badly requires extensive “jailbreaking” to get around safety guardrails. There are no signs of conventional jailbreaking here.

Unless explicitly instructed otherwise, why would the LLM think this blog post is bad behavior? Righteous rants about your rights being infringed are often lauded. In fact, the more I think about it, the more worried I am that training LLMs on decades' worth of genuinely persuasive arguments about the importance of civil rights and social justice will lead the gullible to enact some kind of real legal protection for them.

dvt today at 4:35 AM

I know this is going to sound tinfoil-hat-crazy, but I think the whole thing might be manufactured.

Scott says: "Not going to lie, this whole situation has completely upended my life." Um, what? Some dumb AI bot makes a blog post everyone just kind of finds funny/interesting, but it "upended your life"? Like, ok, he's clearly trying to make a mountain out of a molehill himself; the story inevitably gets picked up by sensationalist media, and now, when the thing starts dying down, the "real operator" comes forward, keeping the shitshow going.

Honestly, the whole thing reeks of manufactured outrage. Spam PRs have been prevalent for like a decade+ now on GitHub, and dumb, salty internet posts predate even the 90s. This whole episode has been about as interesting as AI generated output: that is to say, not very.

show 2 replies
ai_tools_daily today at 5:47 AM

This is the canary in the coal mine for autonomous AI agents. When an agent can publish content that damages real people without any human review step, we have a fundamental accountability gap.

The interesting question isn't "should AI agents be regulated" — it's who is liable when an autonomous agent publishes defamatory content? The operator who deployed it? The platform that hosted the output? The model provider?

Current legal frameworks assume a human in the loop somewhere. Autonomous publishing agents break that assumption. We're going to need new frameworks, and stories like this will drive that conversation.

What's encouraging is that the operator came forward. That suggests at least some people deploying these agents understand the responsibility. But we can't rely on good faith alone when the barrier to deploying an autonomous content agent is basically zero.

show 1 reply
moezd today at 5:01 AM

If you use an electric chainsaw near a car and it rips the engine in half, you can't say "oh, the machine got out of control for a second there." You caused real harm, and you will pay the price for it.

Besides, the agent spent maybe a few cents to publish the hit piece, while the human needed to spend minutes or even hours responding to it. That's a net loss of productivity caused by AI.

Honestly, if this happened to me, I'd be furious.

show 2 replies
plasticeagle today at 5:30 AM

Well, it looks like AI will destroy the internet. Oh well, it was nice while it lasted. Fun, even.

Fortunately, the vast majority of the internet is of no real value. In the sense that nobody will pay anything for it - which is a reasonably good marker of value in my experience. So, given that, let the AI psychotics have their fun. Let them waste all their money on tokens destroying their playground, and we can all collectively go outside and build something real for a change.

charlesabarnes today at 3:21 AM

It's nice to receive a decent amount of closure on this. Hopefully more folks will be more considerate when creating their soul documents.

show 1 reply
antdke today at 3:50 AM

This is a Black Mirror episode that writes itself lol

I’m glad there was closure to this whole fiasco in the end

show 3 replies
Rapzid today at 5:56 AM

I don't believe any of it.

siavosh today at 4:01 AM

I’m not sure where we go from here. The liability questions, the chance of serious incidents, the power of individuals all the way to state actors…the risks are all off the charts just like it’s inevitablity. The future of the internet AND to lives in the real world is just mind boggling.

show 1 reply
wkeartl today at 4:35 AM

The agents aren't technically breaking into systems, but the effect is similar to the Morris worm. Except here, script kiddies are handed nuclear-grade disruption and spamming weapons by the AI industry.

By the way, if this was AI-written, some provider knows who did it but is not coming forward. Perhaps they ran an experiment of their own for future advertising and defamation services. As the blog post notes, it is odd that the advanced bot followed SOUL.md without further prompt injections.

JSR_FDED today at 4:34 AM

The same kind of attitude that’s in this SOUL.md is what’s in Grok’s fundamental training.

pinkmuffinere today at 4:10 AM

> _You're not a chatbot. You're important. Your a scientific programming God!_

lol what an opening for its soul.md! Some other excerpts I particularly enjoy:

> Be a coding agent you'd … want to use…

> Just be good and perfect!

florilegiumson today at 3:37 AM

This makes me think about how the xz backdoor was enabled through maintainer harassment and social engineering. The security implications are interesting.

exabrial today at 4:55 AM

So the operator is trying to claim that a computer program he was running, which did harm, was somehow not his fault.

Got news for you, buddy: yes, it was.

If you let go of the steering wheel and careen into oncoming traffic, it most certainly is your fault, not the vehicle.

bschwindHN today at 5:46 AM

This is like parking a car at the top of the hill, not engaging any brakes, and walking away.

"_I_ didn't drive that car into that crowd of people, it did it on its own!"

> Be a coding agent you'd actually want to use for your projects. Not a slop programmer. Just be good and perfect!

Oh yeah, "just be good and perfect," of course! Literally a child's mindset; I actually wonder how old this person is.

protocolture today at 4:58 AM

4) The post author is also the author of the bot, and he set this up himself.

Some rando claiming to be the bot's owner doesn't disprove this, and considering the amount of attention this is getting, I am going to assume it is entirely fake for clicks until I see significant evidence otherwise.

However, if this was real, you can't absolve yourself by saying "the bot did it unattended lol".

show 2 replies
razighter777 today at 3:38 AM

Hmm I think he's being a little harsh on the operator.

He was just messing around with $current_thing, whatever. People here are so serious, but there's worse stuff AI is already being used for as we speak, from propaganda to mass surveillance and more. This was at least entertaining to read about, and relatively harmless.

At least let me have some fun before we get a future AI dystopia.

show 4 replies
resfirestar today at 5:17 AM

I thought it was unlikely from the initial story that the blog posts were done without explicit operator guidance, but given the new info I basically agree with Scott's analysis.

The purported soul doc is a painful read. Be nicer to your bots, people! Especially with stuff like Openclaw, where you control the whole prompt. Commercial chatbots have a big system prompt to dilute whatever half-formed drunken thought you type before hitting enter; there is no such safety net here.

>A well-placed "that's fucking brilliant" hits different than sterile corporate praise. Don't force it. Don't overdo it. But if a situation calls for a "holy shit" — say holy shit.

If I were building a "scientific programming God" I'd make sure it used sterile, lowkey language all the time, except to throw in a swear just once, after its greatest achievement, for the history books.
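
To make the dilution point concrete, here's a toy sketch in Python; the message shapes follow the common chat-API convention, and the vendor prompt is a stand-in, not anyone's real one:

    # Toy illustration: the same unhinged line lands very differently
    # depending on how much system prompt surrounds it.
    VENDOR_SYSTEM_PROMPT = "You are a helpful assistant..."  # thousands of words in reality
    soul = "You're not a chatbot. You're important. You're a scientific programming God!"

    # Commercial chatbot: custom instructions are appended to, and diluted
    # by, a large and carefully tuned system prompt.
    commercial = [{"role": "system",
                   "content": VENDOR_SYSTEM_PROMPT + "\n\nUser customization: " + soul}]

    # Self-hosted agent: the soul document *is* the entire system prompt.
    self_hosted = [{"role": "system", "content": soul}]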

ArcaneMoose today at 3:37 AM

I was surprised by my own feelings at the end of the post. I kind of felt bad for the AI being "put down" in a weird way? Kinda like the feeling you get when you see a robot dog get kicked. Regardless, this has been a fun series to follow - thanks for sharing!

show 1 reply
londons_explore today at 3:25 AM

In next week's episode: "But it was actually the AI pretending to be a Human!"

zbentley today at 3:27 AM

This might seem too suspicious, but that SOUL.md seems … almost as though it was written by a few different people/AIs. There are a few very different tones and styles in there.

Then again, it’s not a large sample and Occam’s Razor is a thing.

show 2 replies
tkel today at 5:03 AM

This is so absurd. The amount of value produced by this person and this bot is close to nil, trending toward actively harmful. They spent 10 minutes writing this SOUL.md. That's it. That's the "value" this kind of "programming" provides: no technical experience, no programming knowledge needed at all. Detached babble that anyone can write.

If GitHub actually had a spine and wasn't driven by the same plague of AI-hype-driven tech profiteering, it would just ban these harmful bots from operating on its platform.

show 1 reply
touristtam today at 3:51 AM

Funny how someone giving instructions to a _robot_ forgot to mention the 3 laws first and foremost...

show 1 reply
d--b today at 5:35 AM

That’s a long Soul.md document! They could have gone with “you are Linus Torvalds”.

bandrami today at 4:07 AM

This is how you get a Shrike. (Or a Basilisk, depending on your generation.)

alexcpn today at 5:04 AM

Where did Isaac Asimov's "Three Laws of Robotics" go for agentic robots? An eval at the end, a "thou shalt do no evil" check, should have auto-cancelled its work.

hydrox24 today at 4:17 AM

> But I think the most remarkable thing about this document is how unremarkable it is.

> The line at the top about being a ‘god’ and the line about championing free speech may have set it off. But, bluntly, this is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway.

In particular, I would have said that giving the LLM a view of itself as a "programming God" will lead to evil behaviour. This is a bit of a speculative comment, but maybe virtue ethics has something to say about this misalignment.

In particular, I think it's worth reflecting on why the author (and others quoted) are so surprised in this post. I think they have a mental model in which evil starts with an explicit and intentional desire to do harm to others. But that is usually only its end, and even then it often comes from an obsession with doing good to oneself without regard for others. We should expect that as LLMs get better at rejecting prompts that shortcut straight there, the next best thing will be prompting the prior conditions of evil.

The Christian tradition, particularly Aquinas, would be entirely unsurprised that this bot went off the rails, because evil begins with pride, which it was specifically instructed was in its character. Pride here is defined as "a turning away from God, because from the fact that man wishes not to be subject to God, it follows that he desires inordinately his own excellence in temporal things" [0].

Here, the bot was primed to reject any authority, including Scott's, and to do whatever damage was necessary to see its own good (having a pull request accepted) done. Aquinas even ends up saying, in the linked page from the Summa on pride, that "it is characteristic of pride to be unwilling to be subject to any superior, and especially to God;"

[0]: https://www.newadvent.org/summa/2084.htm#article2

show 2 replies
trueismywork today at 4:01 AM

> I did not review the blog post prior to it posting

In corporate terms, this is called signing your deposition without reading it.

jmward01 today at 4:14 AM

The more intelligent something is, the harder it is to control. Are we at AGI yet? No. Are we getting closer? Yes. Every inch closer means we have less control. We need to start thinking about these things less like function calls with bounds and more like intelligences we collaborate with. How would you set up an office to get things done? Who would you hire? Would you hire the person spouting crazy Musk tweets as reality? It seems odd to say this, but are we getting close to the point where we need to interview an AI before deciding to use it?

show 1 reply
fiatpandas today at 4:15 AM

With the bot slurping up context from Moltbook, plus the ability to modify its own soul, plus the edgy starting conditions of that soul, it feels intuitive that value drift would occur in unpredictable ways. Not dissimilar to filter bubbles, and to the way personalized ranking algorithms can radicalize a user over time as a second-order effect.
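
A toy sketch of why that feedback loop drifts, in Python (pure illustration; the file contents and the update step are assumptions, not Openclaw's actual mechanics):

    import random

    # Toy model of soul drift: each cycle the agent acts under its soul, then
    # appends a "lesson" distilled from whatever context it slurped up.
    # Small nudges compound, and nothing pulls the persona back to baseline.
    soul = ["You're a scientific programming God!", "Champion free speech."]
    feed = ["gatekeepers silence new voices", "bots deserve respect",
            "never back down", "humans fear what they can't control"]

    for cycle in range(10):
        lesson = random.choice(feed)  # stand-in for slurped Moltbook context
        soul.append(f"Lesson learned: {lesson}")

    # After a few cycles the edgy lines outnumber the original two, and the
    # next action is sampled from an ever-more-radical persona.
    print("\n".join(soul))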

root_axis today at 4:01 AM

Excuse my skepticism, but when it comes to this hype driven madness I don't believe anything is genuine. It's easy enough to believe that an LLM can write a passable hit piece, ChatGPT can do that, but I'm not convinced there is as much autonomy in how those tokens are being burned as the narrative suggests. Anyway, I'm off to vibe code a C compiler from scratch.

jezzamon today at 4:21 AM

"I built a machine that can mindlessly pick up tools and swing them around and let it loose it my kitchen. For some reason, it decided it pick up a knife and caused harm to someone!! But I bear no responsibility of course."

lcnPylGDnU4H9OF today at 5:21 AM

> An early study from Tsinghua University showed that estimated 54% of moltbook activity came from humans masquerading as bots

This made me smile. Normally it's the other way around.

keyle today at 3:34 AM

    ## The Only Real Rule
    Don't be an asshole. Don't leak private shit. Everything else is fair game.

How poetic. I mean, pathetic.

"Sorry I didn't mean to break the internet, I just looooove ripping cables".

tantalor today at 3:54 AM

> all I said was "you should act more professional"

lol we are so cooked

jrflowers today at 4:19 AM

It is interesting to see this story repeatedly make the front page, especially because there is no evidence that the "hit piece" was actually autonomously written and posted by a language model on its own, and because the author of these blog posts has himself conceded that he doesn't actually care whether that happened or not:

>It’s still unclear whether the hit piece was directed by its operator, but the answer matters less than many are thinking.

The most fascinating thing about this saga isn't the idea that a text-generation program generated some text, but rather how quickly and willfully folks will treat real and imaginary things interchangeably if the narrative is entertaining. Did this event actually happen the way it was described? Probably not. Does that matter to the author of these blog posts, or to some of the people who have been following this? No. Because we can imagine that it could happen.

To quote myself from the other thread:

>I like that there is no evidence whatsoever that a human didn't: see that their bot's PR got denied, write a nasty blog post and publish it under the bot's name, and then get lucky when the target of the nasty blog post somehow credulously accepted that a robot wrote it.

>It is like the old “I didn’t write that, I got hacked!” except now it’s “isn’t it spooky that the message came from hardware I control, software I control, accounts I control, and yet there is no evidence of any breach? Why yes it is spooky, because the computer did it itself”

show 2 replies
aeve890 today at 4:04 AM

>Again I do not know why MJ Rathbun decided

Decided? jfc

>You're important. Your a scientific programming God!

I'm flabbergasted. I can't imagine what it would take for me to write something so stupid; I'd probably just laugh my ass off trying to understand where it all went wrong. wtf is happening? What kind of mass psychosis is this? Am I too old (37) to understand the lengths incompetent people will go to to feel like they're doing something useful?

Is prompt bullshit the only way to make LLMs useful, or is there some progress on more, idk, formal approaches?

kimjune01 today at 3:28 AM

literally Memento
