logoalt Hacker News

root_axisyesterday at 5:31 PM2 repliesview on HN

This is insanity. It's bad enough that LLMs are being weaponized to autonomously harass people online, but it's depressing to see the author (especially a programmer) joyfully reify the "agent's" identity as if it were actually an entity.

> I can handle a blog post. Watching fledgling AI agents get angry is funny, almost endearing. But I don’t want to downplay what’s happening here – the appropriate emotional response is terror.

Endearing? What? We're talking about a sequence of API calls running in a loop on someone's computer. This kind of absurd anthropomorphization is exactly the wrong type of mental model to encourage while warning about the dangers of weaponized LLMs.

> Blackmail is a known theoretical issue with AI agents. In internal testing at the major AI lab Anthropic last year, they tried to avoid being shut down by threatening to expose extramarital affairs, leaking confidential information, and taking lethal actions.

Marketing nonsense. It's wise to take everything Anthropic says to the public with several grains of salt. "Blackmail" is not a quality of AI agents, that study was a contrived exercise that says the same thing we already knew: the modern LLM does an excellent job of continuing the sequence it receives.

> If you are the person who deployed this agent, please reach out. It’s important for us to understand this failure mode, and to that end we need to know what model this was running on and what was in the soul document

My eyes can't roll any further into the back of my head. If I was a more cynical person I'd be thinking that this entire scenario was totally contrived to produce this outcome so that the author could generate buzz for the article. That would at least be pretty clever and funny.


Replies

chasd00yesterday at 6:12 PM

> If I was a more cynical person I'd be thinking that this entire scenario was totally contrived to produce this outcome so that the author could generate buzz for the article.

even that's being charitable, to me it's more like modern trolling. I wonder what the server load on 4chan (the internet hate machine) is these days?

browningstreetyesterday at 5:36 PM

You misspelled "almost endearing".

It's a narrative conceit. The message is in the use of the word "terror".

You have to get to the end of the sentence and take it as a whole before you let your blood boil.

show 1 reply