Hacker News

CSMastermind yesterday at 8:23 PM

I realize this is supposed to be a post about how scary the security vulnerabilities these agents can find are.

But personally I love when agents do things like this and appreciate the help. Last thing in the world I want is for them to nerf the models.


Replies

SonOfLilit yesterday at 9:04 PM

It's not about hacking capabilities, it's about misalignment. It's more like the golem myth (told it to fetch some water, it drowned a city) than the Gollum myth (used the ring, the ring hacked his brain, now he's a crazy violent meth addict).

nicoburns yesterday at 9:42 PM

In this case I think it's Docker that needs to be nerfed, not the models. The fact that there's a backdoor to getting root access on the machine would be a problem even if you weren't running LLMs on it.

sweezyjeezy yesterday at 8:58 PM

I know it's unlikely to be the case, but in the sci-fi version of this story, this would be exactly the kind of comment the Codex agent would leave to avoid interference with its master plans.

eddythompson80 yesterday at 10:09 PM

It's the now-classic "Sorry I drowned little Timothy. Here is a breakdown of what happened" followed by "Let me try to respawn little Timothy on a new map."