Hacker News

imiric, today at 2:54 PM

This is missing the point.

The issue isn't the number of guardrails in place to perform an action. Yes, it is obvious that there should be some in place before doing any critical operation, such as deleting a database.

The issue is that the "agent" completely disregarded instructions, which in the age of "skills" and "superpowers" seems like an important issue that should be addressed.

Considering that these tools are given access to increasingly sensitive infrastructure, allowed to make decisions autonomously, and are able to find all sorts of loopholes in order to make "progress", this disaster could happen even with more guardrails in place. Shifting the blame onto the human for this incident is sweeping the real issue under the rug, and is itself irresponsible.

There are far scarier scenarios that should concern us all than losing some data.


Replies

BadBadJellyBean, today at 3:04 PM

Well, the user chose the tool. The tool is an LLM. LLMs are non-deterministic. You cannot predict what comes out of an LLM for a given input, especially without weights. This should be known.

There is currently no way to prevent this apart from not giving the LLM full control. It will not delete what it cannot delete.
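To make "it will not delete what it cannot delete" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The demo database, the `users` table, and the helper names (`make_demo_db`, `open_read_only`, `try_statement`) are all made up for illustration; the only real mechanism is SQLite's `mode=ro` URI parameter, which makes the connection itself refuse writes no matter what statement the caller sends.

```python
import os
import sqlite3
import tempfile


def make_demo_db():
    """Create a throwaway database file with one table (illustrative only)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")
    conn.commit()
    conn.close()
    return path


def open_read_only(path):
    """Open the database so that every write is rejected by SQLite itself."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def try_statement(conn, sql):
    """Run a statement; report a refusal instead of raising."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"refused: {exc}"


db_path = make_demo_db()
agent_conn = open_read_only(db_path)  # the only handle the tool would receive

rows = try_statement(agent_conn, "SELECT name FROM users")
drop_result = try_statement(agent_conn, "DROP TABLE users")
```

The point of the design is that the restriction lives in the credential, not in the prompt: the read still works, while the `DROP TABLE` is refused by the database layer regardless of what the model decided to do.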

Use an LLM to write an Ansible playbook or some Terraform code if you want, but review it, test it, and only then apply it. Keep backups (3-2-1 rule at minimum).

Letting an LLM have access to everything is just a bad idea and will lead to bad outcomes. You cannot replace a person with a mind and experience with an LLM. You can try. But you will probably fail.

kbrkbr, today at 3:17 PM

An LLM generates plausible text token by token. It is at its core a deterministic function with some randomization and some clever tricks to make it look like an agent dialoguing or reasoning.
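A toy sketch of "a deterministic function with some randomization", assuming the common temperature-sampling scheme. The token scores below are invented, and `sample_next_token` is a hypothetical helper, not any real model's API: the scoring step is a plain function of the input, and the only randomness is in the final draw.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick one token from a dict mapping token -> score."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness.
        return max(logits, key=logits.get)
    rng = rng or random.Random()
    # Softmax with temperature; subtract the max score for numerical stability.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    # The random draw is the only non-deterministic step.
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Invented scores for the next token after some prompt (assumption for the demo).
logits = {"SELECT": 2.0, "UPDATE": 1.0, "DROP": 0.5}

greedy = sample_next_token(logits, temperature=0)
sampled = sample_next_token(logits, temperature=1.0)
```

With `temperature=0` the same input always yields the same token. With any positive temperature, a lower-scored token such as `"DROP"` still has a nonzero chance of being picked, which is why identical prompts can lead to different actions.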

Plausible text is sometimes right and sometimes not.

Humans have a world model, a model of what happens. LLMs have a model of what humans would plausibly say.

The only good guardrail seems to be a human in the loop.
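A rough sketch of what a human-in-the-loop gate could look like. Everything here is hypothetical: the keyword list is a crude stand-in for a real policy, and `execute` and `approve` are placeholder callbacks rather than part of any agent framework.

```python
# Crude, illustrative markers of a destructive command (an assumption, not a real policy).
DESTRUCTIVE_KEYWORDS = ("drop", "delete", "truncate", "rm ")


def is_destructive(command):
    """Return True if the command contains any of the keywords above."""
    lowered = command.lower()
    return any(keyword in lowered for keyword in DESTRUCTIVE_KEYWORDS)


def run_with_approval(command, execute, approve):
    """Run the command via execute(), asking approve() first if it looks destructive."""
    if is_destructive(command) and not approve(command):
        return "blocked"
    execute(command)
    return "executed"


# Demo with stand-in callbacks: a list records what would have run,
# and the "human" here always says no.
executed = []
first = run_with_approval("SELECT 1", executed.append, lambda cmd: False)
second = run_with_approval("DROP DATABASE prod", executed.append, lambda cmd: False)
```

In interactive use, `approve` might be something like `lambda cmd: input(f"Run {cmd!r}? [yes/no] ") == "yes"`. Keyword matching is easy to evade, so a gate like this only helps if the agent has no other path to the same resource.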
