Hacker News

amelius · yesterday at 7:26 PM

> The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use.

Yes, but if the probability is much smaller than, say, that of being hit by a meteorite, then engineers usually say that's OK. See also hash collisions.


Replies

maxbond · yesterday at 7:34 PM

If you have taken measures to ensure that the probability is that low, yes, that is an example of a strong engineering control. You don't design a hash function by just twiddling bits around and hoping for the best; you have to analyze the algorithm and prove what the chance of a collision really is.
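The collision analysis alluded to here is the classic birthday bound. As a minimal sketch (standard approximation, not tied to any particular hash function), the probability that at least two of n uniformly random b-bit hashes collide is roughly 1 - exp(-n(n-1)/2^(b+1)):

```python
import math

def birthday_collision_prob(n: int, bits: int) -> float:
    """Approximate probability that at least two of n uniformly random
    `bits`-bit hashes collide (birthday bound)."""
    exponent = -(n * (n - 1)) / (2 ** (bits + 1))
    # expm1 keeps precision when the probability is astronomically small;
    # 1 - exp(exponent) would round to 0.0 in floating point.
    return -math.expm1(exponent)

# A billion items hashed into 128 bits: collision odds on the order of 1e-21.
p_tiny = birthday_collision_prob(10**9, 128)

# By contrast, ~2^32 items in a 64-bit space already give ~39% collision odds.
p_large = birthday_collision_prob(2**32, 64)
```

This is exactly the kind of closed-form argument that lets engineers call a failure probability acceptable; no comparable analysis exists for "what tokens will the model emit".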

How do you drive the probability of some series of tokens down to some known, acceptable threshold? That's a $100B question. But even if you could, can you actually enumerate every failure mode and ensure all of them are protected? If you can, I suspect your problem space is so well specified that you don't need an AI agent in the first place. We use agents to automate tasks where there is significant ambiguity or the need for a judgment call, and you can't anticipate every disaster under those circumstances.

lukasgelbmann · yesterday at 7:36 PM

If you’re using a model, it’s your responsibility to make sure the probability actually is that small. Realistically, you do that by not giving the model access to any of your bloody prod API keys.

drob518 · yesterday at 7:47 PM

How do you know what the probability is?

hunterpayne · yesterday at 9:47 PM

"Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok"

Yet in this case, that probability clearly isn't smaller than that of a meteorite strike.