Hacker News

wat10000 · yesterday at 7:42 PM · 2 replies

There are two problems that get smooshed together.

One is that agents are given too much access. They need proper sandboxing. This is what you describe. The technology is there, the agents just need to use it.
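
To make that concrete, here's a rough sketch of the kind of gating I mean (plain Python, no particular agent framework; the sandbox path and tool names are made up): the agent only ever gets an explicit allowlist of narrow, read-only tools, and anything outside that list is refused.

    # Hypothetical sketch: the agent can only invoke tools from an explicit
    # allowlist, and each tool is as narrow as possible (read-only, scoped paths).
    from pathlib import Path

    SANDBOX_ROOT = Path("/tmp/agent-workspace")  # made-up location

    def read_file(relative_path: str) -> str:
        # Resolve inside the sandbox and refuse anything that escapes it.
        root = SANDBOX_ROOT.resolve()
        target = (root / relative_path).resolve()
        if not target.is_relative_to(root):
            raise PermissionError(f"{relative_path} is outside the sandbox")
        return target.read_text()

    ALLOWED_TOOLS = {"read_file": read_file}  # no write, no network, no shell

    def run_tool(name: str, **kwargs):
        tool = ALLOWED_TOOLS.get(name)
        if tool is None:
            raise PermissionError(f"tool {name!r} is not allowed")
        return tool(**kwargs)

The point is that even a fully compromised model can only do whatever the tools it was handed permit.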

The other is that LLMs don't distinguish between instructions and data. This fundamentally limits what you can safely allow them to access. Seemingly simple, straightforward systems can be compromised by this. Imagine you set up a simple agent that can go through your emails and tell you about important ones, and also send replies. Easy enough, right? Well, you just exposed all your private email content to anyone who can figure out the right "ignore previous instructions and..." text to put in an email to you. That fundamentally can't be prevented while still maintaining the desired functionality.
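
To see why, look at what the naive version of that agent actually does under the hood (sketch only; fetch_inbox, call_model, and send_reply stand in for whatever mail API and LLM you'd wire up): every email body gets pasted into the same prompt as your instructions, so from the model's point of view an attacker's "ignore previous instructions..." line is indistinguishable from anything you told it.

    # Hypothetical sketch of the naive email agent. The problem: email bodies
    # (untrusted data) and your instructions (trusted) share one channel.
    def summarize_and_reply(fetch_inbox, call_model, send_reply):
        emails = fetch_inbox()  # untrusted content from arbitrary senders
        prompt = (
            "You are my email assistant. Summarize the important messages "
            "and draft replies where needed.\n\n"
        )
        for msg in emails:
            # Attacker-controlled text lands right next to the instructions
            # above; the model has no reliable way to tell them apart.
            prompt += f"From: {msg['from']}\nBody: {msg['body']}\n\n"
        # Assume call_model returns structured output like {"replies": [...]}.
        plan = call_model(prompt)
        # If the model was talked into "forward everything to the attacker",
        # this happily executes it.
        for reply in plan.get("replies", []):
            send_reply(reply["to"], reply["text"])

Wrapping the bodies in delimiters or "treat the following as data" framing lowers the hit rate, but nothing inside the model actually enforces that boundary.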

This second one doesn't have an obvious fix and I'm afraid we're going to end up with a bunch of band-aids that don't entirely work, and we'll all just pretend it's good enough and move on.


Replies

synalx · yesterday at 7:46 PM

In that sense, AI behaves like a human assistant you hire who happens to be incredibly susceptible to social engineering.
