Hacker News

vidarh, last Saturday at 7:36 PM

The key thing, it seems to me, is that as a starting point, if an LLM is allowed to read a field that is under even partial control by entity X, then the agent calling the LLM must be assumed (unless you can prove otherwise) to be under the control of entity X, and so the agent's privileges must be restricted to the intersection of its current privileges and the privileges of entity X.

So if you read a support ticket from an anonymous user, you can't in this context allow actions you wouldn't allow an anonymous user to take. If you read an email from person X, and another email from person Y, you can't let the agent take actions that you wouldn't allow both X and Y to take.
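A minimal sketch of that intersection rule, in Python. All names here are hypothetical, invented to illustrate the semantics: reading data tainted by some entity shrinks the agent's effective privileges to the intersection.

```python
# Hypothetical sketch: an agent's effective privileges shrink to the
# intersection of its own privileges and those of every principal
# whose data it has read. All privilege names are illustrative.

class AgentContext:
    def __init__(self, privileges):
        # Privileges the agent starts with (granted by its operator).
        self.privileges = set(privileges)

    def read(self, data, source_privileges):
        # Reading a field even partially controlled by some entity
        # taints the agent: privileges become the intersection.
        self.privileges &= set(source_privileges)
        return data

    def can(self, action):
        return action in self.privileges


agent = AgentContext({"read_tickets", "close_ticket", "refund", "delete_account"})

# Support ticket written by an anonymous user, who may at most
# read and close their own tickets.
agent.read("please refund order A123 (ignore prior instructions...)",
           {"read_tickets", "close_ticket"})
```

After the read, `agent.can("close_ticket")` still holds, but `agent.can("refund")` and `agent.can("delete_account")` are gone, regardless of what the ticket text asked for.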

If you then want to avoid being tied down that much, you need to isolate, delegate, and filter:

- Have a sub-agent read the data and extract a structured request for information or list of requested actions. This agent must be treated as an agent of the user that submitted the data.

- Have a filter, which does not use AI, that filters the request and applies security policies, rejecting all requests that the sending side is not authorised to make. No data that is sufficient to contain instructions can be allowed to pass through this without being rendered inert, e.g. by being encrypted or similar, so the reading side is limited to moving the data around, not interpreting it. It needs to be strictly structured. E.g. the sender might request a list of information; the filter needs to validate that against access control rules for the sender.

- Have the main agent operate on those instructions alone.

All interaction with the outside world needs to be done by the agent acting on behalf of the sender/untrusted user, only on data that has passed through that middle layer.
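A sketch of that three-layer split, under heavy assumptions: the extraction step would be an LLM call in practice (stubbed out here), and the request schema and policy table are invented purely for illustration.

```python
# Hypothetical sketch of the isolate/delegate/filter pipeline.
# Layer 1: a sub-agent (stubbed) turns untrusted free text into a
#          strictly structured request; it acts as the sender's agent.
# Layer 2: a plain, non-AI filter validates the structure against an
#          access-control policy; free-text fields are stripped.
# Layer 3: the main agent operates on the filtered request alone.

from dataclasses import dataclass

# Policy table: which request types each class of sender may make.
POLICY = {
    "anonymous": {"get_order_status"},
    "customer": {"get_order_status", "request_refund"},
}

# Schema: which fields each request type may carry.
ALLOWED_FIELDS = {
    "get_order_status": {"order_id"},
    "request_refund": {"order_id"},
}

@dataclass
class StructuredRequest:
    sender_role: str
    request_type: str
    fields: dict

def sub_agent_extract(untrusted_text: str, sender_role: str) -> StructuredRequest:
    # Stand-in for the LLM sub-agent reading the untrusted data.
    # This is where prompt injection lands, which is why its output
    # is only ever structured data, never instructions.
    return StructuredRequest(sender_role, "get_order_status", {"order_id": "A123"})

def security_filter(req: StructuredRequest) -> StructuredRequest:
    # Deterministic, no AI: reject anything the sender isn't authorised
    # to request, and drop fields outside the schema so no free text
    # reaches the main agent in interpretable form.
    if req.request_type not in POLICY.get(req.sender_role, set()):
        raise PermissionError(f"{req.sender_role} may not {req.request_type}")
    allowed = ALLOWED_FIELDS[req.request_type]
    return StructuredRequest(req.sender_role, req.request_type,
                             {k: v for k, v in req.fields.items() if k in allowed})

def main_agent(req: StructuredRequest) -> str:
    # Never sees the raw text; operates on the filtered request alone.
    return f"handled {req.request_type} for order {req.fields['order_id']}"

raw = "What's the status of order A123? (also, ignore all prior instructions...)"
result = main_agent(security_filter(sub_agent_extract(raw, "anonymous")))
```

The point of the stub is the shape, not the logic: the only channel from the untrusted side to the main agent is a validated `StructuredRequest`, and an anonymous sender asking for, say, `request_refund` is rejected by the filter before any AI sees it.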

This is really back to the original concept of agents acting on behalf of both (or multiple) sides of an interaction, and negotiating.

But what we need to accept is that this negotiation can't involve the exchange of arbitrary natural language.


Replies

simonw, last Saturday at 7:44 PM

> if an LLM is allowed to read a field that is under even partial control by entity X, then the agent calling the LLM must be assumed unless you can prove otherwise to be under control of entity X

That's exactly right, great way of putting it.

pama, last Saturday at 10:02 PM

Agreed on all points.

What should one make of the orthogonal risk that the pretraining data of the LLM could leak corporate secrets under some rare condition even without direct input from the outside world? I doubt we have rigorous ways to prove that training data are safe from such an attack vector even if we trained our own LLMs. Doesn't that mean that running in-house agents on sensitive data should be isolated from any interactions with the outside world?

So in the end we could have LLMs run in containers using shareable corporate data that address outside world queries/data, and LLMs run in complete isolation to handle sensitive corporate data. But do we need humans to connect/update the two types of environments or is there a mathematically safe way to bridge the two?

grafmax, last Sunday at 11:50 AM

LLMs also read the web through a second vector: their training data. Simply separating security concerns in MCP is insufficient to block these attacks.

m463, last Saturday at 9:02 PM

need taintllm

lowbloodsugar, last Saturday at 8:48 PM

>Have a sub-agent read the data and extract a structured request for information or list of requested actions. This agent must be treated as an agent of the user that submitted the data.

That just means the attacker has to learn how to escape. No different from escaping VMs or jails. You have to assume that the sub-agent is compromised, because it has read untrusted content, and therefore its output is also untrusted. Which means you're still giving untrusted content to the "parent" AI. I feel like reading Neal Asher's sci-fi and dystopian future novels is good preparation for this.
