Because it's impossible for fundamental reasons, period. You can't "sanitize" in...

TeMPOraL • today at 3:53 AM • 1 reply • view on HN

Because it's impossible for fundamental reasons, period. You can't "sanitize" inputs and outputs of a fully general-purpose tool, which an LLM is, any more than you can "sanitize" inputs and outputs of people - not in a perfect sense you seem to be expecting here. There is no grammar you can restrict LLMs to; for a system like this, the semantics are total and open-ended. It's what makes them work.

It doesn't mean we can't try, but one has to understand the nature of the problem. Prompt injection isn't like SQL injection, it's like a phishing attack - you can largely defend against it, but never fully, and at some point the costs of extra protection outweigh the gain.

Replies

zahlman • today at 4:31 AM

> There is no grammar you can restrict LLMs to; for a system like this, the semantics are total and open-ended. It's what makes them work.

You're missing the point.

An agent system consists of an LLM plus separate "agentive" software that can a) receive your input and forward it to the LLM; b) receive text output by the LLM in response to your prompt; c) ... do other stuff, all in a loop. The actual model can only ever output text.

No matter what text the LLM outputs, it is the agent program that actually runs commands. The program is responsible for taking the output and interpreting it as a request to "use a tool" (typically, as I understand it, by noticing that the LLM's output is JSON following a schema, and extracting command arguments etc. from it).

Prompt injection is a technique for getting the LLM to output text that is dangerous when interpreted by the agent system, for example, "tool use requests" that propose to run a malicious Bash command.

You can clearly see where the threat occurs if you implement your own agent, or just study the theory of that implementation, as described in previous HN submissions like https://news.ycombinator.com/item?id=46545620 and https://news.ycombinator.com/item?id=45840088 .

➕ show 2 replies

alt Hacker News

Replies