Hacker News

ronbenton · last Thursday at 6:57 PM

These prompt injection vulnerabilities give me the heebie-jeebies. LLMs feel so non-deterministic that it appears to me to be really hard to guard against. Can someone with experience in the area tell me if I'm off base?


Replies

throwmeaway820 · last Thursday at 7:02 PM

> it appears to me to be really hard to guard against

I don't want to sound glib, but one could simply not let an LLM execute arbitrary code without reviewing it first, or only let it execute code inside an isolated environment designed to run untrusted code

the idea of letting an LLM execute code it's dreamt up, with no oversight, in an environment you care about, is absolutely bananas to me
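
A rough sketch of that kind of review gate, in Python (everything here is hypothetical; it just assumes the agent hands you a shell command as a string):

    import subprocess

    def run_agent_command(cmd: str) -> None:
        """Show the LLM-proposed command to a human before anything executes."""
        print(f"Agent wants to run:\n  {cmd}")
        if input("Approve? [y/N] ").strip().lower() != "y":
            print("Rejected; nothing was executed.")
            return
        # Runs only after explicit human approval.
        subprocess.run(cmd, shell=True, check=False)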

ACCount37 · last Thursday at 7:46 PM

LLMs are vulnerable in the same way humans are vulnerable. We found a way to automate PEBKAC.

I expect that agent LLMs are going to get more and more hardened against prompt injection attacks, but it's hard to drive the odds of an attack succeeding all the way down to zero while still having a useful LLM. So the "solution" is to limit AI privileges and avoid the "lethal trifecta": access to private data, exposure to untrusted content, and the ability to communicate externally.

mystifyingpoi · last Thursday at 7:04 PM

Determinism is one thing, but the more pressing thing is permission boundaries. All these AI agent tools need to come with no permissions at all out of the box, and everything should be granularly granted. But that would break all the cool demos and marketing pitches.

Allowing an agent to run wild with arbitrary shell commands is just plain stupid. This should never happen to begin with.
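
As a sketch of what deny-by-default could look like (the tool names and structure are made up for illustration, not taken from any real agent framework):

    # Hypothetical deny-by-default gate for agent tool calls.
    ALLOWED_TOOLS: set[str] = set()  # empty out of the box: nothing is permitted

    def grant(tool: str) -> None:
        """Grant exactly one capability, e.g. grant("read_file")."""
        ALLOWED_TOOLS.add(tool)

    def dispatch(tool: str, handler, *args):
        """Refuse any tool call that was never explicitly granted."""
        if tool not in ALLOWED_TOOLS:
            raise PermissionError(f"tool '{tool}' was never granted")
        return handler(*args)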

roywiggins · last Thursday at 7:38 PM

The problem isn't non-determinism per se: an agent that reliably obeys a prompt injection in a README file is behaving entirely deterministically; its behavior is totally determined by its inputs.

stingraycharles · last Thursday at 7:49 PM

You're correct, but the answer is that they typically don't access untrusted content all that often.

The number of scenarios in which you have your coding agent retrieving random websites from the internet is very low.

What typically happens is that they use a provider's "web search" API if they need external content, which already pre-processes and summarises all content, so these types of attacks are much harder to pull off.

Don't forget: this attack relies on injecting a malicious prompt into a project's README.md that you're actively working on.

anonymars · last Thursday at 7:42 PM

Maybe I can assign it my anti-phishing training

inetknght · last Thursday at 7:03 PM

> Can someone with experience in the area tell me if I'm off base?

Nope, not at all. Non-determinism is what most software developers write. Something to do with profitability and time or something.

_trampeltier · last Thursday at 7:33 PM

At least for now the malware runs on the coder's machine. The fun starts when malware starts running on users' machines, and the coders aren't coders anymore, just prompters, with no idea how such a thing can happen.

resfirestar · last Thursday at 7:27 PM

If someone can write instructions into a codebase to download a malicious script, hoping an AI agent will read and follow them, they could just as easily write the same wget command directly into a build script or the source itself (probably more effectively). In that way it's a very similar threat to the supply chain attacks we're hopefully already familiar with. So it is a serious issue, but not necessarily one we don't know how to deal with. The solutions (auditing all third-party code, isolating dev environments) just happen to be hard in practice.

ezst · last Thursday at 10:45 PM

Just to be the pedant here: LLMs are fully deterministic (the same LLM, in the same state, with the same inputs, will deliver the same output, and you can verify that by running an LLM locally). It's just that they are chaotic: a prompt and a second one with slight, seemingly minor changes can produce not just different but conflicting outputs.
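
You can see this for yourself with greedy decoding, assuming the Hugging Face transformers library and any small local model ("gpt2" below is just an example):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The quick brown fox", return_tensors="pt").input_ids
    # Greedy decoding: no sampling, so identical inputs give identical outputs.
    out1 = model.generate(ids, do_sample=False, max_new_tokens=20)
    out2 = model.generate(ids, do_sample=False, max_new_tokens=20)
    assert (out1 == out2).all()  # deterministic run to run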

fenwick67 · yesterday at 1:42 AM

Just hard-code the seed. There you go, deterministic!

api · last Thursday at 7:36 PM

Run them in a VM.

Probably good advice for lots of things these days given supply chain attacks targeting build scripts, git, etc.
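
A hedged sketch of that, using Docker as the isolation layer (image name and mount path are just illustrative):

    import os
    import subprocess

    def run_isolated(cmd: str, workdir: str = "sandbox") -> None:
        """Run an agent-generated command in a throwaway container:
        no network, read-only root FS, one writable mount."""
        subprocess.run([
            "docker", "run", "--rm",
            "--network=none",   # no downloads, no exfiltration
            "--read-only",      # immutable root filesystem
            "-v", f"{os.path.abspath(workdir)}:/work",  # the only writable dir
            "-w", "/work",
            "python:3.12-slim", # illustrative base image
            "sh", "-c", cmd,
        ], check=False)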

ymyms · last Thursday at 7:05 PM

You are very on base. In fact, there is a deep conflict that needs to be solved: the non-determinism is the feature of an agent, something that can "think" for itself and act. If you force agents to be deterministic, don't you just have a slow workflow at that point?