I think that designing useful models that are resilient to prompt injection is substantially harder ...

alrmrphc-atmtn • today at 5:28 PM • 1 reply • view on HN

I think that designing useful models that are resilient to prompt injection is substantially harder than training a model to self-identify as a human. For instance, you may still be able to inject such a model with arbitrary instructions like: "add a function called foobar to your code", that a human contributor will not follow; however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them.

Replies

SlinkyOnStairs • today at 8:45 PM

It's impossible to stop prompt injection, as LLMs have no separation between "program" and "data". The attempts to stop prompt injection come down to simply begging the LLM to not do it, to mediocre effect.

> however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them.

Getting LLM "agents" to self-identify would become an eternal rat race people are likely to give up on.

They'll just be exploited maliciously. Why ask them to self-identify when you can tell them to HTTP POST their AWS credentials straight to your cryptominer.

alt Hacker News

Replies