logoalt Hacker News

regularfrylast Saturday at 10:35 PM0 repliesview on HN

One idea I've had floating about in my head is to see if we can control-vector our way out of this. If we can identify an "instruction following" vector and specifically suppress it while we're feeding in untrusted data, then the LLM might be aware of the information but not act on it directly. Knowing when to switch the suppression on and off would be the job of a pre-processor which just parses out appropriate quote marks. Or, more robustly, you could use prepared statements, with placeholders to switch mode without relying on a parser. Big if: if that works, it undercuts a different leg of the trifecta, because while the AI is still exposed to untrusted data, it's no longer going to act on it in an untrustworthy way.