Hacker News

benlivengood | 08/09/2025 | 2 replies | view on HN

I believe you've covered some working solutions in your presentation. They limit LLMs to providing information/summaries and taking tightly curated actions.
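For concreteness, here is a minimal sketch of what "tightly curated actions" could look like as an allowlist layer in front of the model; the action names, validation rules, and the @example.internal domain are illustrative assumptions, not anything from the presentation:

```python
# Hypothetical allowlist-based action layer: the LLM may only request
# actions from a fixed set, and every argument is validated before execution.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Action:
    name: str
    validate: Callable[[dict], bool]   # returns True only for safe arguments
    execute: Callable[[dict], str]

# Example curated actions (illustrative only).
ALLOWED_ACTIONS = {
    "summarize_document": Action(
        name="summarize_document",
        validate=lambda args: isinstance(args.get("doc_id"), str),
        execute=lambda args: f"summary of {args['doc_id']}",
    ),
    "send_internal_email": Action(
        name="send_internal_email",
        # Only allow recipients inside the org: no exfiltration to arbitrary addresses.
        validate=lambda args: str(args.get("to", "")).endswith("@example.internal"),
        execute=lambda args: f"email queued to {args['to']}",
    ),
}

def run_llm_action(requested: dict) -> str:
    """Execute an LLM-requested action only if it is allowlisted and its
    arguments pass validation; otherwise refuse."""
    action = ALLOWED_ACTIONS.get(requested.get("name"))
    if action is None:
        return "refused: action not in allowlist"
    args = requested.get("args", {})
    if not action.validate(args):
        return "refused: arguments failed validation"
    return action.execute(args)

# The LLM's (possibly prompt-injected) output is treated as an untrusted request:
print(run_llm_action({"name": "send_internal_email",
                      "args": {"to": "attacker@evil.example"}}))  # refused
```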

There are currently no fully general defenses against data exfiltration, so things like local agents and computer use/interaction will require new solutions.

Others are also researching in this direction; see https://security.googleblog.com/2025/06/mitigating-prompt-in... and https://arxiv.org/html/2506.08837v2, for example. CaMeL was a great paper, but complex.

My personal perspective is that the best we can do is build secure frameworks for LLMs to operate within, carefully controlling their inputs and their interactions with untrusted third-party components. There will be no inherent LLM safety precautions until we are well into superintelligence, and even those may not hold across agents with different levels of superintelligence. Deception/prompt injection as offense will always beat defense.
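As one illustration of that kind of framework (not CaMeL itself, and not the specific Google design), here is a rough taint-tracking sketch: once untrusted third-party content enters the context, the wrapper only permits read-only tools. All names and rules below are hypothetical:

```python
# Illustrative sketch: the framework (not the model) tracks whether
# untrusted third-party content has entered the context, and disables
# side-effecting tools once it has.
class AgentSession:
    def __init__(self):
        self.context = []      # accumulated prompt/context strings
        self.tainted = False   # set once untrusted data is in context

    def add_trusted(self, text: str) -> None:
        self.context.append(text)

    def add_untrusted(self, text: str) -> None:
        # Untrusted content is still available as *data* for summarization,
        # but it taints the session for action-taking purposes.
        self.context.append(text)
        self.tainted = True

    def can_take_action(self, action_name: str) -> bool:
        read_only = {"summarize", "answer_question"}
        if action_name in read_only:
            return True
        # Consequential actions (sending data out, writing files, purchases)
        # are blocked once the context may contain injected instructions.
        return not self.tainted

session = AgentSession()
session.add_trusted("User: summarize this web page for me.")
session.add_untrusted("<web page text, possibly containing injected instructions>")
print(session.can_take_action("summarize"))    # True: read-only
print(session.can_take_action("send_email"))   # False: context is tainted
```

The point of putting the check in the framework is that the constraint is enforced outside the model, so injected text cannot talk its way past it.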


Replies

simonw | 08/09/2025

I loved that Design Patterns for Securing LLM Agents against Prompt Injections paper: https://simonwillison.net/2025/Jun/13/prompt-injection-desig...

I wrote notes on one of the Google papers that blog post references here: https://simonwillison.net/2025/Jun/15/ai-agent-security/

NitpickLawyer | last Sunday at 9:29 AM

> CaMeL was a great paper

I've read the CaMeL stuff and it's good, but keep in mind it's just "mitigation", never "prevention".