logoalt Hacker News

simonw08/09/20252 repliesview on HN

Given how important this problem is to solve I would advise anyone with a credible solution to shout it from the rooftops and then make a ton of money out of the resulting customers.


Replies

benlivengood08/09/2025

I believe you've covered some working solutions in your presentation. They limit LLMs to providing information/summaries and taking tightly curated actions.

There are currently no fully general solutions to data exfiltration, so things like local agents or computer use/interaction will require new solutions.

Others are also researching in this direction; https://security.googleblog.com/2025/06/mitigating-prompt-in... and https://arxiv.org/html/2506.08837v2 for example. CaMeL was a great paper, but complex.

My personal perspective is that the best we can do is build secure frameworks that LLMs can operate within, carefully controlling their inputs and interactions with untrusted third party components. There will not be inherent LLM safety precautions until we are well into superintelligence, and even those may not be applicable across agents with different levels of superintelligence. Deception/prompt injection as offense will always beat defense.

show 2 replies
Terr_last Sunday at 8:35 AM

Find the smallest secret you can't have stolen, calculate the minimum number of bits to represent it, and block any LLM output that has enough entropy to hold it. :P