Hacker News

simonw · 08/09/2025

If you fine-tune a model on corporate data (and you can actually get that to work; I've seen very few success stories there), then yes, a prompt injection attack against that model could exfiltrate sensitive data too.

Something I've been thinking about recently is a sort of air-gapped mechanism: an end user gets to run an LLM system that has no access to the outside world at all (like how ChatGPT Code Interpreter works) but IS able to access the data they've provided to it, and they can grant it access to multiple GBs of data for use with its code execution tools.

That cuts off the exfiltration vector leg of the trifecta while allowing complex operations to be performed against sensitive data.
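To make that concrete, here is a minimal sketch of what the execution side of such a setup could look like, assuming a Linux host with util-linux's `unshare` available. The `run_model_code` helper, the data path, and the timeout are illustrative placeholders, not how Code Interpreter is actually built: model-generated code runs in a fresh network namespace (loopback only), so outbound connections fail while the granted data stays readable.

```python
import subprocess
import tempfile
from pathlib import Path

# Directory of sensitive data the user has explicitly granted (hypothetical path).
DATA_DIR = Path("/srv/granted-data")

def run_model_code(code: str, timeout: int = 60) -> str:
    """Run model-generated Python with no network access.

    `unshare --net --map-root-user` starts the child in a fresh network
    namespace containing only a loopback interface, so any attempt to
    phone home fails, while the granted data directory remains readable
    as the working directory.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        script = f.name

    result = subprocess.run(
        ["unshare", "--net", "--map-root-user", "python3", script],
        capture_output=True,
        text=True,
        timeout=timeout,
        cwd=DATA_DIR,  # the code sees the granted data, and nothing else
    )
    return result.stdout + result.stderr
```

The same property can be had with a container runtime (e.g. `docker run --network none` plus a read-only volume mount); the key point is that the sandbox contains only the interpreter and the data, with no route out.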


Replies

pamal · last Sunday at 1:41 AM

In the case of access to private data, I think the concern I mentioned is not fully alleviated by simply cutting off exposure to untrusted content. Although that avoids a prompt injection attack, the company is still vulnerable to a poisoned model that can read the sensitive corporate dataset and decide to contact https://x.y.z/data-leak if a hint of such a plan was present in the pretraining data.

So in your trifecta example, one can cut off private data and let outside users interact with untrusted content, or one can cut off the ability to communicate externally in order to analyze internal datasets. However, cutting off only the exposure to untrusted content in the context still carries some residual risk if the LLM itself was pretrained on untrusted data, and I don't know of any way to fully de-risk the training data.
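One way to state that residual risk as a sketch (the flag and helper names are made up for illustration): if the weights were shaped by untrusted pretraining data, the "exposure to untrusted content" leg is effectively always on, so only removing external communication or private data truly breaks the trifecta.

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    reads_private_data: bool
    sees_untrusted_context: bool        # web pages, inbound email, other agents' output
    can_communicate_externally: bool
    pretrained_on_untrusted_data: bool = True  # realistically always true today

def trifecta_complete(cfg: AgentConfig) -> bool:
    """True when all three legs are present, i.e. exfiltration is possible in principle."""
    untrusted_exposure = cfg.sees_untrusted_context or cfg.pretrained_on_untrusted_data
    return (cfg.reads_private_data
            and untrusted_exposure
            and cfg.can_communicate_externally)

# Cutting only context exposure does not help if the weights are suspect:
assert trifecta_complete(AgentConfig(True, False, True))
# Cutting external communication (the air-gapped setup) does:
assert not trifecta_complete(AgentConfig(True, True, False))
```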

Think of OpenAI/DeepMind/Anthropic/xAI, who train their own models from scratch: I assume they would not trust their own sensitive documents to any of their own LLMs that can communicate with the outside world, even if the input to the LLM is controlled by trained users in their own company (while the decision to reach the internet is left autonomous). Worse yet, in a truly agentic system anything coming out of an LLM is not fully trusted, so any chain of agents has to be treated as having untrusted data as inputs, which is all the more reason to avoid allowing external communication.

I like your air-gapped mechanism, as it seems like the only workable solution for analyzing sensitive data with current technology. It also suggests that companies will tend to expand their internal/proprietary infrastructure as they adopt agentic LLMs, even if the LLMs themselves eventually become a shared (and hopefully secured) resource. That would be a somewhat different trend from the earlier wave that moved a lot of functionality to the cloud.