Hacker News

pama · last Saturday at 10:02 PM

Agreed on all points.

What should one make of the orthogonal risk that an LLM could leak corporate secrets from its training data under some rare condition, even without direct input from the outside world? I doubt we have rigorous ways to prove that training data are safe from such an attack vector, even if we trained our own LLMs. Doesn't that mean that in-house agents running on sensitive data should be isolated from any interaction with the outside world?

So in the end we could have LLMs running in containers on shareable corporate data to handle outside-world queries, and LLMs running in complete isolation to handle sensitive corporate data. But do we need humans to connect and update the two types of environments, or is there a mathematically safe way to bridge the two?
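
One way to picture that split is sketched below, assuming Docker-style containers; the image names, host paths, and human sign-off step are hypothetical placeholders, not a vetted design.

    # Illustrative sketch only (Python). Assumes Docker is installed; all
    # image names and host paths are made up for the example.
    import subprocess

    def run_shareable_agent() -> None:
        """Networked agent: only sees data already cleared for sharing."""
        subprocess.run(
            [
                "docker", "run", "--rm",
                "-v", "/srv/shareable:/data:ro",   # shareable corpus, read-only
                "agent-image:shareable",           # hypothetical image
            ],
            check=True,
        )

    def run_sensitive_agent() -> None:
        """Isolated agent: no network namespace, so no path to the outside world."""
        subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",               # hard network isolation
                "-v", "/srv/sensitive:/data:ro",   # sensitive corpus, read-only
                "agent-image:sensitive",           # hypothetical image
            ],
            check=True,
        )

    def bridge_with_human_review(artifact_path: str) -> bool:
        """The only link between the two environments: an explicit human sign-off."""
        answer = input(f"Release {artifact_path} to the shareable environment? [y/N] ")
        return answer.strip().lower() == "y"

In this sketch the sensitive side has no network namespace at all, so the only way anything crosses over is the explicit, human-reviewed release step; whether that step can be replaced by something provably safe is exactly the open question.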


Replies

simonw · last Saturday at 10:09 PM

If you fine-tune a model on corporate data (assuming you can actually get that to work; I've seen very few success stories there), then yes, a prompt injection attack against that model could exfiltrate sensitive data too.

Something I've been thinking about recently is a sort of air-gapped mechanism: an end user gets to run an LLM system that has no access to the outside world at all (like how ChatGPT Code Interpreter works) but IS able to access the data they've provided to it, potentially multiple GBs of it, for use with its code execution tools.

That cuts off the exfiltration vector leg of the trifecta while allowing complex operations to be performed against sensitive data.
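
As a rough sketch of that kind of sandbox, assuming a Linux host with util-linux's unshare available (the paths, timeout, and script handling are illustrative, and this is not how Code Interpreter is actually implemented):

    # Illustrative sketch (Python): run model-generated code with no network
    # egress but with access to user-provided data. Filesystem confinement
    # would need extra sandboxing (mount namespaces, containers, etc.).
    import os
    import subprocess
    import tempfile

    def run_generated_code(code: str, data_dir: str = "/srv/user-data") -> str:
        """Execute LLM-generated Python inside an empty network namespace."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            script = f.name
        try:
            result = subprocess.run(
                [
                    "unshare", "--net", "--map-root-user",  # new user+net namespace: no egress
                    "python3", script,
                ],
                capture_output=True,
                text=True,
                timeout=60,                                  # bound runaway executions
                env={**os.environ, "DATA_DIR": data_dir},    # where the provided data lives
            )
            return result.stdout
        finally:
            os.unlink(script)

With the network namespace empty, even a successful prompt injection has nowhere to send the data; the main remaining risk is bad or malicious analysis of the local files rather than exfiltration.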
