Thinking aloud, but couldn't someone create a website with some malicious text that, when quoted in a prompt, convinces the LLM to expose certain private data to the web page, and couldn't the webpage send that data to a third party, without the need for the LLM to do so?
This is probably possible to mitigate, but I fear what people more creative, motivated and technically adept could come up with.
Why does the LLM get to send data to the website?? That’s my whole point, if you don’t expose a way for it to send data anywhere, it can’t.
At least with finetuning, yes: https://arxiv.org/abs/2512.09742
It's unclear if this technique could also work with in-prompt data.