The irony here is that the detection method is literally prompt injection — the same technique that&...

mika-el • today at 11:00 AM • 1 reply • view on HN

The irony here is that the detection method is literally prompt injection — the same technique that's a security vulnerability everywhere else. ICML embedded hidden instructions in PDFs that manipulate LLM output. In a different context that's an attack, here it's enforcement.

From my perspective this says something important about where we are with LLMs. The fact that you can reliably manipulate model output by hiding instructions in the input means the model has no real separation between data and commands. That's the fundamental problem whether you're catching lazy reviewers or defending against actual attacks.

Replies

nulltrace • today at 11:45 AM

[dead]

alt Hacker News

Replies