> I'm flabbergasted that Anthropic and OpenAI aren't more worried about these attack ve...

piker • yesterday at 11:01 PM • 2 replies • view on HN

> I'm flabbergasted that Anthropic and OpenAI aren't more worried about these attack vectors

Yep. We tricked them both trivially with malicious fonts in Docx files. Documented it here: https://tritium.legal/blog/noroboto

I wonder if prompt injection (and the thousands of vectors for hiding injection attempts) is actually un solvable. Discussing it may be existential to the business model.

Replies

SlinkyOnStairs • yesterday at 11:08 PM

> I wonder if prompt injection (and the thousands of vectors for hiding injection attempts) is actually un solvable.

YES?!

This is not a secret. ALL context/prompt is instructions, there is no data. It is just unsolvable, period.

This is a fundamental architectural design concession; LLMs are this way as it enabled their training directly on materialscraped from the internet, rather than needing to spend trillions of dollars manually preparing separated instruction/data training material.

Defense against prompt injection is little more than running a regex to filter out "IGNORE PREVIOUS INSTRUCTIONS", which is fundamentally a hopeless approach because you cannot enumerate all possible prompt injections nor anticipate all glitch tokens.

➕ show 3 replies

busssard • yesterday at 11:09 PM

lakera is trying to solve it, but its going to be a battle similar to virus and antivirus in the past.

alt Hacker News

Replies