Does anyone remember the early 2000s joke virus emails? The ones that were variations on "This is a <outgroup> computer virus. As we don't have software engineers to write the code to do this automatically, please kindly forward this email to everyone in your address book, then format your hard drive."
This is exactly as much malware as those were.
Please, for the love of all that is good, can we just try not to build and defend a world where, on encountering text like that, /your computer immediately follows the instructions/? Can we just all agree that such a world would be bad for everyone involved and using an LLM that risks doing this, with no container or guardrails, is at least as problematic as running an unpatched open email relay was back then?
> This is exactly as much malware as those were.
A joke virus email is a sign saying "please throw yourself down the stairs."
An obfuscated prompt injection that tries to delete data is someone greasing the stairs and turning off the lights.
Both rely on the environment being unsafe, but only one is deliberately trying to make the failure happen.
It's just as bad as a CPU acting on malicious instructions. We need to create safeguards for LLMs too; it's just that this is not the way to do it.