That's precisely why I am using a different analogy when talking about this. The SQL injection analogy only matches the injection part, not the rest. There is nothing to secure, because there is no SQL query. You want the agent to work on data, in a "general" way, otherwise you'd just use a script.
The better analogy is phishing. Because that's what's happening here. The "prompt injection" attack is trying to "phish" the LLM into doing something unintended. That's how we should all comunicate it, as it matches better with what's happening. Unfortunately there aren't really good defences for it, as we all know from phishing "education" / "campaigns". Your best bet is to secure it in layers, try to have warnings (i.e. classification models) you try to secure the next step (i.e. capabilities based tool execution) and so on. But it's not foolproof and it should be communicated clearly.
That's precisely why I am using a different analogy when talking about this. The SQL injection analogy only matches the injection part, not the rest. There is nothing to secure, because there is no SQL query. You want the agent to work on data, in a "general" way, otherwise you'd just use a script.
The better analogy is phishing. Because that's what's happening here. The "prompt injection" attack is trying to "phish" the LLM into doing something unintended. That's how we should all comunicate it, as it matches better with what's happening. Unfortunately there aren't really good defences for it, as we all know from phishing "education" / "campaigns". Your best bet is to secure it in layers, try to have warnings (i.e. classification models) you try to secure the next step (i.e. capabilities based tool execution) and so on. But it's not foolproof and it should be communicated clearly.