Isn’t it trivially fixable by having a monitor LLM? The monitor just reviews each turn pair and asks, “Is this conversation being manipulated via prompt injection?”
Such LLM would be susceptible to injections itself, even if it's not instruction-tuned (or it would be too dumb to work as a reliable guardrail). Chain injections are trivial enough, current black box style agentic systems are easily reverse engineered in practice if you have any understanding. You can mitigate it in a way similar to the security of any human organization, but fundamentally it's a cat and mouse game, just like in any human organization.
Is it? Or does it just make it multi dimensional? As in, prompt now need to anticipate there being a monitor and instruct that one too, indirectly.