With the current scale and speed, it is not yet viable to make N+1 calls to other models with specific prompts. (Or even calling multiple fine-tuned models)
However, even Google (and others) admit(s) that some sort of prompt-injection is always possible, hence out-of-scope for bug-bounty programs.
There are only 2 ways to fix this;
1. Either we ask multiple models with multiple system prompts to validate both inputs, processing, and outputs, then showing results to the user. Possibly making these kind of indirect attacks 2x-3x or Nx more difficult. (ie. Specialized checks and post-processing of the output of original model)
Note that this is linearly-scalable, looking like a *nix shell (bash) pipeline as such: `input-sanitizer-llm | translation-llm | output-sanitizer-llm | security-guard-llm`
2. I do not want to say "tiny LLMs" as the term itself is silly, but essentially finding a similar but different architecture to utilize transformers & language-relationship parts to create one-to-one models that are specialized for certain jobs. Currently we use "General knowledge" LLMs and trying to "specialize" their output, this is inefficient overall as you have bunch of unnecessary things encoded in it, which are causing either hallucinations or these kinds of attacks. Meanwhile, an LLM with no information about some other unnecessary things besides the task it was trained for would be much better and safer. (WIthout requiring linear scaling of point#1)
I also believe that the tokenizer will require the most work to make point #2 possible. If point #2 becomes even a slight reality, capacity constraints will drop significantly yielding much higher efficiency for those agentic tasks.