The author alludes to it but the defence to this is seemingly insurmountable at the moment because we’re ostensibly operating LLMs on a single channel — their inner, subconscious voice. Right?
Interacting with an LLM is a bit like seeing the output of an Inside Out (the Disney movie) scene. Or it’s a bit like a human brain that we’re providing tool call access and introspection with some kind of advanced neuralink.
But - like the author says - _we know_ our inside voice from the outside world, because we’re embodied.
Is there something we can do here by attempting to bifurcate internal and external systems? Like a conscious and subconscious stream of information on two separate bands?
If the model somehow knew its User was not it because it was clearly an external signal, then the attack documented here would be about as effective as a Jedi mind trick without the Force.