You can get rid of 99.9% of those attacks by simply dispatching the data consumption to a different instance of the LLM. See, for instance, some of the later patterns in https://arxiv.org/abs/2506.08837
Thanks for the article link! Do you happen to know where someone interested in getting more into AI security can follow or read more articles like this? Ty
How would they apply to this case?
They require being able to transform the output into something symbolic, but this YouTube feature necessarily has to output free-form text derived directly from the comments!
What would actually prevent the "attack" is for YouTube to not turn markdown from random LLM outputs into actual links.
In general, those patterns seem applicable only to a limited number of cases. I think they prevent far fewer than 99.9% of the attacks.
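To make the point about not rendering markdown links concrete, here is a rough sketch of what "don't turn markdown from LLM output into links" could look like as a post-processing step. The function name `strip_markdown_links` is made up for illustration, and this is not how YouTube actually handles it. It only covers inline `[text](url)` links and `![alt](url)` images; reference-style links, autolinks, and bare URLs would need separate handling.

```python
import re

# Matches inline markdown links and images: [text](target) or ![alt](target),
# with an optional "title" after the target. Hypothetical helper, not a
# complete markdown parser.
_MD_LINK = re.compile(r'!?\[([^\]]*)\]\(([^)\s]*)(?:\s+"[^"]*")?\)')


def strip_markdown_links(llm_output: str) -> str:
    """Replace inline markdown links/images with their visible text only."""
    # Keep group 1 (the visible text) and drop the target, so nothing
    # clickable survives into the rendered summary.
    return _MD_LINK.sub(lambda m: m.group(1), llm_output)


if __name__ == "__main__":
    summary = "Viewers loved it. [Claim your prize](https://evil.example/x)"
    print(strip_markdown_links(summary))
```

The idea is that the summary stays free-form text, but whatever the model was tricked into emitting can no longer become a clickable link.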