geoffschmidt, today at 6:34 PM

I think their work earns "theory" because it makes specific predictions both about how to make more effective prompt injection attacks and what activations you'd observe in the LLM during those attacks, and can also be plausibly extrapolated to suggest useful future research directions.