Hacker News

yen223 · today at 7:36 AM

There's a lot of overlap between the "disregard this" vulnerability among LLMs and social engineering vulnerabilities among humans.

The mitigations are also largely the same, i.e. limit the blast radius of what a single compromised agent (LLM or human) can do.


Replies

calpaterson · today at 7:40 AM

I agree, and one of the things that makes "disregard that!" harder to handle is that many LLM deployment models position the agent centrally and give it admin superpowers.

I mention in the footnotes that I think it makes more sense for the end-user of the LLM to be the one running it. That meshes better with RBAC (the user's LLM session only has the perms the user is actually entitled to) and doesn't devolve into praying the LLM stays on-task.
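A minimal sketch of that idea: derive the agent's toolset from the user's own RBAC permissions, so a prompt-injected session can at worst do what the user was already entitled to do. All names here (`User`, `Tool`, `build_toolset`) are illustrative, not from any real framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class User:
    name: str
    permissions: frozenset  # e.g. {"read:tickets"}

@dataclass
class Tool:
    name: str
    required_permission: str
    fn: Callable

def build_toolset(user: User, all_tools: list) -> list:
    """Expose only the tools the user is already entitled to use.

    The blast radius of a compromised session is then bounded by
    the user's own perms, not by a central agent's admin rights."""
    return [t for t in all_tools if t.required_permission in user.permissions]

tools = [
    Tool("read_ticket", "read:tickets", lambda tid: f"ticket {tid}"),
    Tool("delete_user", "admin:users", lambda uid: f"deleted {uid}"),
]
alice = User("alice", frozenset({"read:tickets"}))
session_tools = build_toolset(alice, tools)
# delete_user never even enters alice's session
```

The point is that the filtering happens outside the model: no amount of "disregard that!" can conjure a tool the session was never handed.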

zahlman · today at 7:56 AM

It also seems to have a fair bit in common with SQL injection.
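The analogy is apt because SQL has a fix LLMs currently lack: parameterized queries put untrusted input in a separate data channel, while a prompt mixes instructions and data in one string. A small sqlite3 sketch of the contrast:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

attacker_input = "alice' OR '1'='1"

# Vulnerable: input spliced into the instruction channel,
# much like untrusted text pasted straight into a prompt.
unsafe_rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{attacker_input}'"
).fetchall()  # matches every row

# Safe: the placeholder keeps the input as pure data.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (attacker_input,)
).fetchall()  # matches nothing
```

"Disregard this" works on LLMs precisely because there is no equivalent of the `?` placeholder yet.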