logoalt Hacker News

clarity_hackeryesterday at 5:28 PM0 repliesview on HN

This is the confused deputy problem at the application layer. Sandboxing secures the environment, but if the agent has legitimate access to sensitive operations (email, database writes, API calls), prompt injection attacks work through approved channels. The only hard defense is explicit user confirmation for each action, which defeats the point of autonomy.