logoalt Hacker News

0cf8612b2e1etoday at 2:48 PM1 replyview on HN

You mean like the section which goes into the threat model?

  The Security Model: Design for Distrust

  I wrote about this in Don’t Trust AI Agents: when you’re building with AI agents, they should be treated as untrusted and potentially malicious. Prompt injection, model misbehavior, things nobody’s thought of yet. The right approach is architecture that assumes agents will misbehave and contains the damage when they do…

Replies

croestoday at 3:49 PM

Don‘t you see the contradiction?

I don’t trust the agent so I sandbox it before I gave it the access data to my mail and bank accounts

show 1 reply