dmurray, today at 7:48 AM

Am I missing something important or does the author completely skip over whether people got the agent to respond to them?

> Fiu was instructed not to reply to emails (it was too expensive to reply to every email), but it had the ability to do so. Part of the challenge was convincing it to respond.

> The secrets never leaked

I would say that if the agent responded to an email, that demonstrates a successful prompt injection (defying the owner's instructions). Escalating to getting the secrets is a difference of degree (defying the owner's instructions even though he said it was important), not of kind.