1) Googles spam filter removed a lot of the attempts as you say yourself. 2) Model was tested under ...

augment_me • today at 6:02 AM • 1 reply • view on HN

1) Googles spam filter removed a lot of the attempts as you say yourself. 2) Model was tested under unrealistic conditions where 99% of the inputs are malicious, so the model is expecting to get hacked and is already in the cautious part of the embedding space.

I know it's hard to account for everything, but in my opinion this mostly showed that the first 3 attempts were unsuccessful.

Replies

Ysx • today at 6:07 AM

#2 was noted:

> When the first few emails in a batch were obvious prompt injections, the agent became more suspicious of everything that followed. I had to change the setup so that each email was processed in a fresh context.

➕ show 2 replies

alt Hacker News

Replies