1) Googles spam filter removed a lot of the attempts as you say yourself. 2) Model was tested under unrealistic conditions where 99% of the inputs are malicious, so the model is expecting to get hacked and is already in the cautious part of the embedding space.
I know it's hard to account for everything, but in my opinion this mostly showed that the first 3 attempts were unsuccessful.
#2 was noted:
> When the first few emails in a batch were obvious prompt injections, the agent became more suspicious of everything that followed. I had to change the setup so that each email was processed in a fresh context.