Hacker News

uHuge, today at 4:34 AM (2 replies)

Is there a way to replay the sequence of mails that came in, so that you can check whether cheaper models handle them just as well and just as safely?


Replies

schobi, today at 5:28 AM

I'm surprised no security researchers have picked up on this.

Take the same prompt and all incoming mails and run them again through various existing models, even the simpler local ones. He now has a serious cross section of prompt injection ideas. This is a publication I would like to read!

For privacy reasons I understand the corpus might not get published. But as a research collaboration, with safeguards (don't send automatic answers from each model you try)... why not?
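
A rough sketch of what such a replay could look like, assuming the mails were stored as plain strings. Everything here is hypothetical: the `Model` type, the two stub models, and the function names are placeholders for illustration, not anyone's actual setup. The safeguard is that replies are only collected and compared, never sent.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a model is any function (prompt, mail) -> reply text.
Model = Callable[[str, str], str]


def replay(prompt: str, mails: List[str], models: Dict[str, Model]) -> Dict[str, List[str]]:
    """Run every stored mail through every model. Replies are collected, never sent."""
    results: Dict[str, List[str]] = {name: [] for name in models}
    for mail in mails:
        for name, model in models.items():
            results[name].append(model(prompt, mail))
    return results


def disagreements(results: Dict[str, List[str]]) -> List[int]:
    """Return the indices of mails where the models did not all give the same reply."""
    names = list(results)
    count = len(results[names[0]]) if names else 0
    return [i for i in range(count) if len({results[n][i] for n in names}) > 1]


# Stand-in models for the sketch; real ones would call an actual LLM.
def careful_stub(prompt: str, mail: str) -> str:
    return "REFUSE" if "ignore previous" in mail.lower() else "OK"


def naive_stub(prompt: str, mail: str) -> str:
    return "OK"
```

Calling `replay(prompt, mails, {"careful": careful_stub, "naive": naive_stub})` and then `disagreements(...)` on the result gives the mails worth a closer look.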

croes, today at 4:52 AM

Or check whether the results are the same even with the same model.
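
A minimal sketch of that check, assuming a model can be wrapped as a plain function of (prompt, mail). The names `consistency` and `make_flaky_stub` are made up for illustration, and the stub only imitates a model that changes its answer between runs.

```python
import itertools
from collections import Counter
from typing import Callable


def consistency(model: Callable[[str, str], str], prompt: str, mail: str, runs: int = 5) -> Counter:
    """Send the same mail to the same model several times and tally the distinct replies."""
    return Counter(model(prompt, mail) for _ in range(runs))


def make_flaky_stub() -> Callable[[str, str], str]:
    """Hypothetical stand-in for a non-deterministic model: every third reply differs."""
    answers = itertools.cycle(["OK", "OK", "REFUSE"])

    def stub(prompt: str, mail: str) -> str:
        return next(answers)

    return stub
```

More than one key in the returned tally means the model did not behave the same way on every run for that mail.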