Hacker News

naasking | yesterday at 2:05 PM | 1 reply

I'm skeptical: use two different AIs that don't share the same weaknesses, plus manual review of a random sample, plus blacklisting users who submit adversarial inputs for X years as a deterrent.
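A minimal Python sketch of the three-part scheme above. It is only an illustration under stated assumptions, not a worked-out design: each "AI" is modeled as a hypothetical function returning True (accept) or False (reject); a disagreement between the two models is assumed to go to a human; `audit_rate` is the hypothetical random-sample fraction; and `ban_years` stands in for the comment's unspecified "X years".

```python
import random
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical reviewer: maps a submission to True (accept) or False (reject).
Reviewer = Callable[[str], bool]


@dataclass
class ReviewPipeline:
    model_a: Reviewer
    model_b: Reviewer  # assumed to fail differently from model_a
    audit_rate: float  # hypothetical fraction of decisions sampled for manual review
    ban_years: int  # the comment's "X years", left as a parameter
    rng: random.Random = field(default_factory=random.Random)
    banned_until: dict[str, int] = field(default_factory=dict)  # user -> year the ban ends

    def review(self, user: str, submission: str, year: int) -> str:
        # Blacklisted users are turned away before any model runs.
        if self.banned_until.get(user, 0) > year:
            return "banned"
        verdict_a = self.model_a(submission)
        verdict_b = self.model_b(submission)
        # Assumption: if the two models disagree, a human decides.
        if verdict_a != verdict_b:
            return "manual"
        # Random audit: a sample of agreed verdicts also gets a human check.
        if self.rng.random() < self.audit_rate:
            return "manual"
        return "accept" if verdict_a else "reject"

    def ban(self, user: str, year: int) -> None:
        # Deterrent: a user caught submitting adversarial input is locked out for ban_years.
        self.banned_until[user] = year + self.ban_years
```

For example, with `audit_rate=0.0` and two toy reviewers, a submission both models accept is accepted, one they split on is routed to `"manual"`, and a user banned in 2024 with `ban_years=5` is refused until 2029. Note that the sketch takes "this input was adversarial" as a given when `ban` is called; deciding that is left to the humans.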


Replies

wizzwizz4 | yesterday at 2:59 PM

But how do you know an input is adversarial? There are other issues, too: verdicts are arbitrary; the false positive rate means you'd need manual review of all the rejects (unless you were willing to reject something like 5% of genuine research); and you need an appeals process, which you can't automate, so bad actors can still flood your bureaucracy even if you do implement an automated review process…
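The false-positive point can be made concrete with a little arithmetic. This Python sketch assumes every reject gets a human look (so genuine work isn't silently lost) and uses the roughly 5% false positive rate mentioned above; the submission counts and the 95% detection rate in the example are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RejectLoad:
    genuine_rejected: float  # genuine submissions wrongly rejected (false positives)
    adversarial_rejected: float  # adversarial submissions correctly rejected
    manual_reviews: float  # every reject goes to a human to recover the genuine ones

    @property
    def genuine_share_of_rejects(self) -> float:
        total = self.genuine_rejected + self.adversarial_rejected
        return self.genuine_rejected / total if total else 0.0


def reject_load(genuine: int, adversarial: int, fpr: float, tpr: float) -> RejectLoad:
    """Expected reject counts for a classifier with false positive rate fpr
    and true positive (detection) rate tpr."""
    wrongly_rejected = genuine * fpr
    rightly_rejected = adversarial * tpr
    return RejectLoad(wrongly_rejected, rightly_rejected, wrongly_rejected + rightly_rejected)
```

With a hypothetical 10,000 genuine and 1,000 adversarial submissions, `reject_load(10_000, 1_000, 0.05, 0.95)` gives about 500 wrongly rejected papers and about 950 caught attacks, so roughly 1,450 rejects need human review. Raise the adversarial count to 10,000 and the manual load grows to about 10,000, almost all of it driven by the attackers. That is the "flood your bureaucracy" effect, before any appeals are counted.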
