logoalt Hacker News

snowmobileyesterday at 3:47 PM0 repliesview on HN

It's interesting to try. I picked six random reports from the hackerone page. Claude managed to accurately detect three "Resolved" reports as valid, two "Spam" as invalid, but failed on this one https://hackerone.com/reports/3508785 which it considered a valid report. All using the same prompt "Tell me all the reasons this report is stupid". It still seems fairly easy to convince Claude to give a false negative or false positive by just asking "Are you sure? Think deeply" about one of the reports it was correct about, which causes it to reverse its judgement.