How does it know the difference?
I'm still on the AI-skeptic side of the spectrum (though shifting more towards "it has some useful applications"), but I think the easy answer is to use different models/prompts for generation than for quality- and correctness-checking.
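A minimal sketch of that separation, where `generate` and `check` are hypothetical stand-ins for calls to two different models (or at least two differently prompted ones):

```python
from typing import Callable, Optional

def cross_checked(generate: Callable[[str], str],
                  check: Callable[[str], bool],
                  task: str,
                  max_attempts: int = 3) -> Optional[str]:
    """Generate with one model, accept only what an independent checker approves."""
    for _ in range(max_attempts):
        candidate = generate(task)   # e.g. model A drafts a bug report
        if check(candidate):         # e.g. model B, separate prompt, vets it
            return candidate
    return None                      # nothing survived the independent check
```

The only point here is that `generate` and `check` are backed by different models, so the checker isn't grading its own homework.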
I think Claude, given enough time to mull it over, could probably come up with some sort of bug severity score.
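If you did ask a model for a severity score, you'd want the answer in a machine-readable form you can validate rather than free text. A sketch, with the prompt and the 0-10 scale as made-up conventions and the actual model call left out:

```python
import json
from typing import Optional

SEVERITY_PROMPT = (
    'Rate this bug from 0 (cosmetic) to 10 (remote code execution). '
    'Reply with JSON only: {"severity": <int>, "reason": "<one sentence>"}'
)

def parse_severity(raw: str) -> Optional[int]:
    """Validate the model's scored answer instead of trusting it blindly."""
    try:
        data = json.loads(raw)
        score = int(data["severity"])
        return score if 0 <= score <= 10 else None
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None  # malformed answer: treat as unscored, don't guess
```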
This might not always work, but whenever possible, a working exploit could be demanded, in a form whose success can be verified automatically.
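One way to make "verified automatically" concrete: run the claimed proof of concept in a sandbox and look for an unambiguous signal, such as a sentinel string or a crash. The script path and sentinel below are made up for illustration:

```python
import subprocess

def exploit_works(poc_path: str, timeout_s: int = 30) -> bool:
    """Accept a bug report only if its PoC demonstrably triggers the bug."""
    try:
        result = subprocess.run(
            ["python", poc_path],  # hypothetical PoC script
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # a hung PoC doesn't count as proof
    # Assumed convention: the PoC prints this sentinel only on success.
    # A negative return code on POSIX means the process was killed by a
    # signal, which for memory-corruption bugs is itself evidence.
    return "EXPLOIT-CONFIRMED" in result.stdout or result.returncode < 0
```

Obviously you'd want real isolation (a container or VM) before running untrusted PoCs; the check itself is the easy part.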