If humans are able to judge, and if the AI is more capable than a human in every respect, then why can't the AI be the judge of its own performance? Humans judge their own output all the time.
Humans ultimately judge their output by comparison and competition. When we get to the point where an AI is capable of participating in the market directly, it'll no longer make sense to proxy judgement through humans.
The difference IMO is that every single human is a slightly different model, not the same one with a different prompt or weights.