The AI doesn't know what good or bad code is. It doesn't know what surpassing someone means. It's been trained to generate text similar to its training data, and that's what it does.
If we fed it only good code, we'd expect a better result, but currently we're feeding it average code. Evaluating code quality across such a huge dataset costs too much.
The training data includes plenty of examples of code labelled good and bad, as well as comparisons between two implementations that weigh trade-offs, costs, and benefits. I think it absolutely does "know" good code, in the sense that it can know anything at all.