Probably the same way other models learned to surpass human ability despite being bootstrapped from human-level data: using reinforcement learning.
The question is, do we have good enough feedback loops for that, and if not, are we going to find them? I would bet they will be found for a lot of use cases.