It's worth asking why we haven't had the AlphaZero moment for general learning yet, where no human data is needed.
I think the issue is that games and other closed-ended systems have unambiguous success criteria (you win or you lose), so self-referential training is effective.
On top of that, AlphaZero had a perfect simulator of the world it existed in (the rules of chess), so it could run simulations of that world ad infinitum and learn from them.
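The "perfect simulator plus unambiguous reward" point can be made concrete with a toy game. Here's a minimal sketch using Nim (the game choice and all names are mine, purely illustrative): with perfect rules in hand, self-play generates unlimited, perfectly labeled training data for free.

```python
import random

def legal_moves(n):
    # Nim rules: take 1-3 stones, but no more than remain.
    return list(range(1, min(3, n) + 1))

def self_play_game(n=10, rng=None):
    """Play one game of Nim against itself with a random policy.

    Because the simulator (the rules) is perfect, every step can be
    labeled with an unambiguous reward: +1 if that player went on to
    win, -1 otherwise. This is the data AlphaZero-style training eats.
    """
    rng = rng or random.Random(0)
    trajectory = []  # (stones_left, player, move)
    player = 0
    while n > 0:
        move = rng.choice(legal_moves(n))
        trajectory.append((n, player, move))
        n -= move
        player = 1 - player
    winner = 1 - player  # whoever took the last stone wins
    return [(s, a, 1 if p == winner else -1) for (s, p, a) in trajectory]
```

Nothing here requires human data: run `self_play_game` a million times and you have a million perfectly labeled games, which is exactly the loop the real world doesn't offer.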
That's simply not the case for the real world: you can't simulate it perfectly and observe what happens when you act.