That's just not how they work, really. They don't know what they don't know and their process requires an output.
I think they're getting better at it, but that's likely down to the growing parameter counts in the SOTA models more than anything else.
They do know what they don't know. There's a probability distribution over outputs that they are sampling from. It just isn't being used to flag uncertainty.
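To make that concrete, here's a rough sketch of how the output distribution could be read as an uncertainty signal. The logits, the 4-token vocabulary, the `should_abstain` helper, and its threshold are all made up for illustration, and token-level entropy is only a crude proxy for "not knowing", not something any particular model is confirmed to do.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means a flatter, less certain distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def should_abstain(logits, threshold_bits=1.5):
    """Hypothetical rule: decline to answer when entropy exceeds an assumed threshold."""
    return entropy_bits(softmax(logits)) > threshold_bits

# Hypothetical next-token scores over a 4-token vocabulary.
confident_logits = [10.0, 0.0, 0.0, 0.0]  # one token dominates
uncertain_logits = [1.0, 1.0, 1.0, 1.0]   # every token equally likely

confident_entropy = entropy_bits(softmax(confident_logits))
uncertain_entropy = entropy_bits(softmax(uncertain_logits))

print(f"confident: {confident_entropy:.4f} bits, abstain={should_abstain(confident_logits)}")
print(f"uncertain: {uncertain_entropy:.4f} bits, abstain={should_abstain(uncertain_logits)}")
```

Normal sampling just picks a token from the distribution either way, which is why the flat case still produces an output unless something like the threshold check is added on top.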