I think this might be an unsolved problem. When GPT-5 came out, they had a "router" (classifier?) decide whether to use the thinking model or not.
It was terrible. You could upload 30 pages of financial documents and it would decide "yeah this doesn't require reasoning." They improved it a lot but it still makes mistakes constantly.
I assume something similar is happening in this case.
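To make the pattern concrete, here's a minimal sketch of the "classifier in front of two models" setup being described. Everything in it is a placeholder I made up: the model names, the keyword heuristic, all of it. OpenAI's actual router is a learned classifier and isn't public; the point is just to show why judging difficulty from surface features of the prompt misfires.

    # Hypothetical sketch of a prompt router, not OpenAI's implementation.

    def needs_reasoning(prompt: str) -> bool:
        """Toy difficulty classifier using surface cues.

        This is the weak spot: keyword-style proxies correlate poorly
        with how much reasoning a task actually needs, so a long
        financial document phrased as a plain request can look "easy".
        """
        cues = ("prove", "derive", "step by step", "why does", "trade-off")
        return any(cue in prompt.lower() for cue in cues)

    def route(prompt: str) -> str:
        # Placeholder model names, not real API identifiers.
        return "thinking-model" if needs_reasoning(prompt) else "fast-model"

    if __name__ == "__main__":
        doc = "Summarize these financial statements. " + "revenue: 1.2M; " * 5000
        print(route(doc))  # -> fast-model, despite ~30 pages of input

A real router is presumably trained rather than keyword-matched, but it faces the same core problem: estimating difficulty from the prompt alone, before any reasoning has been done.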
Is estimating how hard a problem is, before actually attempting it, even a solved problem in humans?
I find GPT-5.4 is decent at this: in my experience it thinks harder on harder problems and still answers quickly on simpler ones.