Because nearly all benchmarks measure "accuracy" by giving you 1 point for a correct answer and 0 points for everything else. If you have 100 questions that you are only 10% certain on, answering "I don't know" to all of them gets you 0 points, while answering all of them as if you were confident has an expected value of 10 points. So that's what most AIs are trained to do.
AA-Omniscience is the only AI benchmark I know of where randomly guessing gets you a lower average score than answering all questions with "I don't know".
It should be 1 for correct, 0 for don't know and -1 for wrong.
They are much better incentives. In real life a wrong answer is much more damaging than a don't know.
AA-Omniscience Index gives +100 for correct, 0 for "I don't know" and -100 for incorrect.
For your scenario the always-confident strategy will give an average of -80 (10 correct at +100 and 90 wrong at -100, averaged over 100 questions). Saying "I don't know" to all of them will give 0.
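To make the arithmetic concrete, here is a rough Python sketch of the 100-question scenario under both scoring rules. The function name `average_score` and its parameters are made up for illustration, and it assumes exactly 10 of the 100 confident answers turn out correct (the expected outcome at 10% certainty); it is not how any benchmark actually computes its score.

```python
def average_score(n_correct, n_wrong, n_abstain, right, wrong, abstain=0):
    """Average per-question score for a given mix of outcomes.

    Hypothetical helper: `right`, `wrong` and `abstain` are the points
    awarded for a correct answer, a wrong answer and an "I don't know".
    """
    total_questions = n_correct + n_wrong + n_abstain
    total_points = n_correct * right + n_wrong * wrong + n_abstain * abstain
    return total_points / total_questions


# Scenario from the thread: 100 questions, 10% chance of being right on each.
# Assumption: answering everything yields exactly 10 correct and 90 wrong.

# Plain accuracy scoring: 1 for correct, 0 for everything else.
accuracy_guess_all = average_score(10, 90, 0, right=1, wrong=0)
accuracy_abstain_all = average_score(0, 0, 100, right=1, wrong=0)

# Penalized scoring as described above: +100 correct, 0 abstain, -100 wrong.
penalized_guess_all = average_score(10, 90, 0, right=100, wrong=-100)
penalized_abstain_all = average_score(0, 0, 100, right=100, wrong=-100)

print("accuracy scoring:  guess all =", accuracy_guess_all,
      "| abstain all =", accuracy_abstain_all)
print("penalized scoring: guess all =", penalized_guess_all,
      "| abstain all =", penalized_abstain_all)
```

With a symmetric penalty like this, answering only pays off on average when you are more than 50% likely to be right, which is why guessing stops being the winning strategy.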
A lot of models have negative AA-Omniscience Index.
They also have AA-Omniscience Accuracy and AA-Omniscience Hallucination Rate, which handle "I don't know" answers differently.
https://artificialanalysis.ai/evaluations/omniscience