wongarsu • yesterday at 11:48 AM • 2 replies • view on HN

Because nearly all benchmarks measure "accuracy" by giving you a point for a correct answer and 0 points for everything else. If you have 100 questions you are 10% certain on, answering "I don't know" to all of them gets you 0 points, while answering all of them as if you are confident has an expected value of 10 points. So that's what most AIs are trained to do.
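To make the arithmetic concrete, here is a minimal sketch of accuracy-only scoring. The function name and parameters are made up for illustration and are not taken from any benchmark's code; it assumes every question has the same probability of being answered correctly.

```python
def expected_accuracy_points(n_questions: int, p_correct: float, abstain: bool) -> float:
    """Expected points when a correct answer earns 1 and anything else earns 0.

    Hypothetical helper: assumes each question is answered correctly
    with the same probability p_correct.
    """
    if abstain:
        # "I don't know" is scored the same as a wrong answer: 0 points.
        return 0.0
    # Each confident answer is worth p_correct points on average.
    return n_questions * p_correct


always_answer = expected_accuracy_points(100, 0.10, abstain=False)
always_idk = expected_accuracy_points(100, 0.10, abstain=True)
print(f"always answer: {always_answer:.1f}, always 'I don't know': {always_idk:.1f}")
```

Under this scoring, abstaining can never beat guessing, no matter how low the confidence is.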

AA-Omniscience is the only AI benchmark I know of where randomly guessing gets you a lower average score than answering all questions with "I don't know".

Replies

jampekka • yesterday at 12:53 PM

AA-Omniscience Index gives +100 for correct, 0 for "I don't know" and -100 for incorrect.

For your scenario the always-confident strategy will give an average of -80 (0.1 × 100 + 0.9 × (-100)). Saying "I don't know" to all will give 0.
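A small sketch of the +100 / 0 / -100 scoring described above, for checking the numbers yourself. The function is hypothetical and only mirrors the point values quoted in this comment; it is not Artificial Analysis's implementation, and it assumes a single fixed probability of being correct.

```python
def expected_index(p_correct: float, reward: float = 100.0, penalty: float = -100.0) -> float:
    """Expected per-question score when the model always answers.

    Hypothetical helper: correct answers earn `reward`, incorrect answers
    earn `penalty`; abstaining ("I don't know") would score 0 instead.
    """
    return p_correct * reward + (1.0 - p_correct) * penalty


IDK_SCORE = 0.0  # score for abstaining on every question

for p in (0.10, 0.50, 0.90):
    score = expected_index(p)
    better = "answer" if score > IDK_SCORE else "abstain (or tie)"
    print(f"p_correct={p:.2f}: expected score {score:+.1f} -> {better}")
```

With symmetric reward and penalty, answering only pays off when the chance of being right is above 50%.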

A lot of models have negative AA-Omniscience Index.

They also have AA-Omniscience Accuracy and AA-Omniscience Hallucination Rate, which handle "I don't know" answers differently.

https://artificialanalysis.ai/evaluations/omniscience

nutjob2 • yesterday at 11:59 AM

It should be 1 for correct, 0 for don't know and -1 for wrong.

They are much better incentives. In real life a wrong answer is much more damaging than a don't know.
