You're asking why someone wouldn't want to ship a tool that obviously doesn't work? Surely it's always better/more profitable to ship a tool that at least seems to work.
No? I’m interested in why LLMs are bad at knowing when they don’t know the answer, and why that’s a particularly difficult problem to solve.
GP means they aren't good at knowing when they are wrong and should spend more compute on the problem.
I would say the current generation of LLMs that "think harder" when you tell them their first response is wrong is a training ground for learning to think harder without being told, but I don't know what the obstacles are. A rough sketch of what that decision could look like is below.
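Just to make the "spend more compute when unsure" idea concrete: here's a toy sketch, not anything a lab actually ships, that uses mean per-token log-probability from a first answer as a crude confidence proxy and escalates to a slower pass when it drops below a hypothetical threshold. The function name and threshold are made up for illustration; real calibration is much harder than this, which is sort of the point of the thread.

```python
import math

def should_think_harder(token_logprobs, threshold=-1.0):
    """Decide whether to escalate to a slower, more deliberate pass.

    token_logprobs: per-token log probabilities from the model's first answer
    threshold: hypothetical cutoff on mean log-probability; below it, escalate
    """
    if not token_logprobs:
        return True  # no signal at all, so err on the side of more compute
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    # Low average log-probability ~ the model was "unsure" token by token.
    return mean_logprob < threshold

# Toy usage: a confident-looking answer vs. an uncertain one.
confident = [-0.1, -0.2, -0.05, -0.3]
uncertain = [-2.1, -1.8, -2.5, -1.4]
print(should_think_harder(confident))  # False: accept the fast answer
print(should_think_harder(uncertain))  # True: re-prompt / think harder
```

The catch, of course, is that token-level probability measures fluency more than factual correctness, so a model can be confidently wrong and this signal won't catch it.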