I haven't worked with LLMs enough to know, but I wonder: are they nonsensical in a truly random way, or are they just nonsensical along a different axis in task space than normal humans, an axis we perhaps haven't fully internalized yet?
I'm not really sure, and you can pull up lots of funny examples where various models show both progress and regressions on such mundane, simple math.
As recently as August, asking ChatGPT "11.10 or 11.9 which is bigger" produced the wrong answer, followed by lots of wrong justification for that wrong answer. Even the follow-up math question "what is 11.10 - 11.9" gave me the answer "11.10 - 11.9 equals 0.2".
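For what it's worth, here's a quick sanity check in Python, assuming you read those as plain decimal numbers rather than version strings (which is my reading, and presumably not the model's):

    # Treat 11.10 and 11.9 as decimals, not version numbers.
    from decimal import Decimal

    a, b = Decimal("11.10"), Decimal("11.9")
    print(a > b)   # False -- 11.9 is the bigger number
    print(a - b)   # -0.80, not 0.2

(Decimal just avoids the float rounding noise you'd get from 11.10 - 11.9 as raw floats; either way the answer is -0.8, nowhere near 0.2.)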
We can quibble about which model I was using, which edge case I hit, or how quickly they fixed it... but this is two years into a very public LLM hype wave, and at some point I expect better.
It gives me pause about asking more complex math questions whose results I can't immediately verify. And if I can verify them, then again, why would I pay for a tool to ask questions I already know the answer to?