While I'd agree human failures are different from AI failures, human failures are nonetheless often nonsensical too. Familiar, human, but nonsensical — consider how often a human who disagrees with another falls back on "that's just common sense!"
I think the larger models are consuming on the order of 100,000× as much data as we do, and while they have a much broader range of knowledge, it's not 100,000× the breadth.
Well it's a breadth & depth problem isn't it?
Humans are nonsensical, but with somewhat predictable error rates by domain, per individual. So you hire people with the skillsets, domain expertise, and error rates you need.
With an LLM, it's nonsensical in a completely random way from prompt to prompt. It's sort of like talking into a telephone: sometimes Einstein is on the other end, and sometimes it's a drunken child. You have no idea when you pick up the phone which way it's going to go.
We feed these things nearly the entirety of human knowledge, and the output still feels rather random.
LLMs have all that information and still have a ~10% chance of messing up a simple mathematical comparison that an average 12-year-old would not.
Other times we delegate much more complex tasks to LLMs and they work great!
But given that nondeterminism, it becomes hard to delegate any task whose output you can't check, at least when it matters.
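To make the "check the work" point concrete, here's a minimal sketch of the only kind of guardrail that really helps: accept the model's answer when it agrees with cheap ground truth, override it when it doesn't. `ask_llm` is a hypothetical stand-in for whatever model call you'd actually use, not any real API.

```python
# Sketch: delegate a numeric comparison to an LLM, but only trust the
# answer when a deterministic check agrees with it.

def ask_llm(prompt: str) -> str:
    """Hypothetical model call; wire up your real client here."""
    raise NotImplementedError

def llm_says_greater(a: float, b: float) -> bool:
    """Ask the model which number is larger; True if it picks `a`."""
    answer = ask_llm(f"Which is larger, {a} or {b}? Reply with just the number.")
    return answer.strip() == str(a)

def checked_comparison(a: float, b: float) -> bool:
    """Use the LLM's claim only if it matches the cheap ground truth."""
    truth = a > b
    try:
        claim = llm_says_greater(a, b)
    except NotImplementedError:
        return truth  # no model wired up; fall back to arithmetic
    if claim != truth:
        # The ~10% failure case: the model disagrees with arithmetic.
        # Here the check costs nothing, so we just override it.
        return truth
    return claim

print(checked_comparison(9.9, 9.11))  # True, regardless of what the model says
```

The catch, of course, is that for most tasks actually worth delegating, the check is as expensive as doing the work yourself, which is exactly the problem.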