Well, it's a breadth & depth problem, isn't it?
Humans are nonsensical, but with somewhat predictable error rates by domain and by individual. So you hire people with the skill sets, domain expertise, and error rates you need.
With an LLM, it's nonsensical in a way that feels random from prompt to prompt. It's sort of like talking into a telephone: sometimes Einstein is on the other end, and sometimes it's a drunken child. You have no idea when you pick up the phone which way it's going to go.
We feed these things nearly the entirety of human knowledge, and the output still feels rather random.
LLMs have all that information and still have a ~10% chance of botching a simple mathematical comparison that an average 12-year-old would not.
Other times we delegate much more complex tasks to LLMs and they work great!
But given the nondeterminism, it becomes hard to delegate important tasks whose output you can't check yourself.
I haven't worked with LLMs enough to know, but I wonder: are they nonsensical in a truly random way, or are they just nonsensical along a different axis in task space than humans are, an axis we perhaps just haven't fully internalized yet?
Weirdly, I find myself agreeing with your vibes despite disagreeing on — oh, half? — the specifics.
I'm not sure what to make of that, but thought you might find it as curious as I do.