Seems like this is an aspect of their well-known overconfidence and their inability to self-reflect and recognize that they have to ask for more details because their priors are too low. If you look at the output of reasoning models, it’s clear that the idea of asking for clarification very rarely occurs to them – when they’re confused, it’s just endless speculation about what the user might have meant.
This, of course, has certain implications for the wisdom of “replacing human programmers”, given that one of the hard parts of the trade is turning vague and often confused ideas into precise specifications by interacting with the stakeholders.
The inability of LLMs to ask for clarification was exactly the flaw we encountered when testing them on open-ended problems stated somewhat ambiguously. This was in the context of paradoxical situations, tested on DeepSeek-R1 and Claude-3.7-Sonnet. Blog post about our experiments: https://pankajpansari.github.io/posts/paradoxes/
> inability to self-reflect and recognize they have to ask for more details because their priors are too low.
Gemini 2.5 Pro and ChatGPT-o3 have often asked me to provide additional details before doing a requested task. Gemini sometimes comes up with multiple options and requests my input before doing the task.
Isn’t this relatively trivial to correct? Just as chain-of-thought reasoning replaces end tokens with “hmm” to continue the thought, can’t users just replace the LLM’s tokens whenever it starts saying “maybe they are referring to” with something like “Let me ask a clarifying question before I proceed”?
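If you wanted to prototype that, one rough approach is to watch the decode stream for speculation phrases and splice in a nudge to ask the user instead. A toy sketch in Python below; `stream_tokens` and `continue_from` are stand-ins for whatever streaming/continuation hooks a given inference stack exposes, not real APIs:

```python
# Toy sketch of the token-substitution idea above. `stream_tokens(prompt)`
# and `continue_from(prompt, text)` are hypothetical stand-ins for a
# streaming decode call and a "continue from this prefix" call.

SPECULATION_MARKERS = (
    "maybe they are referring to",
    "perhaps the user means",
)

NUDGE = "Let me ask a clarifying question before I proceed: "

def generate_with_clarification(prompt, stream_tokens, continue_from):
    """Stream a completion; if the model starts guessing at intent,
    cut the speculative fragment and force a clarifying question."""
    output = ""
    for token in stream_tokens(prompt):
        output += token
        lowered = output.lower()
        for marker in SPECULATION_MARKERS:
            if marker in lowered[-80:]:
                # Drop the speculation and re-prompt from the edited prefix.
                output = output[: lowered.rfind(marker)] + NUDGE
                return output + continue_from(prompt, output)
    return output
```

Whether the model actually follows the injected prefix instead of going right back to guessing is an empirical question, though.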
Real programmers spend a ton of time just figuring out what people actually want. LLMs still treat guessing as a feature.
> and the inability to self-reflect and recognize they have to ask for more details
They're great at both tasks; you just have to ask them to do it.
> inability to self-reflect
IMO the One Weird Trick for LLMs is recognizing that there's no real entity, and that users are being tricked into a suspended-disbelief story.
In most cases you're contributing text-lines for a User-character in a movie-script document, and the LLM algorithm is periodically triggered to autocomplete incomplete lines for a Chatbot character.
You can have an interview with a vampire DraculaBot, but that character can only "self-reflect" in the same shallow/fictional way that it can "thirst for blood" or "turn into a cloud of bats."
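To make that concrete, here's a toy sketch of the script-autocomplete framing (`complete` is a placeholder for any raw text-completion call, not a specific API):

```python
# Toy sketch of the "movie-script document" framing: the chat is one growing
# text document, and each turn just asks the model to autocomplete the next
# Chatbot line. `complete(text)` is a hypothetical raw-completion function.

def chat_turn(transcript: str, user_line: str, complete) -> str:
    # Add the User-character's line, then leave the Chatbot line open-ended.
    transcript += f"User: {user_line}\nChatbot:"
    # The model sees the whole script and writes the next line of dialogue.
    reply = complete(transcript)
    return transcript + reply + "\n"

transcript = "A helpful Chatbot answers a User's questions.\n"
# transcript = chat_turn(transcript, "Can you self-reflect?", complete)
```

Any "self-reflection" in the Chatbot's line is just more dialogue the autocomplete found plausible for that character.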