This is a tired semantic argument that does not bring any insight into the discussion. A token-predictor could still be trained to predict the tokens “I’m not sure what you mean because of points x, y, and z; could you elaborate?”
It means that if you want something resembling a self-introspective theory of mind, you need to arrange the overall document to cohere with documents where such things are (or appear to be) happening.
This leads us to new questions: How can we characterize and identify real-world documents which fit? How can we determine what features may be significant, and which of those can be easily transplanted to our use-case?
It could be trained to say that, but it's not exactly clear how you would reinforce against the absence of relevant training data so that it emits that response accurately, rather than just on the basis of embedding proximity.
How would an LLM “know” when it isn’t sure? Their baseline for truth is competent text; they don’t have a baseline for truth grounded in observed reality. That’s why they can be “tricked” into things like “Mr Bean is the president of the USA”.
I agree that it's a tired argument, but there appear to be two separate things being discussed in this little corner of HN: clarity about the problem it's being asked to solve, and confidence that the answer it has is correct.
I can trivially get any of the foundational models to ask me clarifying questions. I've never had one respond with 'I don't know'.
Anthropic found that Claude will claim it used the "standard" way to do addition (add the digits, carry the one, etc.), but the pattern of activations showed it using a completely different algorithm. So these things can role-play at introspecting, coming up with plausible post-hoc explanations for their output, but they are still just pretending, so they will get it wrong.
So you can teach a model to sometimes ask for clarification, but will it actually have insight into when it really needs it, or will it just interject for clarification more or less at random? These models have really awful insight into their own capabilities; ChatGPT, for example, insists to me that it can read braille, and then cheerfully generates a pure hallucination.
I disagree, it's a very insightful comment.
The problem is that any information about any internal processes used to generate a particular token is lost; the LLM is stateless, apart from the generated text. If you ask an LLM-character (which I agree should be held distinct from the LLM itself and exists at a different layer of abstraction) why it said something, the best it can do is a post-hoc guess. The "character", and any internal state we might wish it to have, only exists insofar as it can be derived anew from the text.
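To make that concrete, here's a minimal sketch in Python. The call_model function is a hypothetical stand-in for any chat-completion API, not a real vendor call; the point is that the only thing carried from turn to turn is the text transcript.

    # Hypothetical stand-in for a real chat-completion API; not any vendor's actual call.
    def call_model(messages):
        # A real model would predict the next reply from the text below and nothing else.
        return "a plausible-sounding answer derived only from the messages above"

    transcript = []  # plain text is the ONLY state that survives between turns

    def ask(user_message):
        transcript.append({"role": "user", "content": user_message})
        reply = call_model(transcript)  # no activations from earlier turns are available here
        transcript.append({"role": "assistant", "content": reply})
        return reply

    ask("Summarise the paper for me.")
    ask("Why did you phrase your last answer that way?")
    # To answer the second question, the model can only re-read the transcript and guess,
    # i.e. produce a post-hoc rationalisation.

Whatever internal computation produced the first reply is gone by the time the second question arrives; only the words remain.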
It's not a tired argument, and not just a semantic one: it's a foundational characteristic of LLMs.
> A token-predictor could still be trained to predict the tokens “I’m not sure what you mean because of points x, y, and z; could you elaborate?”
This is entirely true, and the key insight is right there in your sentence, but you don't seem to grasp it: “could still be trained”. You can train an LLM to do whatever you want, but you have to train it specifically for that!
In the early days of LLMs we witnessed an impressive phenomenon where they exhibited emergent capabilities (I'm thinking in particular of LLMs being few-shot learners on material that wasn't in their training corpus). These emergent capabilities legitimately raised the question of “how intelligent these things really are”.
But the key lesson of the past three years is that this kind of emergent effect is too small to be useful, and the focus has shifted towards purpose-built datasets (with tons of “artificial data”) to train the model to explicitly do the things we want it to do. And it works pretty well, as models' capabilities have kept improving at a fast pace (in particular, I don't see why we couldn't overcome the problem highlighted by this paper with more synthetic data specifically designed for multi-turn conversation). But their progress is now strictly limited by their makers' own intelligence. You cannot just scrape the web, throw compute at the problem, and expect emergent intelligence to occur anymore. It's more “simulated intelligence” than “artificial intelligence”, really.
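To be concrete about what I mean by purpose-built multi-turn data, here's a rough, made-up sketch of a single training record; the schema is invented for illustration, not any lab's actual dataset format.

    # Illustrative only: a hand-written example of the kind of synthetic multi-turn
    # training record you would generate at scale to teach clarification-asking.
    example = {
        "messages": [
            # Deliberately underspecified request: the target behaviour is to ask, not guess.
            {"role": "user", "content": "Fix the bug in my sort function."},
            {"role": "assistant",
             "content": "Which language is it in, and can you paste the function? "
                        "I can't see your code yet."},
        ],
        "target_behaviour": "ask_for_clarification",
    }

    print(example["messages"][1]["content"])

Generate millions of records like this covering different kinds of ambiguity and the model will learn to ask. But the point stands: the behaviour has to be designed in by the people building the dataset; it doesn't emerge on its own.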