Jeff Dean says models hallucinate because their training data is "squishy."
But what's in the context window is sharp: the exact text or video frame right in front of the model.
The goal is to bring more of the world into that context.
Compression gives it intuition. Context gives it precision.
Imagine if we could extract the model's reasoning core and plug it anywhere we want.
LLMs "hallucinate" because they are stochastic processes predicting the next word, with no guarantee of being correct or truthful. That is essentially unavoidable unless we change the modelling approach, which very few people are attempting right now.
Training data quality does matter, but even with "perfect" data, and even when the prompt itself appears in the training data, hallucination can still happen. LLMs don't actually know anything, and they don't know what they don't know.
https://arxiv.org/abs/2401.11817
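The "stochastic next-word prediction" point can be sketched with a toy model. The distribution below is made up purely for illustration (a real LLM learns its probabilities from data and conditions on the full context), but it shows why sampling alone offers no truth guarantee: any continuation with nonzero probability will eventually be emitted.

```python
import random

# Hypothetical next-word distribution for one context.
# (Invented numbers for illustration; a real model learns these.)
NEXT_WORD_PROBS = {
    "the capital of France is": {"Paris": 0.85, "Lyon": 0.10, "Rome": 0.05},
}

def sample_next_word(context: str, rng: random.Random) -> str:
    """Sample the next word from the model's distribution.

    Sampling means low-probability, factually wrong continuations
    like "Rome" are occasionally emitted: the model tracks
    likelihood, not truth.
    """
    probs = NEXT_WORD_PROBS[context]
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)
samples = [sample_next_word("the capital of France is", rng) for _ in range(1000)]
print("wrong answers out of 1000:", samples.count("Rome") + samples.count("Lyon"))
```

Making the model "know" the right answer would mean collapsing the distribution, not sampling from it; that is a change to the modelling approach, which is the reply's point.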