> It can't model the generative process of the humans who created those training set samples because that generative process has different inputs - sensory ones (in addition to auto-regressive ones).
I think that’s too strong a statement. I would say that it’s very constrained in its ability to model that, but not having access to the same inputs doesn’t mean you can’t model a process.
For example, we model hurricanes based on measurements taken from satellites. Those aren’t the actual inputs to the hurricane itself, but abstracted correlates of those inputs. An LLM does have access to correlates of the inputs to human writing, i.e. textual descriptions of sensory inputs.
Brilliant analogy.
And we can imagine that, in a sci-fi world where some super-being could act at a scale that let it perturb the world in ways amenable to causing hurricanes, the hurricane model could be substantially augmented, for the same reason that motor babbling in an infant leads to fluid motion as a child.
What has been a revelation to me is how, even peering through this dark glass, titanic amounts of data allow quite useful world models to emerge, however limited those models remain: a type of "bitter lesson" that suggests we're only at the beginning of what's possible.
I expect robotics + LLM to drive the next big breakthroughs, perhaps w/ virtual worlds [1] as an intermediate step.
Indeed. If you provided a talented individual with a sufficient quantity and variety of video streams of travels in a city (like New York), that person would be able to draw you a map.
You can model a generative process, but the model you get is necessarily an auto-regressive generative process, not the same as the originating generative process, which is grounded in the external world.
Human language, and other actions, exist on a range from the almost purely auto-regressive (generating a stock/practiced phrase such as "have a nice day") to the highly interactive. An auto-regressive model is obviously going to have more success modelling an auto-regressive generative process.
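As a toy sketch of that distinction (everything below is invented for illustration: the distributions, the "dark clouds" observation, and the names come from me, not from any real model), the first loop samples x_t from p(x_t | x_<t) using only its own prior outputs, while the second samples from p(x_t | x_<t, o_t), where o_t is a fresh external observation the model never produced itself:

```python
import random

random.seed(0)

def p_autoregressive(history):
    # Next-token distribution depends only on previously generated tokens.
    if history and history[-1] == "rain":
        return {"sun": 0.2, "rain": 0.6, "wind": 0.2}
    return {"sun": 0.5, "rain": 0.3, "wind": 0.2}

def p_interactive(history, observation):
    # Same history, but an external input can override the distribution.
    if observation == "dark clouds":
        return {"sun": 0.05, "rain": 0.85, "wind": 0.10}
    return p_autoregressive(history)

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

history = []
for _ in range(5):
    history.append(sample(p_autoregressive(history)))
print("auto-regressive rollout:", history)

history = []
for obs in ["clear", "dark clouds", "clear", "dark clouds", "clear"]:
    history.append(sample(p_interactive(history, obs)))
print("interactive rollout:    ", history)
```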
Weather prediction is a good illustration of the limitations of auto-regressive models, as well as of models that don't accurately reflect the inputs to the process you are attempting to predict. "There's a low-pressure front coming in, so the weather will be X, same as last time" works some of the time. A crude physical weather model based on limited data points, such as weather-balloon readings or satellite observations of hurricanes, also works some of the time. But of course these models are sometimes hopelessly wrong too.
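A toy numerical version of that contrast (the temperatures, pressures, and the ~1005 hPa threshold are all made up, and the "physics" is deliberately crude): a persistence-style auto-regressive forecast next to one that also folds in an exogenous pressure reading, standing in for the balloon or satellite data:

```python
temps = [20.0, 21.0, 19.0, 15.0, 14.0]   # observed temperatures
pressure = [1015, 1013, 1002, 995, 994]  # exogenous measurements (hPa)

def forecast_ar(history):
    # "Same as last time": condition only on the series' own past.
    return history[-1]

def forecast_with_exogenous(history, p):
    # Crude physical prior: pressure below ~1005 hPa suggests a front,
    # so nudge the prediction downward.
    return history[-1] - (2.0 if p < 1005 else 0.0)

for t in range(1, len(temps)):
    ar = forecast_ar(temps[:t])
    ex = forecast_with_exogenous(temps[:t], pressure[t])
    print(f"t={t}: actual={temps[t]:5.1f}  AR={ar:5.1f}  AR+pressure={ex:5.1f}")
```

Both forecasters are wrong some of the time, as above; the second just fails less often when the exogenous signal actually matters.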
My real point wasn't about the lack of sensory data, even though that lack does force a purely auto-regressive (i.e. wrong) model, but rather about the difference between a passive model (such as weather prediction) and an interactive one.
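Here's a minimal sketch of why that interactivity matters (a made-up environment, nothing from a real system): a passive observer can only correlate what it happens to see, while an interactive agent can run its own experiments, the motor-babbling point from upthread, and thereby expose causal structure the passive stream never reveals:

```python
import random

random.seed(1)

def hidden_process(action):
    # Unknown dynamics: the response to "push" is deterministic, but an
    # observer who never pushes only ever sees the noise.
    return 1.0 if action == "push" else random.gauss(0.0, 1.0)

# Passive observation: no way to discover what "push" would do.
print("passive:", [round(hidden_process("wait"), 2) for _ in range(4)])

# Interactive experimentation: the causal effect of "push" becomes obvious.
for action in ["wait", "push", "wait", "push"]:
    print(f"interactive: tried {action!r} -> {hidden_process(action):+.2f}")
```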