Really glad to see some academic research on this: it was quite obvious from interacting with LLMs that they form a world model and can, e.g., simulate simple physics experiments correctly that are not in the training set. I found it very frustrating to see people repeating the idea that "it can never do X" because it lacks a world model. Predicting text that represents events in the world requires modeling that world. Just because you can find examples where the predictions of a certain model are bad does not imply there is no model at all. In the limit, as prediction becomes as good as theoretically possible given the input data and model size restrictions, the model also becomes as accurate and complete as possible. This process is formally described by Solomonoff induction.
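For anyone who wants the formal version of that last claim (standard notation, nothing specific to the paper): Solomonoff's universal prior weighs every program that could have produced the observed sequence by its length, and prediction is just conditioning on the prefix:

```latex
% Universal prior: sum over all programs p whose output on a universal
% prefix machine U begins with x, weighted by 2^{-length(p)}.
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}

% Next-symbol prediction is conditioning on the observed prefix:
M(x_{n+1} \mid x_{1:n}) = \frac{M(x_{1:n} x_{n+1})}{M(x_{1:n})}
```

Solomonoff's convergence result says the prediction error of M relative to any computable source goes to zero as the prefix grows, which is the sense in which "best possible prediction" and "best possible model" coincide (up to the usual incomputability caveats).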
I've seen some very impressive results just embedding a pre-trained KGE model into a transformer model and letting it "learn" to query it (I've just used heterogeneous loss functions during training, with "classifier dimensions" that determine whether to greedily sample from the KGE sidecar; I'm sure there are much better ways of doing this). This is just a subjective viewpoint, obviously, but I've played around quite a lot with this idea, and it's very easy to get an "interactive" small LLM with stable results doing such a thing. The only problem I've found is _updating_ the knowledge cheaply without partially retraining the LLM itself. For small, domain-specific models this isn't really an issue though - for personal projects I just use a couple of 3090s.
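For the curious, here is a minimal sketch of what a "KGE sidecar plus classifier dimension" setup could look like. This is not the commenter's actual code; the module names, the TransE-style scoring, and the loss mix are my own assumptions about one plausible way to wire it up.

```python
import torch
import torch.nn as nn

class KGESidecar(nn.Module):
    """Gate + query head bolted onto a host LM's last-layer hidden state (hypothetical sketch)."""
    def __init__(self, d_model: int, entity_emb: torch.Tensor):
        super().__init__()
        d_kge = entity_emb.size(1)
        self.register_buffer("entity_emb", entity_emb)   # frozen, pre-trained KGE entity table
        self.gate = nn.Linear(d_model, 1)                # the "classifier dimension"
        self.to_query = nn.Linear(d_model, 2 * d_kge)    # project hidden state to (head, relation)

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, d_model) hidden state at the position being generated
        gate_logit = self.gate(hidden)                   # "should this token come from the KG?"
        h, r = self.to_query(hidden).chunk(2, dim=-1)    # (batch, d_kge) each
        # TransE-style scoring: the answer entity should sit near head + relation
        scores = -torch.cdist(h + r, self.entity_emb)    # (batch, num_entities)
        return gate_logit, scores

def sidecar_losses(gate_logit, kge_scores, use_kg_label, entity_label):
    # Two extra terms mixed into the usual next-token loss (the "heterogeneous loss"):
    # BCE on the gate (answer from the KG or not?) and cross-entropy over entities
    # for the positions that should be answered from the KG.
    gate_loss = nn.functional.binary_cross_entropy_with_logits(
        gate_logit.squeeze(-1), use_kg_label.float())
    kg_loss = nn.functional.cross_entropy(kge_scores, entity_label)
    return gate_loss + kg_loss
```

At inference time the "greedy sampling" described above would amount to: if the gate's sigmoid exceeds some threshold, emit the surface form of `scores.argmax(-1)` instead of the LM's own next token.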
I think this stuff will become a lot more fascinating after transformers have bottomed out on their hype curve and become just one tool for building specific types of models.
I've personally replicated the OthelloGPT results mentioned in this paper, and it definitely felt like next-move accuracy alone was not the whole story. Indeed, the authors of the original paper knew this, and so they further validated the world model by intervening in the model's forward pass to directly manipulate the world model (and check the resulting change in valid-move predictions).
I’d also recommend checking out Neel Nanda’s work on OthelloGPT, where he demonstrated the world model was actually linear: https://arxiv.org/abs/2309.00941
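For anyone who hasn't seen this kind of experiment, here is a stripped-down sketch of the two steps mentioned above: reading board state out of the residual stream with a linear probe, and intervening along the probe direction before letting the rest of the forward pass run. This is not the papers' code; the dimensions and function names are made up for illustration.

```python
import torch
import torch.nn as nn

D_MODEL, N_SQUARES, N_STATES = 512, 64, 3   # hypothetical dims; states: empty / mine / yours

# (1) Linear probe: a per-square classifier over board states, trained on cached
# residual-stream activations from some intermediate layer of the Othello model.
probe = nn.Linear(D_MODEL, N_SQUARES * N_STATES)

def probe_loss(resid: torch.Tensor, board: torch.Tensor) -> torch.Tensor:
    # resid: (batch, d_model) residual stream at one move position
    # board: (batch, n_squares) integer board-state labels at that position
    logits = probe(resid).view(-1, N_SQUARES, N_STATES)
    return nn.functional.cross_entropy(logits.transpose(1, 2), board)

# (2) Intervention: flip the model's "belief" about one square by pushing the
# activation along the difference of the two class directions, then rerun the
# remaining layers and compare the predicted-legal-move distributions.
def flip_square(resid: torch.Tensor, square: int, from_state: int, to_state: int,
                alpha: float = 2.0) -> torch.Tensor:
    W = probe.weight.view(N_SQUARES, N_STATES, D_MODEL)
    direction = W[square, to_state] - W[square, from_state]
    return resid + alpha * direction / direction.norm()
```

The original OthelloGPT paper used nonlinear probes for the read-out step; the point of the Nanda et al. follow-up linked above is that with the right choice of features (mine/yours rather than black/white) a purely linear probe like this one suffices.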
I think there is a philosophical angle to this. I mean, my world map was constructed by chance interactions with the real world. Does this mean that my world map is as close to the real world as their NN's map is to Manhattan? Is my world map full of non-existent streets, exits that are in the wrong place, etc.? The NN's map of Manhattan works almost 100% correctly when used for normal navigation but breaks down badly when it has to plan a detour. How brittle is my world map?
Most of you probably know someone with a poor sense of direction (or maybe it's you). From my experience, such people navigate primarily (or solely) by landmarks. This makes me wonder whether the damaged maps shown in the paper are similar to the "world model" of a directionally challenged person.
Really cool results. I'd love to see some human baselines for, say, NYC cabbies or regular Manhattanites, though. I'm sure my world model is "incoherent" vis-a-vis these metrics as well, but I'm not sure what degree of coherence I should be excited about.
Wrong as they are, I'm impressed they were able to get any maps out of their LLM that look even vaguely cohesive. The shortest-path map has bits of streets downtown and around Central Park that aren't totally red, and Central Park itself is clear on all three maps.
They used eight A100s, but don't say how long it took to train their LLM. It would be interesting to know the wall-clock time they spent. Their dataset is, relatively speaking, tiny, which means it should take fewer resources to replicate from scratch.
What's interesting, though, is that the smaller model performed better; they don't speculate as to why.
Once your model and map get larger than the thing they are modeling/mapping, then what?
Let us hope the pigeonhole principle isn't flawed, or else we may find ourselves as batteries in the Matrix.
An LLM necessarily has to create some sort of internal "model" / representations pursuant to its "predict next word" training goal, given the depth and sophistication of context recognition needed to do well. This isn't an N-gram model restricted to just looking at surface word sequences.
However, the question should be what sort of internal "model" it has built. It seems fashionable to refer to this as a "world model", but IMO this isn't really appropriate, and certainly it's going to be quite different to the predictive representations that any animal that interacts with the world, and learns from those interactions, will have built.
The thing is that an LLM is an auto-regressive model - it is trying to predict continuations of training set samples solely based on word sequences, and is not privy to the world that is actually being described by those word sequences. It can't model the generative process of the humans who created those training set samples because that generative process has different inputs - sensory ones (in addition to auto-regressive ones).
The "world model" of a human, or any other animal, is built pursuant to predicting the environment, but not in a purely passive way (such as a multi-modal LLM predicting next frame in a video). The animal is primarily concerned with predicting the outcomes of it's interactions with the environment, driven by the evolutionary pressure to learn to act in way that maximizes survival and proliferation of its DNA. This is the nature of a real "world model" - it's modelling the world (as perceived thru sensory inputs) as a dynamical process reacting to the actions of the animal. This is very different to the passive "context patterns" learnt by an LLM that are merely predicting auto-regressive continuations (whether just words, or multi-modal video frames/etc).