Hacker News

LLMs aren't world models

270 points | by ingve | last Sunday at 11:40 AM | 148 comments

Comments

ameliaquining | yesterday at 9:12 PM

One thing I appreciated about this post, unlike a lot of AI-skeptic posts, is that it actually makes a concrete falsifiable prediction; specifically, "LLMs will never manage to deal with large code bases 'autonomously'". So in the future we can look back and see whether it was right.

For my part, I'd give 80% confidence that LLMs will be able to do this within two years, without fundamental architectural changes.

libraryofbabel | last Sunday at 7:45 PM

This essay could probably benefit from some engagement with the literature on “interpretability” in LLMs, including the empirical results about how knowledge (like addition) is represented inside the neural network. To be blunt, I’m not sure being smart and reasoning from first principles after asking the LLM a lot of questions and cherry-picking what it gets wrong gets to any novel insights at this point. And it already feels a little out of date: with LLMs getting gold on the Mathematical Olympiad, they clearly have a pretty good world model of mathematics. I don’t think cherry-picking a failure to prove 2 + 2 = 4 in the particular specific way the writer wanted to see disproves that at all.

LLMs have imperfect world models, sure. (So do humans.) That’s because they are trained to be generalists and because their internal representations of things are massively compressed, since they don’t have enough weights to encode everything. I don’t think this means there are some natural limits to what they can do.

keeda | last Sunday at 11:05 PM

That whole bit about color blending and transparency and LLMs "not knowing colors" is hard to believe. I am literally using LLMs every day to write image-processing and computer vision code using OpenCV. It seamlessly reasons across a range of concepts like color spaces, resolution, compression artifacts, filtering, segmentation and human perception. I mean, removing the alpha from a PNG image was a preprocessing step it wrote by itself as part of a larger task I had given it, so it certainly understands transparency.

I even often describe the results, e.g. "this fails in X manner when the image has grainy regions", and it figures out what is going on and adapts the code accordingly. (It works with uploading actual images too, but those consume a lot of tokens!)

And all this in a rather niche domain that seems relatively less explored. The images I'm working with are rather small and low-resolution, which most literature does not seem to contemplate much. It uses standard techniques well known in the art, but it adapts and combines them well to suit my particular requirements. So it seems to handle "novel" pretty well too.

If it can reason about images and vision and write working code for niche problems I throw at it, whether it "knows" colors in the human sense is a purely philosophical question.
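
(For context, a minimal sketch of the kind of alpha-removal preprocessing described above, assuming OpenCV and NumPy; flatten_alpha is an illustrative name, not the commenter's actual code:)

    import cv2
    import numpy as np

    def flatten_alpha(path):
        """Load an image; if it has an alpha channel, composite it over white."""
        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # keeps a 4th (alpha) channel if present
        if img is not None and img.ndim == 3 and img.shape[2] == 4:
            alpha = img[:, :, 3:].astype(np.float32) / 255.0   # H x W x 1, scaled to [0, 1]
            bgr = img[:, :, :3].astype(np.float32)
            img = (alpha * bgr + (1.0 - alpha) * 255.0).astype(np.uint8)  # blend over white
        return img  # plain 3-channel BGR (or None if the read failed)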

frankfrank13 | yesterday at 9:22 PM

Great quote at the end that resonates a lot with me:

> Feeding these algorithms gobs of data is another example of how an approach that must be fundamentally incorrect at least in some sense, as evidenced by how data-hungry it is, can be taken very far by engineering efforts — as long as something is useful enough to fund such efforts and isn’t outcompeted by a new idea, it can persist.

ej88 | last Sunday at 8:40 PM

This article is interesting but pretty shallow.

0(?): there’s no provided definition of what a ‘world model’ is. Is it playing chess? Is it remembering facts like how computers use math to blend colors? If so, then ChatGPT (https://chatgpt.com/s/t_6898fe6178b88191a138fba8824c1a2c) has a world model, right? (A worked example of that blend math is sketched at the end of this comment.)

1. The author seems to conflate context windows with failing to model the world in the chess example. I challenge them to give a SOTA model an image of a chess board (or the notation) and ask it about the position. It might not give you GM-level analysis, but it definitely has a model of what’s going on.

2. Without saying which LLM they used or sharing the chats, these examples are just not valuable. The larger and better the model, the better its internal representation of the world.

You can try it yourself. Come up with some question involving interacting with the world and / or physics and ask GPT-5 Thinking. It’s got a pretty good understanding of how things work!

https://chatgpt.com/s/t_689903b03e6c8191b7ce1b85b1698358
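
(For reference, the “math to blend colors” in question is just standard “over” alpha compositing. A minimal worked example, with channels as illustrative values in [0, 1]:)

    # Standard "over" compositing: out = alpha * foreground + (1 - alpha) * background
    fg = (1.0, 0.0, 0.0)   # pure red, as (R, G, B) in [0, 1]
    bg = (1.0, 1.0, 1.0)   # white background
    alpha = 0.5            # foreground is 50% opaque
    out = tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))
    print(out)             # (1.0, 0.5, 0.5), i.e. pink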

DennisP | today at 12:38 AM

Maybe pure language models aren't world models, but Genie 3, for example, seems to be a pretty good world model:

https://deepmind.google/discover/blog/genie-3-a-new-frontier...

We also have multimodal AIs that can do both language and video. A version of Genie 3 made multimodal with language might be pretty impressive.

Focusing only on what pure language models can do is a bit of a straw man at this point.

skeledrew | last Monday at 12:49 AM

Agree in general with most of the points, except

> but because I know you and I get by with less.

Actually, we've had far more data and training than any LLM. We've been gathering and processing sensory data every second at least since birth (more processing than gathering when asleep), and are only really considered fully intelligent in our late teens to mid-20s.

lordnacho | last Sunday at 8:38 PM

Here's what LLMs remind me of.

When I went to uni, we had tutorials several times a week. Two students, one professor, going over whatever was being studied that week. The professor would ask insightful questions, and the students would try to answer.

Sometimes, I would answer a question correctly without actually understanding what I was saying. I would be spewing out something that I had read somewhere in the huge pile of books, and it would be a sentence, with certain special words in it, that the professor would accept as an answer.

But I would sometimes have this weird feeling of "hmm I actually don't get it" regardless. This is kinda what the tutorial is for, though. With a bit more prodding, the prof will ask something that you genuinely cannot produce a suitable word salad for, and you would be found out.

In math-type tutorials it would be things like realizing some equation was useful for finding an answer without having a clue about what the equation actually represented.

In economics tutorials it would be spewing out words about inflation or growth or some particular author but then having nothing to back up the intuition.

This is what I suspect LLMs do. They can often be very useful to someone who actually has the models in their minds, but not the data to hand. You may have forgotten the supporting evidence for some position, or you might have missed some piece of the argument due to imperfect memory. In these cases, an LLM is fantastic, as it just glues together plausible related words for you to examine.

The wheels come off when you're not an expert. Everything it says will sound plausible. When you challenge it, it just apologizes and pretends to correct itself.

deadbabe | last Sunday at 8:11 PM

Don’t: use LLMs to play chess against you

Do: use LLMs to talk shit to you while a real chess AI plays chess against you.

The above applies to a lot of things besides chess, and illustrates a proper application of LLMs.

o_nate | last Monday at 7:20 PM

What with this and your previous post about why sometimes incompetent management leads to better outcomes, you are quickly becoming one of my favorite tech bloggers. Perhaps I enjoyed the piece so much because your conclusions basically track mine. (I'm a software developer who has dabbled with LLMs, and has some hand-wavey background on how they work, but otherwise can claim no special knowledge.) Also your writing style really pops. No one would accuse your post of having been generated by an LLM.

bithive123 | yesterday at 9:21 PM

Language models aren't world models for the same reason languages aren't world models.

Symbols, by definition, only represent a thing. They are not the same as the thing. The map is not the territory, the description is not the described, you can't get wet in the word "water".

They only have meaning to sentient beings, and that meaning is heavily subjective and contextual.

But there appear to be some who think that we can grasp truth through mechanical symbol manipulation. Perhaps we just need to add a few million more symbols, they think.

If we accept the incompleteness theorem, then there are true propositions that even a super-intelligent AGI would not be able to express, because all it can do is output a series of placeholders. Not to mention the obvious fallacy of knowing super-intelligence when we see it. Can you write a test suite for it?

imenani | last Sunday at 8:17 PM

As far as I can tell, they don't say which LLM they used, which is kind of a shame, as there is a huge range of capabilities even among newly released LLMs (e.g. reasoning vs not).

jonplackett | last Sunday at 9:13 PM

I just tried a few things that are simple and that a world model would probably get right. E.g.:

Question to GPT5: I am looking straight on to some objects. Looking parallel to the ground.

In front of me I have a milk bottle, to the right of that is a Coca-Cola bottle. To the right of that is a glass of water. And to the right of that there’s a cherry. Behind the cherry there’s a cactus and to the left of that there’s a peanut. Everything is spaced evenly. Can I see the peanut?

Answer (after choosing thinking mode)

No. The cactus is directly behind the cherry (front row order: milk, Coke, water, cherry). “To the left of that” puts the peanut behind the glass of water. Since you’re looking straight on, the glass sits in front and occludes the peanut.

It doesn’t consider transparency until you mention it, and then it apologises and says it didn’t think of transparency.

GaggiX | last Sunday at 8:28 PM

https://www.youtube.com/watch?v=LtG0ACIbmHw

SOTA LLMs do play legal moves in chess; I don't know why the article seems to say otherwise.

og_kalu | last Sunday at 8:21 PM

Yes, LLMs can play chess, and yes, they can model it fine:

https://arxiv.org/pdf/2403.15498v2
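
(For anyone who wants to check the "legal moves" claim themselves, here is a minimal sketch using the python-chess library; get_model_move is a hypothetical callback you would wire up to whatever LLM you are testing:)

    import chess

    def play_and_validate(get_model_move, max_plies=40):
        """Feed the model positions and verify every move it returns is legal."""
        board = chess.Board()
        for _ in range(max_plies):
            san = get_model_move(board.fen())   # hypothetical: prompt the LLM with the position
            try:
                board.push_san(san)             # raises ValueError on an illegal or malformed move
            except ValueError:
                return False, board.fen(), san  # the model produced an illegal move
            if board.is_game_over():
                break
        return True, board.fen(), None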

UltraSane | yesterday at 11:57 PM

I wonder how the nature of the language used to train an LLM affects its model of the world. Would a language designed for the maximum possible information content and clarity, like Ithkuil, make an LLM's world model more accurate?

rishi_devan | last Sunday at 8:04 PM

Haha. I enjoyed that Soviet-era joke at the end.

Razengan | last Sunday at 10:01 PM

A slight tangent: I think/wonder if the one place where AIs could be really useful might be in translating alien languages :)

As in, an alien could teach one of our AIs their language faster than it could teach a human, and vice versa...

...though the potential for catastrophic disasters is also great there lol

1970-01-01 | yesterday at 9:28 PM

I'm surprised the models haven't been enshittified by capitalism. I think in a few years we're going to see lightning-fast LLMs generating better output compared to what we're seeing today. But it won't be 1000x better; it will be 10x better, 10x faster, and completely enshittified with ads and clickbait links. Enjoy ChatGPT while it lasts.

neuroelectron | yesterday at 8:55 PM

Not yet