Hacker News

AyyEye · last Sunday at 8:05 PM · 6 replies

With LLMs being unable to count how many Bs are in blueberry, they clearly don't have any world model whatsoever. That addition (something that takes only a few gates in digital logic) ends up overfit into a few nodes of a multi-billion-node network is hardly a surprise to anyone except the most religious of AI believers.
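For what it's worth, the "few gates" part checks out: a one-bit full adder is five gates, and ripple-carry addition is just that adder repeated per bit. A quick Python sketch of the logic (illustrative, not how any LLM does it):

    # One-bit full adder: two XORs, two ANDs, one OR -- five gates total.
    def full_adder(a, b, cin):
        s = a ^ b ^ cin                   # sum bit
        cout = (a & b) | (cin & (a ^ b))  # carry out
        return s, cout

    # Ripple-carry addition of two n-bit unsigned numbers built from that adder.
    def add(x, y, n=8):
        carry, result = 0, 0
        for i in range(n):
            s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
            result |= s << i
        return result

    assert add(19, 23) == 42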


Replies

BobbyJo · last Sunday at 10:35 PM

The core issue there isn't that the LLM isn't building internal models to represent its world; it's that its world is limited to tokens. Anything not represented in tokens, or in token relationships, can't be modeled by the LLM, by definition.

It's like asking a blind person to count the number of colors on a car. They can give it a go and assume glass, tires, and metal are different colors, since there is likely a correlation they can draw from feeling them or discussing them. But that's the best they can do, because they can't actually perceive color.

In this case, the LLM can't see letters, so asking it to count them forces it to draw on some proxy for that information. If it doesn't have an accurate one, then bam: strawberry has two r's.
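A quick way to see what the model actually receives is to run the word through a tokenizer. A minimal sketch using OpenAI's tiktoken library (cl100k_base is just one encoding; the exact split can vary):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("blueberry")

    print(tokens)                             # opaque integer IDs -- no letters in sight
    print([enc.decode([t]) for t in tokens])  # e.g. ['blue', 'berry']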

I think a good example of LLMs building models internally is this: https://rohinmanvi.github.io/GeoLLM/

LLMs are able to encode geospatial relationships because those relationships are represented well by token relationships: two countries that are close together are talked about together much more often than two countries that are far apart.
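A toy illustration of that co-occurrence signal (this is a made-up four-sentence corpus, not GeoLLM's actual pipeline; real web text shows the same skew at scale):

    from collections import Counter
    from itertools import combinations

    corpus = [
        "France and Germany share a border and a long history.",
        "Trade between France and Germany grew again last year.",
        "Germany and Austria are cooperating on rail links.",
        "New Zealand exports dairy products worldwide.",
    ]

    countries = ["France", "Germany", "Austria", "New Zealand"]
    cooc = Counter()
    for sentence in corpus:
        present = [c for c in countries if c in sentence]
        for pair in combinations(sorted(present), 2):
            cooc[pair] += 1

    print(cooc.most_common())  # neighbours co-occur; distant pairs mostly don't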

eru · today at 2:23 AM

> With LLMs being unable to count how many Bs are in blueberry, they clearly don't have any world model whatsoever.

Train your model on characters instead of on tokens, and this problem goes away. But I don't think this teaches us anything about world models more generally.
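A minimal sketch of what character-level (here, byte-level) input looks like, and why the counting problem disappears; this is roughly how byte-level models like ByT5 operate:

    # Byte-level input: every letter is its own symbol, so the count is
    # directly visible in the sequence the model sees.
    word = "blueberry"
    ids = list(word.encode("utf-8"))
    print(ids)                  # [98, 108, 117, 101, 98, 101, 114, 114, 121]
    print(ids.count(ord("b")))  # 2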

yosefk · last Sunday at 8:10 PM

Actually, I forgive them the issues that stem from tokenization. I used to make fun of them for listing datum as a noun whose plural form ends with an i, but once I learned how tokenization works, I stopped - it feels like mocking a person's intelligence because of a speech impediment or something... I am very kind to these things, I think

williamcotton · yesterday at 8:47 PM

I don’t solve math problems with my poetry writing skills:

https://chatgpt.com/share/689ba837-8ae0-8013-96d2-7484088f27...

andyjohnson0 · last Sunday at 8:29 PM

> With LLMs being unable to count how many Bs are in blueberry, they clearly don't have any world model whatsoever.

Is this a real defect, or some historical thing?

I just asked GPT-5:

    How many "B"s in "blueberry"?

and it replied:

    There are 2 — the letter b appears twice in "blueberry".

I also asked it how many Rs are in "carrot" and how many Ps are in "pineapple", and it answered both questions correctly too.
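(For reference, the ground truth for all three is a one-liner in Python:)

    >>> "blueberry".count("b"), "carrot".count("r"), "pineapple".count("p")
    (2, 2, 3)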
libraryofbabel · last Sunday at 10:39 PM

> they clearly don't have any world model whatsoever

Then how did an LLM get gold on the International Mathematical Olympiad, where it certainly hadn't seen the questions before? How on earth is that possible without a decent working model of mathematics? Sure, LLMs make weird errors sometimes (nobody is denying that), but clearly the story is rather more complicated than you suggest.
