Did they start correctly counting the number of 'R's in 'strawberry'?

hansmayer • 02/20/2025 • 3 replies

Replies

SkiFire13 • 02/20/2025

Most likely yes, that prompt has been repeated too many times online for LLMs not to pick up the right answer (or be specifically trained on it!). You'll have to try with a different word to make them fail.

pulvinar • 02/20/2025

Not as long as they use tokens -- it's a perception limitation of theirs. Like our blind spot, or the Müller-Lyer illusion, or the McGurk effect, etc.

comeonbro • 02/20/2025

Imagine if I asked you how many '⊚'s are in 'Ⰹ⧏⏃'? (the answer is 3, because there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃)

Much harder question than if I asked you how many '⟕'s are in 'Ⓕ⟕⥒⟲⾵⟕⟕⢼' (the answer is 3, because there are 3 ⟕s there)

You'd need to read through like 100,000x more random internet text to infer that there is 1 ⊚ in Ⰹ and 2 ⊚s in ⏃ (when this is not something that people ever explicitly talk about) than you would need to figure out that there are 3 ⟕s when 3 ⟕s appear, or to figure out from context clues that Ⰹ⧏⏃s are red and edible.

The former is how tokenization makes 'strawberry' look to LLMs: https://i.imgur.com/IggjwEK.png

It's a consequence of an engineering tradeoff, not a demonstration of a fundamental limitation.
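
A minimal sketch of the same point in Python, assuming a made-up three-token vocabulary (the split `str` / `aw` / `berry` and the IDs 101-103 are hypothetical, not taken from any real tokenizer). The model is handed only the ID list, so counting letters requires already knowing how each ID is spelled:

```python
# Hypothetical toy vocabulary: real tokenizers use different splits and IDs.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(word, vocab):
    """Greedy longest-match split of `word` into token IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids


def count_letter_via_tokens(ids, letter, vocab):
    """Count `letter` by looking up the spelling behind each token ID."""
    # This lookup table is exactly the knowledge a model has to infer
    # indirectly from text, since it never sees the characters themselves.
    id_to_piece = {token_id: piece for piece, token_id in vocab.items()}
    return sum(id_to_piece[token_id].count(letter) for token_id in ids)


token_ids = toy_tokenize("strawberry", TOY_VOCAB)
print(token_ids)                                            # [101, 102, 103]
print(count_letter_via_tokens(token_ids, "r", TOY_VOCAB))   # 3
```

With character-level input the count is just `"strawberry".count("r")`. With token IDs it depends on knowing there is 1 'r' behind 101 and 2 behind 103, which is the Ⰹ⧏⏃ situation.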
