Hacker News

danielmarkbruce · 01/20/2025 · 5 replies

I'll bet said PhDs can't answer the equivalent question in a language they don't understand. LLMs don't speak character-level English. LLMs are, in some stretched meaning of the word, illiterate.

If LLMs used character-level tokenization, this would work just fine. But we don't do that, and we accept the trade-off. It's only folks who have absolutely no idea how LLMs work who find the strawberry thing meaningful.
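To make that concrete, here's a rough sketch using OpenAI's tiktoken library (my choice of encoding; the exact split varies by tokenizer and model):

    # Minimal sketch: show the subword pieces a tokenizer produces
    # for "strawberry". Assumes the tiktoken package is installed.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    token_ids = enc.encode("strawberry")
    print([enc.decode([t]) for t in token_ids])

The model consumes token IDs, not characters, so a question about individual letters is asking about structure it never directly sees.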


Replies

wat10000 · 01/20/2025

I’ll bet said PhDs will tell you they don’t know instead of confidently stating the wrong answer in this case. Getting LLMs to express an appropriate level of confidence in their output remains a major problem.

sdesol · 01/20/2025

> It's only folks who have absolutely no idea how LLMs work that find the strawberry thing meaningful.

I think it is meaningful in that it highlights how we need to approach things a bit differently. For example, instead of asking "How many r's in strawberry?", we say "How many r's in strawberry? Show each character in an ordered list before counting. When counting, list the position in the ordered list." If we do this, every model that I asked got it right.

https://beta.gitsense.com/?chat=167c0a09-3821-40c3-8b0b-8422...

There are quirks we need to better understand, and I would say the strawberry question is one of them.

Edit: I should add that getting LLMs to count things might not be the best way to go about it. Having it generate code to count things would probably make more sense.
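Something like this is all it takes (a trivial sketch, just to illustrate what "generate code to count" buys you):

    # Counting letters in ordinary code is exact and trivial;
    # no tokenization is involved.
    def count_letter(word: str, letter: str) -> int:
        return word.lower().count(letter.lower())

    print(count_letter("strawberry", "r"))  # -> 3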

HarHarVeryFunny · 01/20/2025

I don't think that (sub-word) tokenization is the main difficulty. Not sure which models still fail the "strawberry" test, but I'd bet they can at least spell strawberry if you ask, indicating that breaking the word into letters is not the problem.

The real issue is that you're asking a prediction engine (with no working memory or internal iteration) to solve an algorithmic task. Of course you can prompt it to "think step by step" to get around these limitations, and if necessary suggest an approach (or ask it to think of one?) to help it keep track of its letter-by-letter progress through the task.
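For example, a prompt along these lines pushes the iteration out into the generated text, which then serves as the missing working memory (a rough sketch using the OpenAI Python client; the model name is just a placeholder):

    # Sketch: make the model externalize each step before counting.
    # Assumes OPENAI_API_KEY is set in the environment.
    from openai import OpenAI

    client = OpenAI()
    prompt = (
        "How many r's are in 'strawberry'? First list each character "
        "with its position, then count the r's from that list."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)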

michaelt · 01/20/2025

You say that very confidently - but why shouldn't an LLM have learned a character-level understanding of tokens?

LLMs would perform very badly on tasks like checking documents for spelling errors, processing OCRed documents, pluralising, changing tenses and handling typos in messages from users if they didn't have a character-level understanding.

It's only folks who have absolutely no idea how LLMs work that would think this task presents any difficulty whatsoever for a PhD-level superintelligence :)

throwaway2037 · 01/21/2025

> LLMs are, in some stretched meaning of the word, illiterate.

You raise an interesting point here. How would LLMs need to change for you to call them literate? As a thought experiment, I can take a photograph of a newspaper article, then ask an LLM to summarise it for me. (Here, I assume that LLMs can do OCR.) Does that count?