Hacker News

danielmarkbruce · 01/21/2025 · 1 reply

Yep, if by chance you hit a model that has seen training data which happens to shove those tokens together in a way it can guess from, lucky you.

The point is, it would be trivial for an LLM to get it right all the time with character-level tokenization. The reason LLMs using the current best-tradeoff tokenization find this task difficult is that the tokens making up 'tree' don't include a token for 'e'.
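
For illustration, here's a quick sketch using the tiktoken package and its cl100k_base encoding (just one example encoding; the exact split will vary by model):

    # Sketch: subword tokens vs. character-level tokens.
    # Assumes tiktoken is installed; cl100k_base is one example encoding,
    # not necessarily what any particular model uses.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    word = "strawberry"
    ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in ids]
    print(pieces)      # e.g. ['str', 'aw', 'berry'] - no standalone 'r' token
    print(list(word))  # ['s', 't', 'r', 'a', 'w', 'b', 'e', 'r', 'r', 'y']

With a subword split like the one above, the model never sees the individual letters it is being asked to count.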


Replies

HarHarVeryFunny · 01/21/2025

No - you can give the LLM a list of letters and it STILL won't be able to count them reliably, so you are guessing wrong about where the difficulty lies.

Try asking Claude: how many 'r's are in this list (just give me a number as your response, nothing else) : s t r a w b e r r y
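
For contrast, once the characters are spelled out, the count is trivial to compute mechanically, which is the point: the failure isn't just a tokenization artifact. A minimal sketch:

    # Counting from an explicit, space-separated character list is trivial,
    # yet the model can still get it wrong - so tokenization isn't the whole story.
    letters = "s t r a w b e r r y".split()
    print(letters.count("r"))  # 3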
