8note • today at 6:27 PM

> When understood in this way, it is obvious why LLMs are bad at counting r's in "strawberries".

No, it isn't. It makes sense that they can't count the r's because they don't have access to the actual word, only to tokens that might represent parts or the whole of the word.
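To make the token argument concrete, here is a minimal sketch with a made-up vocabulary and a greedy longest-match tokenizer. `TOY_VOCAB` and `tokenize` are hypothetical names for illustration; real tokenizers (BPE and similar) are learned from data and will split the word differently.

```python
# Hypothetical toy vocabulary: subword pieces mapped to integer IDs.
# Real vocabularies are learned and contain tens of thousands of entries.
TOY_VOCAB = {
    "straw": 0, "berries": 1, "berry": 2,
    "s": 3, "t": 4, "r": 5, "a": 6, "w": 7,
    "b": 8, "e": 9, "y": 10, "i": 11,
}


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match segmentation of `word` into token IDs."""
    ids = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {word[pos]!r}")
    return ids


word = "strawberries"
token_ids = tokenize(word)

# What a character-level program sees: the letters themselves.
r_count = word.count("r")

# What the model receives: two opaque IDs, neither of which is the ID for "r".
r_token_present = TOY_VOCAB["r"] in token_ids
```

Under this toy vocabulary the input is just the pair of IDs for "straw" and "berries", so the three r's are never presented as separate symbols. Any letter count has to come from what the model has learned about the spelling behind those IDs.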

Replies

orbital-decay • today at 6:59 PM

Tokenization is a simplistic explanation that is likely wrong, at least in part. LLMs are perfectly fine at reciting words character by character, at handling different tokenizations of the same word if forced to (e.g. replacing the starting space or breaking words up into basic character tokens), at complex word formation in languages that heavily depend on it, etc. LLMs work with concepts rather than tokens.
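As a rough illustration of what "different tokenizations of the same word" means, the sketch below enumerates every way a toy piece set can segment one word. `TOY_PIECES` and `segmentations` are hypothetical, and the code says nothing about how a real model behaves on each variant; it only shows that the same string can arrive as very different token sequences.

```python
# Hypothetical piece set: two whole subwords plus single characters.
TOY_PIECES = {"straw", "berry", "s", "t", "r", "a", "w", "b", "e", "y"}


def segmentations(word, pieces):
    """Return every way to split `word` into pieces drawn from `pieces`."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in pieces:
            # Recurse on the remainder and prepend the matched piece.
            for rest in segmentations(word[end:], pieces):
                results.append([head] + rest)
    return results


variants = segmentations("strawberry", TOY_PIECES)

# Every variant spells the same word, from two coarse pieces
# down to ten single-character pieces.
spellings = {"".join(v) for v in variants}
lengths = sorted(len(v) for v in variants)
```

Only the fully character-level variant exposes each "r" as its own token. The claim in the comment above is that models cope with the coarser variants as well, which is what you would expect if spelling is learned rather than read directly off the tokens.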
