I wonder if the reason the models have trouble with this is that their tokens aren't the same as our characters. It's like asking someone who can speak English (but doesn't know how to read) how many R's there are in "strawberry". They're fluent in English audio tokens, but not in written ones.
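If you want to see it concretely, here's a rough sketch with the tiktoken library (assuming the cl100k_base encoding used by the GPT-4-era models; other tokenizers split differently):

    import tiktoken

    # Peek at what a BPE tokenizer actually hands the model for "strawberry".
    enc = tiktoken.get_encoding("cl100k_base")

    word = "strawberry"
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]

    print(ids)     # a short list of integer token IDs, not 10 characters
    print(pieces)  # a few multi-character chunks, not individual letters

The model only ever sees those IDs, so "count the R's" asks it about structure it was never shown directly.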
Agree. We've given them a different alphabet than ours.
They speak a different language that captures the same meaning, but has different units.
Somehow they need to learn that their unit of thought is not the same as our unit of speech, so that questions like these get mapped onto a different alphabet.
That's my two cents.
The amazing thing continues to be that they can ever answer these questions correctly.
It's very easy to write a paper in the style of "it is impossible for a bee to fly" for LLMs and spelling. The incompleteness of our understanding of these systems is astonishing.
Yeah, that's my understanding of the root cause too. It can also cause weirdness with numbers, because they aren't tokenized one digit at a time. That's done for good reason, but it still causes some unexpected issues.
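Same kind of sketch for digits (again with tiktoken and cl100k_base, just to illustrate): long numbers come out as multi-digit chunks rather than one token per digit.

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # Numbers are split into chunks of digits, not digit-by-digit,
    # so the model's view of "1234567" isn't seven separate symbols.
    for num in ["7", "1234567", "3.11"]:
        ids = enc.encode(num)
        print(num, "->", [enc.decode([i]) for i in ids])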
The way LLMs get it right by counting the letters and then changing their answer at the last second makes me feel like there might be a large amount of text somewhere in the dataset (e.g. a Reddit thread) that repeats over and over that there's the wrong number of R's. We've seen many weird glitches like this before (e.g. a specific Reddit username that would crash ChatGPT).