This is a very tiring criticism. Yes, it's true. But it's an implementation detail (tokenization) that has very little bearing on the practical utility of these tools. How often are you relying on LLMs to count letters in words?
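For anyone who hasn't seen the detail spelled out: the model never receives individual characters, only subword tokens, so "count the r's" is asking it about units it can't directly see. A minimal sketch using the tiktoken library (assuming it's installed; the exact split depends on the encoding you pick):

```python
# Minimal sketch: inspect how a word is tokenized.
# Assumes tiktoken is installed (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "strawberry"
token_ids = enc.encode(word)

# The model sees these multi-character chunks, not ten letters.
pieces = [enc.decode([t]) for t in token_ids]
print(pieces)
```

Run it and you get a handful of fragments rather than one-letter pieces, which is why letter-level questions are an awkward fit for the architecture.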
It's an example showing that if these models aren't trained on a specific kind of problem, they may have a hard time solving it for you.
At this point, if I were OpenAI, I wouldn't bother fixing this; leave the pedants something to get excited about.
The criticism would stop if the implementation issue were fixed.
It's an example of a simple task. How often are you relying on LLMs to complete simple tasks?
The problem with these implementation details is that we keep finding new ones! After this one, it couldn't locate a seahorse emoji without freaking out. At some point we'll need a test: there are two drinks in front of you. One is water; the other is whatever the LLM thought you might like to drink after it finished refactoring your codebase. Choose wisely.