Oh, it's good old tokenization vs. LLM-oriented tokenizers like SentencePiece or tiktoken. We shouldn't forget there are simple non-ML things like this one, which doesn't ask you to buy more GPUs.
Haha, I like “good old tokenization”