I think I see your point, but how would you handle localized numbers, such as 1,024, in a stream? Would you assume all 0x123-style numbers are hex, since that's a common convention? Does the tokenizer already know how to read scientific notation, like 1e2?
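For concreteness, here's a toy sketch of why a hand-written number lexer gets awkward fast. The rules below are made up for illustration, not taken from any real tokenizer:

```python
import re

# Each rule has to commit to one interpretation up front.
PATTERNS = [
    ("hex",        r"0x[0-9a-fA-F]+"),       # is 0x123 always hex?
    ("scientific", r"\d+(\.\d+)?e-?\d+"),    # 1e2: a number, or an identifier?
    ("grouped",    r"\d{1,3}(,\d{3})+"),     # 1,024: one number, or "1" then "024"?
                                             # (in a German locale it's a decimal, 1.024)
    ("plain",      r"\d+"),
]

def lex_number(s):
    """Return the first rule that fully matches, if any."""
    for name, pat in PATTERNS:
        if re.fullmatch(pat, s):
            return name
    return None

for s in ["1,024", "0x123", "1e2"]:
    print(s, "->", lex_number(s))
```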
That is all to say that numbers in text are already surprisingly flexible. The point of keeping the raw tokens is to let the model learn that flexibility itself. It's the same reason we don't tokenize at the word level, or try a Soundex normalization. All of these are probably worth at least trying, and some may even do better in certain contexts. The general framework exists for a reason, though.
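And on letting the model learn it: a byte-level BPE like GPT-2's just splits all of these into subword pieces through the same machinery, with no number-specific cases. A quick check using the tiktoken library (the exact pieces depend on the vocabulary):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")

# No special handling for hex, grouping, or scientific notation;
# the model learns what the pieces mean from context.
for s in ["1,024", "0x123", "1e2"]:
    ids = enc.encode(s)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{s!r} -> {pieces}")
```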