I'm kind of stunned that someone is using my work to tell me I'm wrong. I wrote the code for the dish brain pong, and encoding information was a huge part of what that experiment was about.
So when I say that the grok paper and the pong paper fundamentally agree, I have some idea of what I'm talking about.
If you're going to claim the tokenizer is a dictionary, then it doesn't really matter what paper you wrote code for.
Hubris much? I don't see a necessary contradiction in using someone's work to disprove another aspect of that same person's work.
I might have misunderstood the point you are making. I read the original article as "weights are like meat", and so I'm confused by what you consider fractally wrong.