The die is certainly not multi-terabyte. A more realistic number would be somewhere between 32k and 50k sides, if we go with a fairly typical token vocabulary size.
Really, it comes down to encoding. Any UTF-8 encoded string, of any length, can be generated with nothing more than coin flips.
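To make the coin-flip point concrete, here is a rough sketch. The helper names `flip` and `coin_flip_string` are made up for illustration, and restricting the output to printable ASCII (which is always valid UTF-8) is my own simplification to keep the example short.

```python
import random

def flip(rng):
    """One fair coin flip: returns 0 or 1."""
    return rng.getrandbits(1)

def coin_flip_string(n_chars, rng):
    """Build an n_chars-long string using only coin flips.

    Assumption for this sketch: only printable ASCII is emitted,
    so the result is always valid UTF-8.
    """
    chars = []
    while len(chars) < n_chars:
        # Seven flips give a number in 0..127.
        code = 0
        for _ in range(7):
            code = (code << 1) | flip(rng)
        # Rejection step: keep only printable ASCII (32..126).
        if 32 <= code <= 126:
            chars.append(chr(code))
    return "".join(chars)

rng = random.Random(0)  # seeded so the run is repeatable
s = coin_flip_string(12, rng)
encoded = s.encode("utf-8")
print(repr(s), len(encoded))
```

The "die" here has two sides, yet the string can be as long as you like. The side count only sets how many rolls it takes to write something down.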
The number of sides has nothing to do with the data within. It's not random, and sometimes it repeats things in a way that is obviously not down to chance.