The die is certainly not multi-terabyte. A more realistic number would be 32k-sided to 50k-sided if we want to go with a pretty average token vocabulary size.
Really, it comes down to encoding. Arbitrarily short UTF-8 encoded strings can be generated using nothing but repeated coin flips.
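To make the encoding point concrete, here is a rough Python sketch, not anyone's actual sampler. It assumes a fair coin and a uniform die, and the helper names (`coin_flip`, `flips_to_int`, `roll_die`, `random_ascii_string`) are made up for illustration. It shows that one roll of a 50k-sided die costs about 16 coin flips, and that 7 flips per character is enough to produce a valid UTF-8 string.

```python
import random

def coin_flip():
    """Hypothetical fair coin: returns 0 or 1."""
    return random.getrandbits(1)

def flips_to_int(n_bits, flip=coin_flip):
    """Pack n_bits coin flips into an integer, most significant bit first."""
    value = 0
    for _ in range(n_bits):
        value = (value << 1) | flip()
    return value

def roll_die(sides, flip=coin_flip):
    """Simulate one roll of a fair `sides`-sided die (0-based) using only coin flips.

    Uses rejection sampling: draw enough bits to cover `sides` outcomes
    and retry whenever the result falls outside the range.
    """
    n_bits = (sides - 1).bit_length()  # 15 bits for 32,000 sides, 16 for 50,000
    while True:
        value = flips_to_int(n_bits, flip)
        if value < sides:
            return value

def random_ascii_string(length, flip=coin_flip):
    """Build a string from 7 flips per character.

    Every 7-bit value is an ASCII byte, and ASCII is always valid UTF-8,
    so the decode cannot fail. Control characters may appear in the result.
    """
    return bytes(flips_to_int(7, flip) for _ in range(length)).decode("utf-8")

if __name__ == "__main__":
    print(roll_die(50_000))        # a token ID in [0, 50000)
    print(repr(random_ascii_string(8)))
```

Rejection sampling is used here only because it keeps the die uniform when the number of sides is not a power of two; the cost is an occasional retry.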