Hacker News

kelseyfrog, last Friday at 11:52 PM

The die is certainly not multi-terabyte. A more realistic number would be 32k-sided to 50k-sided if we want to go with a pretty average token vocabulary size.

Really, it comes down to encoding. Arbitrarily short utf-8 encoded strings can be generated using a coin flip.
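As a rough sketch of the encoding point (the function names here are made up for illustration, and restricting to 7-bit ASCII is an assumption made so that every byte is valid UTF-8 on its own): seven coin flips pick one character, and a die with N sides is worth about log2(N) flips per roll.

```python
import random

def coin_flip_string(n_chars, rng):
    """Build an n_chars-long ASCII (hence valid UTF-8) string from coin flips."""
    out = bytearray()
    for _ in range(n_chars):
        code = 0
        for _ in range(7):  # 7 flips keep each byte below 0x80, so it decodes alone
            code = (code << 1) | rng.getrandbits(1)
        out.append(code)
    return out.decode("utf-8")

def flips_per_roll(sides):
    """Smallest number of coin flips that can distinguish `sides` outcomes."""
    return (sides - 1).bit_length()

rng = random.Random(0)  # seeded only so the sketch is repeatable
print(repr(coin_flip_string(8, rng)))
print(flips_per_roll(32_000), flips_per_roll(50_000))  # 15 and 16 flips
```

So a 32k-sided to 50k-sided die is only about 15 to 16 coin flips per roll. When the number of sides is not a power of two, some flip patterns have to be rerolled to keep the outcomes uniform.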


Replies

Dylan16807, yesterday at 12:55 AM

The number of sides has nothing to do with the data within. It's not random and sometimes it repeats things in an obviously non-chance way.
