
soVeryTired, last Friday at 9:13 PM

No way is vocab size Zipfian. Word counts from a corpus follow Zipf's law, but vocab sizes themselves don't.

Otherwise the most common vocab size would be equal to one.
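
A quick sketch of that last point, assuming "Zipfian" means the probability of a value k is proportional to 1/k^s. The function name `zipf_pmf`, the cutoff of 50,000, and the exponent s = 1 are made up for illustration. Under that assumption the probabilities only ever decrease with k, so the mode sits at the smallest value:

```python
def zipf_pmf(n_values, s=1.0):
    """Probabilities of k = 1..n_values under a truncated Zipf law (weight 1/k^s)."""
    weights = [1.0 / k**s for k in range(1, n_values + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical setup: treat k as a "vocab size" ranging from 1 to 50,000.
pmf = zipf_pmf(50_000)

# Index of the largest probability, shifted to 1-based k.
mode = max(range(len(pmf)), key=pmf.__getitem__) + 1
print(mode)  # 1
```

Any exponent s > 0 gives the same mode, since 1/k^s is strictly decreasing in k.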