Hacker News

yorwba yesterday at 6:23 AM

No. You can fit an exponential number of almost-orthogonal vectors into the input space, but the number of not-too-similar probability distributions over output tokens is also exponential in the output dimension. This is fine if you only care about a small subset of distributions (e.g. those that only assign significant probability to at most k tokens), but if you pick any random distribution, it's unlikely to be represented well. Fortunately, this doesn't seem to be much of an issue in practice and people even do top-k sampling intentionally.
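
A toy sketch of this, under loudly stated assumptions: a 2-dimensional hidden state, a vocabulary of 8 tokens, and made-up output embeddings `W` placed evenly on the unit circle (none of this comes from a real model). It only illustrates that when the hidden dimension is much smaller than the vocabulary, `softmax(W h)` can reach some sharply peaked distributions but not arbitrary ones. Here every single-token target is reachable, while "half the mass on token 0, half on the opposite token 4" is not.

```python
import math
import random

V = 8  # toy vocabulary size (assumption)
# Hypothetical output embeddings: V unit vectors evenly spaced on a circle,
# so the hidden dimension is d = 2.
W = [(math.cos(2 * math.pi * i / V), math.sin(2 * math.pi * i / V)) for i in range(V)]

def softmax_output(h):
    """Token distribution softmax(W h) for a 2-d hidden state h."""
    logits = [w[0] * h[0] + w[1] * h[1] for w in W]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Case 1: a distribution concentrated on a single token (k = 1).
# Pointing h along token i's embedding makes that token dominate.
scale = 50.0
peaked = [softmax_output((scale * w[0], scale * w[1]))[i] for i, w in enumerate(W)]

# Case 2: a target with half the mass on token 0 and half on token 4.
# Random search over hidden states, tracking the best min(p[0], p[4]) found.
rng = random.Random(0)
best_pair_mass = 0.0
for _ in range(20000):
    h = (rng.uniform(-10, 10), rng.uniform(-10, 10))
    p = softmax_output(h)
    best_pair_mass = max(best_pair_mass, min(p[0], p[4]))

print(min(peaked))     # close to 1: every one-hot target is reachable
print(best_pair_mass)  # never above 1/8, far from the 0.5 the target needs
```

The 1/8 ceiling in this toy is not a search artifact. The embeddings sum to zero, so by Jensen's inequality the softmax normalizer is at least 8, while the smaller of the two opposite-token terms is at most 1. Real models have far larger hidden dimensions, so the reachable set is much richer, but the same counting argument applies.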


Replies

anonymoushn yesterday at 4:17 PM

I see. You're right. I was either badly mistaken or only thinking about small k.