While the theoretical bottleneck exists, it is far less restrictive than what you are describing, because the number of almost-orthogonal vectors grows exponentially with the ambient dimensionality. And near-orthogonality is what matters for differentiating between vectors: since any distribution can be approximated arbitrarily well by a mixture of Gaussians, the number of separate concepts you can encode with such a mixture also grows exponentially.
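To make the first claim concrete, here is a minimal sketch (assuming NumPy; the dimension, the vector count, and the 0.2 tolerance are arbitrary illustrative choices, not canonical constants): random unit vectors in 1000 dimensions stay pairwise nearly orthogonal even when you sample four times as many vectors as there are dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 4000  # ambient dimension, number of vectors (n = 4 * d)

# Random unit vectors: Gaussian samples projected onto the unit sphere.
vecs = rng.standard_normal((n, d))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Pairwise cosine similarities; zero out the diagonal (self-similarity).
gram = vecs @ vecs.T
np.fill_diagonal(gram, 0.0)

# Typical overlap is ~1/sqrt(d) ≈ 0.03; even the worst pair is only ~0.18.
print("max |cos|:", np.abs(gram).max())
```

The tolerated overlap for n such vectors only needs to scale like sqrt(log(n)/d), which is exactly why the number of vectors you can pack below any fixed overlap threshold grows exponentially in d.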
I agree that you can encode any single concept, and that the space of encodings for a single top pick grows exponentially.
However, I'm talking about the full probability distribution over tokens, not just the top pick.
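To illustrate the distinction, here is a hedged sketch (NumPy again; the hidden size, vocabulary size, and context count are illustrative, and a linear unembedding followed by softmax is assumed): across arbitrarily many contexts, the matrix of full log-probability distributions has rank at most d + 1, even though each individual row lives on a (V - 1)-dimensional simplex.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V, n_ctx = 64, 1024, 2048  # hidden size, vocab size, number of contexts

H = rng.standard_normal((n_ctx, d))  # one hidden state per context
W = rng.standard_normal((d, V))      # unembedding matrix

# Log-softmax: logits minus their per-row log-sum-exp normalizer.
logits = H @ W
logp = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)

# Numerical rank of the n_ctx x V log-probability matrix.
print("rank of log P:", np.linalg.matrix_rank(logp))  # <= d + 1 = 65, far below V
```

Getting the top pick right only requires an argmax to land on the correct token, but matching a full distribution pins down all V coordinates of a row, and the rank bound says only a (d + 1)-dimensional family of such rows is reachable.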