> Look at the softmax function and take the limit as T->0. It becomes a dirac-delta function.

317070 • today at 9:51 AM

In pure math, it does not always do that. It becomes a Dirac delta comb with equal weight on every maximum, and there can be more than one maximum. Setting the temperature to zero gives you greedy sampling, but greedy sampling is not necessarily deterministic, because several options can be equally optimal.
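A quick way to see this numerically is the minimal sketch below. The logits are made-up values with two tied maxima, and `softmax_with_temperature` is just an illustrative helper, not anything from a real inference stack.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: indices 1 and 2 are exactly tied for the maximum.
logits = [2.0, 5.0, 5.0, 1.0]

for t in (1.0, 0.1, 0.001):
    print(t, softmax_with_temperature(logits, t))
```

As the temperature shrinks, the mass on the non-maximal entries vanishes and the two tied entries each approach 0.5, so sampling from the limit is still a coin flip between them.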

Replies

sigmoid10 • today at 10:44 AM

That is not a problem for LLMs, because in practice floating-point inaccuracies (in particular after exponentiation) prevent values from being exactly equal. That's why greedy sampling generally produces deterministic output for LLMs. The real gotchas are elsewhere (like with batch inference, as we've seen with earlier GPTs). But contrary to what the earlier comment suggests, this is a non-issue mathematically.
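A toy illustration of the rounding point, assuming nothing about any real model: two scores that are equal on paper come out different in floating point when the additions are grouped differently, so the argmax is unique. The `argmax` helper is hypothetical and only here for the demo.

```python
def argmax(values):
    """Index of the largest value; on an exact tie the first index wins."""
    return max(range(len(values)), key=lambda i: values[i])

# Mathematically both sums equal 0.6, but the grouping changes the rounding.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

scores = [b, a]
print(a == b)          # the "tie" does not survive floating point
print(argmax(scores))  # a single well-defined winner
```

Note that even with an exact tie, a plain argmax like this one still returns the same index every time, so the pick stays deterministic as long as the inputs are bit-for-bit identical.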
