Temperature can't literally be zero: softmax divides the logits by T, so T = 0 would be a division by zero.
When people say zero, it is shorthand for “as deterministic as this system allows”, but it's still not completely deterministic.
Zero temp just uses argmax, which is what softmax approaches if you take the limit of T to zero anyway. So it could very well be deterministic.
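To make the limit argument concrete, here is a minimal sketch in plain Python. The function names and the logit values are made up for illustration and are not taken from any particular inference library. It shows the softmax distribution collapsing onto the largest logit as T shrinks, and T = 0 being treated as a separate argmax case rather than being plugged into the formula.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature); temperature must be > 0."""
    if temperature <= 0:
        # Dividing by T = 0 is undefined, so callers special-case it.
        raise ValueError("temperature must be positive; use argmax for T = 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Index of the largest logit (the first one wins on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.0, 0.5]  # hypothetical values, for illustration only
for t in (1.0, 0.1, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
print("argmax index:", greedy_pick(logits))
```

This only covers the selection step. Given identical logits, argmax always picks the same token; whether the logits themselves come out bit-identical from run to run is a separate question that this sketch does not address.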