I think log-sum-exp should actually be the function that gets the name "softmax", because it's actually a soft maximum over a set of values. And what we call "softmax" should be called "grad softmax" (since the gradient of logsumexp is softmax).
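To make that concrete, here is a rough sketch in plain Python that checks both claims on a small example: that log-sum-exp stays within log(n) of the true max, and that its gradient matches softmax. The helper names (`logsumexp`, `softmax`, `numerical_grad`) and the sample values are just for illustration, and the gradient is estimated with central finite differences rather than computed analytically.

```python
import math

def logsumexp(xs):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    # Same max-shift trick; the shift cancels in the ratio.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def numerical_grad(f, xs, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

xs = [1.0, 2.0, 5.0]  # arbitrary sample values
lse = logsumexp(xs)
grad = numerical_grad(logsumexp, xs)
sm = softmax(xs)

# "Soft maximum": max(xs) <= logsumexp(xs) <= max(xs) + log(len(xs))
print(max(xs), lse, max(xs) + math.log(len(xs)))

# Gradient of logsumexp vs. softmax, coordinate by coordinate
for g, s in zip(grad, sm):
    print(g, s)
```

The two columns printed by the last loop should agree up to finite-difference error.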
softmax is badly named and should rather be called softargmax.
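One way to see the "softargmax" reading: if you divide the inputs by a small temperature before applying softmax, the output approaches a one-hot vector at the argmax position. A quick sketch, where the `temperature` parameter and the name `softmax_t` are my own additions for illustration and not part of the usual definition:

```python
import math

def softmax_t(xs, temperature):
    # Softmax of xs / temperature, with the usual max-shift for stability.
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

xs = [1.0, 2.0, 5.0]  # arbitrary sample values; the argmax is index 2

warm = softmax_t(xs, 1.0)   # ordinary softmax: a smooth distribution
cold = softmax_t(xs, 0.01)  # low temperature: nearly one-hot at the argmax

print(warm)
print(cold)
```

So the output behaves like a smoothed indicator of where the maximum is, not like a smoothed maximum value.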