Hacker News

nobodywillobsrv last Thursday at 7:12 AM

Softmax’s exponential comes from counting occupation states. Maximize the number of ways to arrange things subject to a fixed average energy, with the logits as negative energies, and you get exp(logits) over a partition function, pure Boltzmann style. It’s optimal in that sense: it is the most probable arrangement, which is where probability naturally piles up.
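A rough numerical check of that claim, as a sketch rather than a proof. It assumes the "energy" of a state is its negated logit and that the constraint is a fixed average logit. The three logits and the helper names (`softmax`, `entropy`, `mean_logit`, `perturbed`) are made up for illustration. With three states, normalization plus a fixed average leave exactly one direction to move in, and moving either way along it should lower the entropy if softmax really is the maximum-entropy distribution.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)  # partition function, up to the factor exp(m)
    return [w / total for w in weights]

def entropy(p):
    # Shannon entropy in nats.
    return -sum(q * math.log(q) for q in p if q > 0)

def mean_logit(p, logits):
    # Expected logit under p (the negated "average energy").
    return sum(q * z for q, z in zip(p, logits))

# Illustrative logits, not taken from the thread.
logits = [1.0, 2.0, 4.0]
p = softmax(logits)

# Cross product of (1, 1, 1) and the logits: moving along this direction
# changes neither the total probability nor the average logit.
direction = [2.0, -3.0, 1.0]

def perturbed(eps):
    return [q + eps * d for q, d in zip(p, direction)]

for eps in (-0.01, 0.01):
    q = perturbed(eps)
    # A positive gap means the perturbed distribution has lower entropy.
    print(eps, entropy(p) - entropy(q))
```

Both printed gaps should come out positive: any other distribution with the same average logit has lower entropy than the softmax one.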


Replies

efavdb last Thursday at 12:32 PM

I personally don’t think much of the maximum entropy principle. If you look at the axioms that inform it, they don’t seem obviously correct. Further, the usual qualitative argument is only right when viewed through a certain lens: namely, that choosing anything else would require you to make more assumptions about your distribution than are required. Yet it’s easy to find examples where the max entropy solution suppresses some states more than is necessary, which to me contradicts that qualitative argument.

semiinfinitely last Thursday at 7:16 AM

right and it should be totally obvious that we would choose an energy function from statistical mechanics to train our hotdog-or-not classifier
