valzam • today at 5:59 AM

I mean, the easiest explanation would be that the model harness doesn't always take the most likely token but does top-k sampling or similar. A higher temperature just means that the probabilities get more and more equalized, boosting the chance that an unlikely token gets picked. But even with temp 0 you could have 0.8 for T1, 0.19 for T2, ... and sometimes sample T2.
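To make the "equalizing" effect concrete, here is a minimal sketch of temperature-scaled softmax for temperatures above 0. The function name `softmax_with_temperature` and the three logits are hypothetical, picked so that temperature 1 reproduces the 0.8 / 0.19 split from the comment; this is not taken from any particular harness.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature. Only valid for temperature > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: at temperature 1 they give 0.8, 0.19, 0.01.
logits = [math.log(0.8), math.log(0.19), math.log(0.01)]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Under these assumptions, the top token's share shrinks as the temperature rises and grows as it falls.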

Replies

aesthesia • today at 6:07 AM

No, this can't happen at temperature 0. The formula defining temperature-adjusted softmax isn't strictly defined at 0, but taking the limit (in the case where all logits are distinct) results in probability 1 being placed on the largest logit. Samplers will typically special-case temperature 0 and pick the most likely token at each step.
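A rough sketch of that special case, assuming a plain list of logits. The names `sample_token` and `top_probability` are made up for illustration, and real samplers may differ in details such as tie-breaking.

```python
import math
import random

def top_probability(logits, temperature):
    """Probability of the largest logit under softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    return max(exps) / sum(exps)

def sample_token(logits, temperature, rng=random):
    """Return the index of the chosen token."""
    if temperature == 0:
        # Dividing by 0 is undefined, so skip the softmax and take the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits matching the 0.8 / 0.19 example above.
logits = [math.log(0.8), math.log(0.19), math.log(0.01)]

for t in (1.0, 0.1, 0.01):
    print(t, top_probability(logits, t))  # approaches 1 as t shrinks

picks = {sample_token(logits, 0) for _ in range(1000)}
print(picks)  # only index 0 is ever chosen
```

With distinct logits, the temperature 0 branch is deterministic, so T2 is never picked no matter how many draws are made.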
