
Lerc, today at 12:50 AM (1 reply)

When training lots of models with subtly different parameters like this, is there anything to be learned from the differences in logprobs between them for the same input? Obviously a model with a lower loss has better logprobs, but are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?


Replies

itissid, today at 1:01 AM

> are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?

It seems like you want to know what the median and the 5th-95th or 1st-99th percentile differences might be? I also wonder what the "residual" plot looks like... If there are too many residual data points for a scatter plot, then a histogram might be useful to visualize the modes. I suspect that as loss decreases, multiple modes should condense or altogether collapse into one.
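For what it's worth, here is a rough sketch of the summary I have in mind, assuming you already have per-token logprobs from two models on the same input as equal-length arrays. The function name `summarize_logprob_diff` and its arguments are made up for illustration; nothing here comes from the training setup being discussed.

```python
import numpy as np

def summarize_logprob_diff(logprobs_a, logprobs_b, bins=50):
    """Summarize per-token logprob residuals (model B minus model A).

    Hypothetical helper: both inputs are assumed to be per-token logprobs
    for the same tokens of the same input, in the same order.
    """
    a = np.asarray(logprobs_a, dtype=float)
    b = np.asarray(logprobs_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both models must be scored on the same tokens")

    # Positive residual means model B assigned the token a higher logprob.
    residual = b - a

    # Median plus the 5-95 and 1-99 percentile ranges mentioned above.
    p1, p5, p50, p95, p99 = np.percentile(residual, [1, 5, 50, 95, 99])

    # Histogram of residuals, for when a scatter plot has too many points.
    counts, edges = np.histogram(residual, bins=bins)

    return {
        "mean": float(residual.mean()),
        "median": float(p50),
        "p5_p95": (float(p5), float(p95)),
        "p1_p99": (float(p1), float(p99)),
        "hist_counts": counts,
        "hist_edges": edges,
    }
```

A narrow 5-95 range around a positive median would suggest a fairly uniform gain, while a wide range or several peaks in `hist_counts` would suggest the improvement is concentrated in some tokens or is noisy.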