> are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?
It sounds like you want to know the median and the 5-95 or 1-99 percentile ranges of the differences? I also wonder what the "residual" plot looks like... If there are too many residual data points for a scatter plot, a histogram might be a better way to visualize the modes. I suspect that as loss decreases, multiple modes should condense or collapse into one altogether.
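For what it's worth, here is a rough sketch of what I mean, assuming you have per-example losses from two runs as plain lists. The names `loss_a`, `loss_b`, `summarize_differences` and `text_histogram` are made up for illustration, and the data is synthetic (a made-up mix where most examples barely change and a minority improve a lot), so treat it as a starting point rather than anything specific to your setup.

```python
import random
import statistics

def summarize_differences(loss_a, loss_b):
    """Percentile summary of per-example loss differences (a - b)."""
    diffs = [a - b for a, b in zip(loss_a, loss_b)]
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    pct = statistics.quantiles(diffs, n=100, method="inclusive")
    summary = {
        "median": statistics.median(diffs),
        "p5": pct[4], "p95": pct[94],
        "p1": pct[0], "p99": pct[98],
    }
    return summary, diffs

def text_histogram(values, bins=10, width=40):
    """Equal-width histogram rendered as text, to eyeball the modes."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + span * i / bins
        lines.append(f"{left:8.3f} | {'#' * round(width * c / peak)} {c}")
    return counts, lines

# Synthetic stand-in data (assumption): 80% of examples change very little,
# 20% see a large gain, which should show up as two modes in the residuals.
random.seed(0)
loss_a = [random.gauss(2.0, 0.3) for _ in range(1000)]
loss_b = [
    a - (random.gauss(0.8, 0.1) if random.random() < 0.2 else random.gauss(0.0, 0.05))
    for a in loss_a
]

summary, diffs = summarize_differences(loss_a, loss_b)
counts, lines = text_histogram(diffs, bins=12)
print(summary)
print("\n".join(lines))
```

If the gains are concentrated in a few areas you would expect a second bump away from zero, while uniformly noisy improvement should look like a single broad mode.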