Token selection is based off normalization. Even if you train a model to produce outlier answers, that training biases it toward a particular subset of outliers, which is inherently normalizing.
Could you elaborate on "token selection is based off normalization"?