I think it's more likely to be the old base model checkpoint further trained on additional data.
Is that technically not a new pretrained model?
(Also not sure how that would work, but maybe I’ve missed a paper or two!)