Hacker News

minimaxir | last Thursday at 8:58 PM | 2 replies

> Note: we are not releasing any post-trained / IT checkpoints.

I get not wanting to cannibalize Gemma, but that's weird. A 540M multimodal model that performs well on queries would be useful, and "just post-train it yourself" is not always an option.


Replies

jeffjeffbear | last Thursday at 9:11 PM

Isn't finetuning the point of T5-style models, since they perform better at smaller parameter counts?
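
For anyone wondering what "just post-train it yourself" roughly involves, here is a minimal sketch of supervised fine-tuning for a T5-style encoder-decoder checkpoint with Hugging Face transformers. The checkpoint name, hyperparameters, and the two toy examples are placeholders, not anything from this release:

    # Rough sketch of post-training a T5-style encoder-decoder checkpoint with
    # Hugging Face transformers. Checkpoint name and the toy data are placeholders.
    from datasets import Dataset
    from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    checkpoint = "t5-small"  # placeholder; swap in the released pre-trained checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # Tiny instruction-style pairs standing in for a real post-training mixture.
    pairs = [
        {"prompt": "Summarize: The cat sat on the mat all afternoon.",
         "target": "A cat rested on a mat."},
        {"prompt": "Translate to French: Good morning.",
         "target": "Bonjour."},
    ]

    def preprocess(example):
        model_inputs = tokenizer(example["prompt"], truncation=True, max_length=256)
        labels = tokenizer(text_target=example["target"], truncation=True, max_length=64)
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    train_set = Dataset.from_list(pairs).map(preprocess, remove_columns=["prompt", "target"])

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(output_dir="ft-out", per_device_train_batch_size=2,
                                      num_train_epochs=1, learning_rate=3e-4,
                                      logging_steps=1, report_to="none"),
        train_dataset=train_set,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    model.save_pretrained("ft-out")

That's the mechanics; the hard part is curating a post-training mixture and compute, which is exactly why a released IT checkpoint would still be useful.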

sundarurfriend | yesterday at 3:37 AM

This made me compare the figures: did they accidentally switch those around, or are the Post-training Reasoning and Factuality scores actually significantly lower than the Pre-training ones?

Edit: Just noticed

> Also note pre-training and post-training benchmarks are different, so scores are not comparable across plots.

The paper gives more details about the specific benchmarks and the scores obtained in them: https://arxiv.org/html/2512.14856v1#S4