Hacker News

red2awn, last Monday at 10:23 PM

Very interesting release:

* Hybrid MoE: 2-3x faster than pure MoE transformers (rough sketch of the idea below the list)

* 1M context length

* Trained on NVFP4

* Open Source! Pretraining, mid-training, SFT, and RL datasets released (SFT HF link is 404...)

* Open model training recipe (coming soon)
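
To unpack the first bullet: below is a very rough sketch of what a hybrid MoE stack looks like. This is my reading of "hybrid", not the released architecture; the linear-time layer is stubbed with a GRU just to keep it runnable, and the MoE dispatch is written for clarity, not efficiency.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Tiny top-k routed MoE FFN (illustrative; dense dispatch, not efficient)."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                          # x: (batch, seq, d_model)
        probs = self.router(x).softmax(dim=-1)     # (B, S, n_experts)
        topv, topi = probs.topk(self.k, dim=-1)    # both (B, S, k)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):  # loop over experts for clarity
            # routing weight for expert e per token (0 if the token skips it)
            weight = (topv * (topi == e)).sum(dim=-1, keepdim=True)
            out = out + weight * expert(x)
        return out

class Block(nn.Module):
    """One layer: full attention or a linear-time mixer, plus an MoE FFN."""
    def __init__(self, kind, d_model=512):
        super().__init__()
        self.kind = kind
        if kind == "attn":
            self.mixer = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        else:
            # stand-in for a Mamba-2 / SSM layer; real models don't use a GRU
            self.mixer = nn.GRU(d_model, d_model, batch_first=True)
        self.moe = TopKMoE(d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        if self.kind == "attn":
            mixed, _ = self.mixer(h, h, h, need_weights=False)
        else:
            mixed, _ = self.mixer(h)
        x = x + mixed
        return x + self.moe(self.norm2(x))

# Mostly linear-time layers with occasional attention: using attention only in
# a few layers is where a 2-3x throughput gain over an all-attention MoE
# transformer would come from, especially at long context.
pattern = ["ssm", "ssm", "ssm", "attn"] * 2
model = nn.Sequential(*[Block(kind) for kind in pattern])
out = model(torch.randn(2, 64, 512))               # (batch, seq, d_model)
```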

Really appreciate Nvidia being the most open lab, but they should make sure all the links/data are available on day 0.

Also interesting that the model is trained in NVFP4 but the inference weights are FP8.
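
To make that distinction concrete, here's a minimal sketch of what shipping FP8 inference weights typically means: generic per-tensor E4M3 quantization in PyTorch, which is my guess at the general shape of it, not NVIDIA's actual export pipeline.

```python
# Per-tensor FP8 (E4M3) quantization of a higher-precision weight for serving.
# Illustrates why a checkpoint can ship as FP8 regardless of the precision the
# pretraining compute ran in.
import torch

E4M3_MAX = 448.0  # largest finite value representable in torch.float8_e4m3fn

def quantize_fp8(w: torch.Tensor):
    """Return (fp8_weight, scale) for a symmetric per-tensor quantization."""
    scale = w.abs().max().clamp(min=1e-12) / E4M3_MAX
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Cast back to bf16 and rescale before the matmul."""
    return w_fp8.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096)          # stand-in for a posttrained BF16 weight
w_fp8, s = quantize_fp8(w)
w_ready = dequantize_fp8(w_fp8, s)   # what an FP8-serving kernel effectively sees
```

The serving format is decoupled from whatever precision the training compute used, which is consistent with the reply below.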


Replies

bcatanzaro, yesterday at 3:06 PM

The Nano model isn’t pretrained in FP4, only Super and Ultra are. And posttraining is not in FP4, so the posttrained weights of these models are not native FP4.