red2awn • last Monday at 10:23 PM • 1 reply

Very interesting release:

* Hybrid MoE: 2-3x faster than pure MoE transformers

* 1M context length

* Trained on NVFP4

* Open Source! Pretraining, mid-training, SFT and RL datasets released (SFT HF link is 404...)

* Open model training recipe (coming soon)

Really appreciate Nvidia being the most open lab, but they really should make sure all the links/data are available on day 0.

Also interesting that the model is trained in NVFP4 but the inference weights are FP8.

Replies

bcatanzaro • yesterday at 3:06 PM

The Nano model isn’t pretrained in FP4, only Super and Ultra are. And post-training is not in FP4, so the post-trained weights of these models are not native FP4.
