Schiendelman • yesterday at 4:17 AM

We're getting to the limit of my understanding, but I believe most Blackwell users still run their passes in FP8 through Transformer Engine and only store the weights in NVFP4. Nvidia has model-specific stabilization recipes for running NVFP4 end to end, but those are still getting fixes all the time.
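To make "store at 4 bits, compute at higher precision" concrete, here's a rough NumPy sketch of block-scaled 4-bit storage. It's a simulation, not Nvidia's implementation: the function names are made up for illustration, the 16-value block size and E2M1 value grid are what I understand NVFP4 to use, and the per-block scales are kept as ordinary floats here rather than FP8.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # assumed block size: one shared scale per 16 values


def quantize_blocks(w):
    """Snap weights to the E2M1 grid, with one scale per block (hypothetical helper)."""
    blocks = w.reshape(-1, BLOCK)
    # Scale each block so its largest magnitude lands on the grid maximum (6.0).
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scales = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = blocks / scales
    # Round each magnitude to the nearest grid point, then restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    codes = np.sign(scaled) * E2M1_GRID[idx]
    return codes, scales


def dequantize_blocks(codes, scales, shape):
    """Expand stored codes back to full-precision weights for the actual compute."""
    return (codes * scales).reshape(shape)


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 32)).astype(np.float32)  # "master" weights
x = rng.normal(size=(32,)).astype(np.float32)    # an activation vector

codes, scales = quantize_blocks(w)               # what would be stored
w_hat = dequantize_blocks(codes, scales, w.shape)  # what the matmul sees

print("max weight error:", np.abs(w - w_hat).max())
print("max matmul error:", np.abs(w @ x - w_hat @ x).max())
```

Since each block's scale is set by its largest value, one big outlier in a block pushes the other 15 values toward the bottom of the grid, where many of them round to zero.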

Nvidia says Rubin should have fewer stability problems when training in FP4 because of hardware changes it calls "adaptive compression". Outlier instability is inherent and won't go away, but something they're designing in is supposed to reduce the cost of managing it.

But yeah, grain of salt - we haven't seen this in practice.
