We're getting to the limit of my understanding, but I believe most Blackwell users still usually run FP8 passes through the transformer engine - they'll just store weights at NVFP4. Nvidia has model-specific stabilization recipes for NVFP4 end to end, but they're taking fixes all the time.
Nvidia says Rubin should have fewer stability problems training with FP4 because of hardware changes - "adaptive compression". There will still be outlier instability inherently, but something they're designing in reduces the cost of managing it.
But yeah, grain of salt - we haven't seen this in practice.