It's very unclear what's special about Rubin that makes it optimized for inference. I can see the disaggregated bit (separate prefill and decoding nodes), but what else?
A lot more SMs and Tensor Cores for NVFP4, by the looks of it.