Hacker News

boroboro4, today at 3:35 AM (1 reply)

It's unclear what makes Rubin specifically optimized for inference. I can see the disaggregated part (separate prefill and decode nodes), but what else?


Replies

villgax, today at 4:48 AM

A lot more SMs and Tensor Cores for NVFP4, by the looks of it.