Hacker News

Schiendelman today at 12:52 AM

I'm not surprised to see competition with Blackwell. Rubin is 5x faster than Blackwell at inference; Blackwell is the last generation Nvidia didn't optimize specifically for inference.

If I'm missing something, please let me know!


Replies

boroboro4 today at 3:35 AM

It's very unclear to me what makes Rubin specifically optimized for inference. I can see the disaggregation part (having separate prefill and decode nodes), but what else?
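For readers unfamiliar with the disaggregated setup mentioned here: the prompt is processed (prefill) on one pool of nodes, and the resulting KV cache is handed to a separate pool that generates tokens one at a time (decode). This toy Python sketch models only that handoff and scheduling loop. `Request`, `KVHandoff`, `prefill`, `decode_step`, and `serve` are hypothetical names, no real model runs, and a dummy token id stands in for sampling.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: int
    prompt_tokens: list[int]
    max_new_tokens: int


@dataclass
class KVHandoff:
    """Hypothetical stand-in for the KV cache a prefill node ships to a decode node."""
    req_id: int
    cached_positions: int  # number of tokens that have KV entries so far
    max_new_tokens: int
    generated: list[int] = field(default_factory=list)


def prefill(req: Request) -> KVHandoff:
    # Prefill handles the whole prompt in one pass (compute-heavy) and then
    # hands off the cache. This toy version only records the cache length.
    return KVHandoff(req.req_id, len(req.prompt_tokens), req.max_new_tokens)


def decode_step(state: KVHandoff) -> bool:
    # Decode emits one token per step (memory-bandwidth-heavy). A dummy
    # token id 0 stands in for real sampling. Returns True when finished.
    state.generated.append(0)
    state.cached_positions += 1
    return len(state.generated) >= state.max_new_tokens


def serve(requests: list[Request]) -> dict[int, int]:
    """Run a toy prefill pool and decode pool; return req_id -> tokens generated."""
    prefill_queue = deque(requests)    # hypothetical prefill pool's queue
    decode_pool: list[KVHandoff] = []  # hypothetical decode pool's active set
    finished: dict[int, int] = {}
    while prefill_queue or decode_pool:
        # One prefill per tick, then hand the cache to the decode pool.
        if prefill_queue:
            decode_pool.append(prefill(prefill_queue.popleft()))
        # Every active request on the decode side advances by one token.
        still_running = []
        for state in decode_pool:
            if decode_step(state):
                finished[state.req_id] = len(state.generated)
            else:
                still_running.append(state)
        decode_pool = still_running
    return finished
```

For example, `serve([Request(1, [1, 2, 3], 4), Request(2, [5], 2)])` finishes both requests with 4 and 2 generated tokens respectively. The point of the split is that the two phases stress different resources, so each pool can be sized and scheduled for its own bottleneck.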

nullc today at 1:15 AM

How do you get 5x faster inference when inference is memory-bandwidth limited? Getting 5x the memory bandwidth of an H100 seems physically difficult.
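The objection rests on a back-of-envelope bound: at batch size 1, each decoded token has to stream roughly all of the model weights from memory, so tokens per second can be no higher than memory bandwidth divided by bytes read per token. A minimal Python sketch of that arithmetic follows. The 70B-parameter model, 1 byte per parameter, and ~3.35 TB/s bandwidth are illustrative assumptions, and the model ignores KV-cache traffic, compute time, and batching.

```python
def decode_tokens_per_sec_ceiling(bandwidth_bytes_per_s: float,
                                  bytes_read_per_token: float) -> float:
    """Upper bound on single-stream decode speed if every generated token
    must read all weights from memory once (batch size 1, ignoring
    KV-cache reads and compute time)."""
    if bytes_read_per_token <= 0:
        raise ValueError("bytes_read_per_token must be positive")
    return bandwidth_bytes_per_s / bytes_read_per_token


# Illustrative assumptions only, not vendor specs:
# a hypothetical 70B-parameter model stored at 1 byte per parameter,
# on a GPU assumed to have about 3.35 TB/s of memory bandwidth.
PARAMS = 70e9
BYTES_PER_PARAM = 1.0
BANDWIDTH = 3.35e12  # bytes per second

weights_bytes = PARAMS * BYTES_PER_PARAM
baseline = decode_tokens_per_sec_ceiling(BANDWIDTH, weights_bytes)

# Under this model the ceiling scales linearly with bandwidth, so a 5x
# higher ceiling needs 5x the bandwidth or 1/5 the bytes read per token.
five_x_bandwidth = decode_tokens_per_sec_ceiling(5 * BANDWIDTH, weights_bytes)
fifth_the_bytes = decode_tokens_per_sec_ceiling(BANDWIDTH, weights_bytes / 5)

print(f"baseline ceiling: {baseline:.1f} tokens/s")
print(f"5x bandwidth:     {five_x_bandwidth:.1f} tokens/s")
print(f"1/5 bytes/token:  {fifth_the_bytes:.1f} tokens/s")
```

Under these assumptions, a 5x higher ceiling requires either 5x the bandwidth or 5x fewer bytes read per token. The sketch says nothing about batched throughput, where the same weight reads are shared across many requests.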
