
llm_nerd • yesterday at 9:10 PM • 2 replies

The memory bandwidth limitation is baked into the GB10, and every vendor is going to be very similar there.

I'm really curious to see how things shift when the M5 Ultra, with "tensor" matmul functionality in the GPU cores, rolls out. That should be a several-fold speedup for that platform.

Replies

storus • yesterday at 9:18 PM

My guess is the M5 Ultra will be like the DGX Spark for token prefill and the M3 Ultra for token generation, i.e. the best of both worlds, at FP4. Right now you can combine a Spark with an M3U, the former handling the compute-heavy prefill and lowering TTFT, the latter doing the token generation part; with the M5U that should no longer be necessary. However, given the RAM price situation, I wonder whether the M5U will ever get close to the price/performance of the Spark + M3U combination we have right now.
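For a rough sense of why that split makes sense: token generation has to read roughly all active weights once per token, so memory bandwidth sets a ceiling on decode speed, while prefill is mostly compute-bound. A back-of-envelope sketch in Python, assuming the commonly quoted figures of about 273 GB/s for the GB10 and about 819 GB/s for the M3 Ultra, plus a hypothetical 40 GB of weights (the function name, the bandwidth constants, and the model size are illustrative assumptions, not figures from this thread):

```python
def decode_tokens_per_second_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if each token streams all weights once.

    Ignores KV-cache reads, compute time, and any overlap, so real
    throughput will be lower than this ceiling.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Assumed values for illustration only (not taken from the thread).
GB10_BANDWIDTH_GB_S = 273.0      # commonly quoted DGX Spark figure
M3_ULTRA_BANDWIDTH_GB_S = 819.0  # commonly quoted M3 Ultra figure
HYPOTHETICAL_WEIGHTS_GB = 40.0   # e.g. a large model quantized to about 4 bits

for name, bandwidth in [
    ("GB10", GB10_BANDWIDTH_GB_S),
    ("M3 Ultra", M3_ULTRA_BANDWIDTH_GB_S),
]:
    ceiling = decode_tokens_per_second_ceiling(bandwidth, HYPOTHETICAL_WEIGHTS_GB)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

Under those assumptions the bandwidth gap alone gives the M3 Ultra roughly three times the decode ceiling, regardless of model size, which is why pairing it with a compute-strong prefill box is attractive.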


kristianp • today at 2:12 AM

The M3 Ultra was released about 18 months after the original M3, so you could be waiting a while for the M5 Ultra.
