I assume they didn't fix the memory bandwidth pain point though.

colordrops • yesterday at 9:04 PM

Replies

The memory bandwidth limitation is baked into the GB10, and every vendor is going to be very similar there.

I'm really curious to see how things shift when the M5 Ultra, with "tensor" matmul functionality in the GPU cores, rolls out. That should be a multiple-fold speedup for that platform.


cat_plus_plus • yesterday at 11:32 PM

At least for transformers, it can be mostly worked around with MoE + NVFP4: the working set per token stays small even though the resident model size is large.
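A rough back-of-envelope sketch of why this helps, assuming decode is purely memory-bandwidth-bound (each generated token streams every active weight from memory once; KV cache, activations and caching are ignored). The 273 GB/s bandwidth and the hypothetical 120B-total / 5B-active model are illustrative assumptions, and NVFP4 is approximated as 4-bit values plus one FP8 scale per 16-element block, i.e. 4.5 bits per parameter.

```python
# Back-of-envelope decode throughput bound for a memory-bandwidth-limited device.
# Assumption: each token reads all *active* weights once; KV cache,
# activations, and on-chip caching are ignored.

def bits_per_param_nvfp4(block_size: int = 16, scale_bits: int = 8) -> float:
    """4-bit values plus one shared FP8 scale per block (per-tensor scale ignored)."""
    return 4 + scale_bits / block_size

def max_tokens_per_s(active_params: float, bits_per_param: float,
                     bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s = memory bandwidth / bytes read per token."""
    bytes_per_token = active_params * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

BANDWIDTH_GB_S = 273.0  # assumed device bandwidth, illustrative only

# Hypothetical model: 120B total parameters, 5B active per token (MoE).
dense_bf16 = max_tokens_per_s(120e9, 16, BANDWIDTH_GB_S)           # all weights, BF16
moe_nvfp4 = max_tokens_per_s(5e9, bits_per_param_nvfp4(), BANDWIDTH_GB_S)
resident_gb = 120e9 * bits_per_param_nvfp4() / 8 / 1e9           # total weights kept in memory

print(f"dense BF16 bound: {dense_bf16:.1f} tok/s")
print(f"MoE + NVFP4 bound: {moe_nvfp4:.1f} tok/s")
print(f"MoE + NVFP4 resident weights: {resident_gb:.1f} GB")
```

Under these assumptions the per-token memory traffic drops by roughly 85x versus a dense BF16 model of the same total size, while the full NVFP4 weights (about 67.5 GB) would still fit in an assumed 128 GB memory pool. Real throughput will be lower than these bounds.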
