Hacker News

colordrops, yesterday at 9:04 PM

I assume they didn't fix the memory bandwidth pain point though.


Replies

llm_nerd, yesterday at 9:10 PM

The memory bandwidth limitation is baked into the GB10, and every vendor is going to be very similar there.

I'm really curious to see how things shift when the M5 Ultra with "tensor" matmul functionality in the GPU cores rolls out. That should speed up that platform by a multiple.

cat_plus_plus, yesterday at 11:32 PM

At least for transformers, it can be partly worked around with MoE + NVFP4: the working set read per token stays small even though the resident model is large.
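
A back-of-envelope sketch of the MoE + 4-bit argument above, assuming decode is purely memory-bandwidth-bound so that every active weight is read once per generated token. The function names and all the numbers (250 GB/s, 100B dense, 10B active) are hypothetical round figures picked for illustration, not specs for GB10 or any real model, and the 4-bit case ignores scale-factor overhead.

```python
def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Weight bytes that must be streamed from memory to produce one token."""
    return active_params * bits_per_weight / 8.0


def max_tokens_per_sec(bandwidth_bytes_per_sec: float,
                       active_params: float,
                       bits_per_weight: float) -> float:
    """Upper bound on decode speed if memory bandwidth is the only limit."""
    return bandwidth_bytes_per_sec / bytes_per_token(active_params, bits_per_weight)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
BANDWIDTH = 250e9        # bytes/s, assumed
DENSE_PARAMS = 100e9     # dense model: every weight is touched per token
MOE_ACTIVE_PARAMS = 10e9 # MoE: only the routed experts are touched per token

dense_16bit = max_tokens_per_sec(BANDWIDTH, DENSE_PARAMS, 16)
moe_4bit = max_tokens_per_sec(BANDWIDTH, MOE_ACTIVE_PARAMS, 4)

print(f"dense, 16-bit weights: {dense_16bit:.2f} tok/s upper bound")
print(f"MoE, 4-bit weights:    {moe_4bit:.2f} tok/s upper bound")
print(f"ratio:                 {moe_4bit / dense_16bit:.0f}x")
```

The bandwidth is the same in both cases; only the bytes read per token change. That's the "small working set despite large resident size" point: the whole MoE still has to fit in memory, but each token only pays for the active experts at 4 bits each. Real throughput will be lower once KV cache reads, scale factors, and compute are counted.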