
touisteur | today at 7:03 AM

I don't have first-hand knowledge of HBM GPUs, but on the RTX Blackwell 6000 Pro Server, the performance difference between letting it run free up to 600W and capping the same GPU at 300W is less than 10% on any workload I could throw at it, including Tensor-Core-heavy ones.

That's a very expensive 300W. I wonder what tradeoff made them go for this, and whether capping here is a way to increase reliability. ...

I wonder whether there's any write-up on those additional 300 watts...
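
For the curious, a rough sketch of the kind of A/B timing I mean (not my exact benchmark: the power cap is set out-of-band with nvidia-smi -pl, and the matmul shape/dtype are just an arbitrary Tensor-Core-heavy stand-in):

    # Rough sketch only. Set the cap out-of-band first, e.g. as root:
    #   nvidia-smi -pm 1 && nvidia-smi -pl 300
    import torch

    def bench_matmul(n=8192, iters=50, dtype=torch.bfloat16):
        a = torch.randn(n, n, device="cuda", dtype=dtype)
        b = torch.randn(n, n, device="cuda", dtype=dtype)
        for _ in range(5):          # warm-up so clocks settle under the cap
            a @ b
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            a @ b
        end.record()
        torch.cuda.synchronize()
        ms = start.elapsed_time(end) / iters
        tflops = 2 * n**3 / (ms * 1e-3) / 1e12
        print(f"{n}x{n} {dtype}: {ms:.2f} ms/iter, ~{tflops:.0f} TFLOP/s")

    bench_matmul()

Run it once at each cap and compare the numbers.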


Replies

zozbot234 | today at 8:15 AM

> whether capping is here a way to increase reliability

Almost certainly so, and you wouldn't even need to halve the wattage; even a smaller drop ought to bring a very clear improvement. The performance profile you mention is something you see all the time on CPUs when pushed to their extremes; it's crazy to see that pro-level GPUs are seemingly being tuned the same way out of the box.
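
Back-of-envelope, using the usual dynamic-power approximation (P ~ C·f·V², with V scaling roughly with f near the top of the curve, so P ~ f^3; a crude model, not a measurement), halving the power budget only costs around a fifth of the clock:

    # Toy illustration, assuming P scales roughly with f^3 near the top
    # of the V/f curve (dynamic power ~ C * f * V^2, with V ~ f).
    # Real silicon deviates, but it shows why the last watts buy so little.
    for power_fraction in (1.0, 0.75, 0.5):
        freq_fraction = power_fraction ** (1 / 3)
        print(f"{power_fraction:.0%} power -> ~{freq_fraction:.0%} clock")
    # prints roughly: 100% -> 100%, 75% -> 91%, 50% -> 79%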

storystarling | today at 7:52 AM

It sounds like those workloads are memory bandwidth bound. In my experience with generative models, the compute units end up waiting on VRAM throughput, so throwing more wattage at the cores hits diminishing returns very quickly.
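
A quick roofline-style sanity check of that intuition; the peak figures below are rough placeholder assumptions, not spec-sheet values for this card:

    # Roofline-style sanity check. Peak numbers are placeholder guesses;
    # plug in the real ones for your GPU.
    PEAK_TFLOPS = 500.0   # assumed dense BF16 tensor-core throughput, TFLOP/s
    PEAK_BW_TBS = 1.8     # assumed memory bandwidth, TB/s

    # FLOPs you must do per byte moved to stay compute-bound
    machine_balance = PEAK_TFLOPS / PEAK_BW_TBS
    print(f"machine balance: ~{machine_balance:.0f} FLOP/byte")

    # Batch-1 decode: each matvec reads every weight once (2 bytes in bf16)
    # and does 2 FLOPs per weight -> arithmetic intensity ~1 FLOP/byte.
    decode_intensity = 2 / 2
    print(f"decode intensity: ~{decode_intensity:.0f} FLOP/byte -> memory-bound; "
          "extra core watts mostly buy idle waiting")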
