bigyabai • yesterday at 9:28 PM • 2 replies

A lot of the TDP is reserved for running the shader units at full power. My RTX 3070 Ti only pulls ~110 W of its 320 W when running CUDA inference on Gemma 26B and E4B.

Replies

Scaevolus • yesterday at 9:35 PM

It's not that it's reserving power, but rather that you hit some bottleneck on a 3070 Ti before running into thermal limits. It's likely limited by either tensor core saturation or RAM throughput. Running the workload with Nvidia's profiling tools should make the bottleneck obvious.
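
As a coarse first check before reaching for a full profiler, something like the sketch below can sample power draw alongside utilization while inference is running. It assumes `nvidia-smi` is on the PATH; the helper names `parse_sample` and `read_gpu` are made up for illustration, and this is polling, not a substitute for Nsight-style profiling.

```python
import subprocess

# Fields understood by `nvidia-smi --query-gpu=...`.
QUERY = "power.draw,power.limit,utilization.gpu,utilization.memory"


def parse_sample(line):
    """Parse one line of `--format=csv,noheader,nounits` output.

    Hypothetical helper: assumes all four fields are numeric
    (some GPUs report [N/A] for power, which would raise ValueError).
    """
    draw, limit, gpu_util, mem_util = (float(x) for x in line.split(","))
    return {
        "power_draw_w": draw,
        "power_limit_w": limit,
        # Percent of the sample period in which a kernel was executing.
        "gpu_util_pct": gpu_util,
        # Percent of the sample period in which device memory was read or written.
        "mem_util_pct": mem_util,
        "power_fraction": draw / limit,
    }


def read_gpu(index=0):
    """Query one GPU via nvidia-smi and return the parsed sample."""
    out = subprocess.run(
        [
            "nvidia-smi",
            f"--query-gpu={QUERY}",
            "--format=csv,noheader,nounits",
            "-i",
            str(index),
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])
```

Calling `read_gpu()` in a loop during generation gives a rough picture: a low `power_fraction` together with a high `mem_util_pct` would be consistent with a memory-bound workload, though only a real profiler can confirm where the time goes.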

gambiting • yesterday at 11:03 PM

My 5090 runs at full TDP (pretty much exactly 575 W) when running inference through LM Studio.
