If they were memory bandwidth bound wouldn't that in itself push the wattage and thermals down comparatively, even on a "pegged to 100%" workload? That's the very clear pattern on CPU at least.
That's my experience as well, after monitoring frequency and temperature on lots of kernels across the whole spectrum, from memory-bound to L2-bound to compute-bound. It's hard to reach 600 W with a memory-bound kernel. TensorRT manages it somehow with some small to mid-sized networks, but the perf increase seems capped at around 10% too, even with all the magic inside.
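For anyone wanting to place a kernel on that memory-bound to compute-bound spectrum, a rough roofline-style estimate is one way to do it. This is only a sketch: the function names are made up for illustration, and the peak throughput and bandwidth figures are placeholders, not the specs of any real GPU. It ignores L2 and other cache levels entirely.

```python
def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity (FLOP/byte) where the two roofline limits meet."""
    return peak_flops / peak_bandwidth


def attainable_flops(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Upper bound on throughput under the simple two-limit roofline model."""
    intensity = flops / bytes_moved
    return min(peak_flops, intensity * peak_bandwidth)


def classify(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Label a kernel by which roofline limit it hits first."""
    intensity = flops / bytes_moved
    if intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory-bound"
    return "compute-bound"


# Placeholder hardware numbers (assumptions, not a real device).
PEAK_FLOPS = 100e12      # FLOP/s
PEAK_BANDWIDTH = 1e12    # bytes/s

# float32 SAXPY (y = a*x + y) over n elements:
# 2 FLOPs per element; 12 bytes per element (read x, read y, write y).
n = 1_000_000
saxpy_kind = classify(2 * n, 12 * n, PEAK_FLOPS, PEAK_BANDWIDTH)
saxpy_bound = attainable_flops(2 * n, 12 * n, PEAK_FLOPS, PEAK_BANDWIDTH)
print(saxpy_kind, saxpy_bound)
```

With these placeholder numbers the SAXPY kernel sits far below the ridge point, so the compute units would mostly be waiting on memory, which fits the lower power draw described above.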