Hacker News

CuriouslyC, last Tuesday at 5:10 AM

This doesn't match my experience. In academia I saw NVIDIA GPU clusters running at ~40-45% utilization go 6 years with a <20% failure rate. Might be a TPU thing?


Replies

CraigRood, last Tuesday at 2:28 PM

I'm FAR from an expert on this, but I believe that operating costs such as power + cooling form a big part of the lifecycle cost. I have little doubt that at some point within the 6 years being booked, replacing entire working racks will become more cost efficient than keeping them running.
