This doesn't match my experience, in academia I saw ~40-45% utilization NVIDIA GPU clusters that went 6 years with <20% failure rate. Might be a TPU thing?
I'm FAR form an expert on this, but I believe that the operating costs such as power + cooling form a big part of the lifecycle. I have no doubt that at some point within the 6 years that are being booked, that replacing entire working racks won't be more cost efficient.
I'm FAR form an expert on this, but I believe that the operating costs such as power + cooling form a big part of the lifecycle. I have no doubt that at some point within the 6 years that are being booked, that replacing entire working racks won't be more cost efficient.