> Nobody who is doing this is willing to come clean with hard numbers but there are data points, for example from Meta and (very unofficially) Google.
The Meta link does not support the point. It actually implies an MTBF of over 5 years at 90% utilization, even if you assume there's no bathtub curve. Pretty sure that lines up with the depreciation period.
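For what it's worth, here's the back-of-envelope math behind that, as a quick Python sketch. The fleet size, window, and failure count are assumptions roughly on the scale of what Meta reported for the Llama 3 training run (from memory), not exact figures, so treat the output as an order-of-magnitude estimate:

    # Back-of-envelope MTBF from a fleet-wide failure count. The inputs are
    # assumptions roughly on the scale of Meta's published Llama 3 training
    # stats, not exact figures from their report.
    gpu_count = 16_384      # assumed fleet size (H100s)
    days_observed = 54      # assumed observation window, GPUs ~90% utilized
    gpu_failures = 419      # assumed unexpected interruptions in that window

    gpu_hours = gpu_count * days_observed * 24   # total GPU-hours in the window
    mtbf_hours = gpu_hours / gpu_failures        # mean GPU-hours per failure
    mtbf_years = mtbf_hours / (24 * 365)

    print(f"MTBF ~ {mtbf_hours:,.0f} GPU-hours ~ {mtbf_years:.1f} years per GPU")
    # -> MTBF ~ 50,677 GPU-hours ~ 5.8 years per GPU

Even if you count every interruption against the GPUs, you land in the 5-6 year range.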
The Google link is even worse. It links to https://www.tomshardware.com/pc-components/gpus/datacenter-g...
That article makes a big claim but does not link to any source. It describes the source only vaguely, and nobody who was actually in that role would describe themselves as the "GenAI principal architect at Alphabet". Those are simply not the words they would use. It would also be pointless to try to stay anonymous if that really were your title.
It looks like the ultimate source of the quote is this Twitter screenshot of an unnamed article (whose text can't be found with search engines): https://x.com/techfund1/status/1849031571421983140
That is not merely an unofficial source. It's made-up trash that the blog author lapped up, despite its obviously unreliable nature, because it confirmed his beliefs.
> It actually implies an MTBF of over 5 years [...] Pretty sure that lines up with the depreciation period.
You're assuming it's normal for the MTBF to line up with the depreciation schedule. But the MTBF of data center hardware is usually quite a bit longer than the depreciation schedule, right? If I recall correctly, it's typically double or triple for servers. Maybe less for GPUs, which I'm not directly familiar with, but a quick web search suggests the two periods shouldn't line up for GPUs either.
On top of that, Google isn't using NVIDIA GPUs; they have their own TPUs.
Besides, if the claim about GPU wear and tear were true, it would show up consistently in GPUs sourced from cryptomining (which was generally done in makeshift compute centers with terrible cooling and other environmental problems), and it just doesn't.