
jeffbee, yesterday at 5:06 PM

I am not taking Google's result at face value, but the article shouldn't make assumptions without supporting evidence, either. ASHRAE used to say your datacenter should be 20-25 °C, which, you know, makes a certain amount of sense when it comes from an organization earning its money from installing and repairing CRACs. Now they admit that 18-27 °C is common, and they allow for designs with up to 45 °C ambient. They are following the industry up.


Replies

adrian_b, today at 5:38 AM

"Higher temperatures" must be qualified in any statement like this.

There is absolutely no doubt that with increasing temperature the rate of failures for any semiconductor device increases very quickly. This is routinely tested by any manufacturer.

What happens is that at low enough temperatures the rate of failures caused by temperature may be small in comparison with the rate of failures from other causes, so you will see no temperature effect. However, once you raise the temperature enough, you will see an obvious dependence of the failure rate on temperature.
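A toy model can make the argument above concrete. This is only a sketch, not something taken from any datasheet: it assumes the temperature-driven failures follow an Arrhenius-type law and adds a temperature-independent baseline for everything else. The activation energy (0.7 eV), the 100 °C reference point, and the baseline value are illustrative assumptions, and the rates are in arbitrary units.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_rate(temp_c, ref_rate=1.0, ref_temp_c=100.0, activation_ev=0.7):
    """Arrhenius-type failure rate, scaled to equal ref_rate at ref_temp_c.

    All default parameter values are assumptions chosen for illustration.
    """
    t = temp_c + 273.15          # convert to kelvin
    t_ref = ref_temp_c + 273.15
    exponent = activation_ev / BOLTZMANN_EV * (1.0 / t_ref - 1.0 / t)
    return ref_rate * math.exp(exponent)


def total_rate(temp_c, baseline=1.0):
    """Temperature-independent baseline plus the temperature-driven term."""
    return baseline + thermal_rate(temp_c)


if __name__ == "__main__":
    for temp_c in (40, 60, 80, 100, 120):
        print(f"{temp_c:>4} °C  thermal={thermal_rate(temp_c):.3f}  "
              f"total={total_rate(temp_c):.3f}")
```

With these assumed numbers the total barely moves between 40 and 60 °C, because the baseline dominates, and climbs steeply above 100 °C. A larger baseline (more failures from other causes) pushes the point where temperature becomes visible even higher.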

Semiconductor devices are designed so that, at the crystal temperature specified in their datasheet, usually in the range of 90 to 110 degrees Celsius, the failure rate is low enough that most devices will have a life of at least 10 years or some similar value.

The ambient temperature at which the nominal maximum temperature is reached depends on the cooling and on the power consumption.
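As a rough illustration of that dependence, the usual steady-state approximation is: crystal temperature = ambient temperature + power × thermal resistance from crystal to ambient. The sketch below assumes that simple model. The function names and all the numbers (100 °C limit, the thermal resistances, the power figures) are hypothetical, picked only to show the trend.

```python
def crystal_temp_c(ambient_c, power_w, theta_c_per_w):
    """Steady-state crystal temperature under a single thermal-resistance model."""
    return ambient_c + power_w * theta_c_per_w


def max_ambient_c(crystal_max_c, power_w, theta_c_per_w):
    """Highest ambient temperature that keeps the crystal at or below its limit."""
    return crystal_max_c - power_w * theta_c_per_w


if __name__ == "__main__":
    crystal_max_c = 100.0  # assumed datasheet limit
    # (power in W, thermal resistance in °C/W): all values are made up
    cases = [(100.0, 0.3), (250.0, 0.3), (250.0, 0.15)]
    for power_w, theta in cases:
        limit = max_ambient_c(crystal_max_c, power_w, theta)
        print(f"P={power_w:.0f} W, theta={theta} °C/W -> max ambient {limit:.1f} °C")
```

The same chip limit can therefore correspond to very different allowable room temperatures: more power lowers the ambient limit, better cooling (a smaller thermal resistance) raises it.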

If the device has a temperature that exceeds the nominal maximum temperature, it is pretty certain that you will see a strong dependence of the failure rate on temperature.

Whether you also see temperature effects at lower crystal temperatures, e.g. around 60 degrees Celsius, depends on the device and is unpredictable unless you do a costly experiment yourself.

In general, it is expected that for low-quality devices you will not see temperature effects, because those will fail for other reasons, while for high-quality devices, which lack manufacturing defects, you will see a temperature dependence of the failure rate even at lower temperatures.

So Google might not have seen temperature effects because they were using the cheapest junk anyway.