Hacker News

mrguyorama · yesterday at 10:28 PM · 1 reply

I think people are overindexing on this being about "bad hardware".

We have long known that single-bit errors in RAM are basically "normal" on modern computers. Google published research in 2009 quantifying the number of error events in commodity DRAM: https://static.googleusercontent.com/media/research.google.c...

They found 25,000 to 70,000 errors per billion device hours per Mbit and more than 8% of DIMMs affected by errors per year.
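To put those rates in per-machine terms, here is a back-of-the-envelope conversion in Python. It is a sketch under stated assumptions, not a figure from the paper: the 16 GiB machine size is hypothetical, "Mbit" is taken as 2^20 bits, and errors are assumed to be spread evenly across all memory. That last assumption is clearly wrong, because only a minority of DIMMs (the "more than 8%" above) saw any errors in a year. The average is driven by those affected modules, and most machines see far fewer errors than this.

```python
# Naive conversion of a per-Mbit DRAM error rate into errors per machine.
# Assumptions (hypothetical, for illustration only):
#  - a 16 GiB machine
#  - "Mbit" means 2**20 bits
#  - errors are spread uniformly over all memory (the paper's data says otherwise)

HOURS_PER_BILLION = 1e9


def gib_to_mbit(gib: float) -> float:
    """Convert GiB to Mbit, treating 1 GiB as 8 * 1024 Mbit."""
    return gib * 8 * 1024


def errors_per_hour(errors_per_billion_hours_per_mbit: float, mbit: float) -> float:
    """Expected errors per hour for `mbit` megabits at the given rate."""
    return errors_per_billion_hours_per_mbit * mbit / HOURS_PER_BILLION


if __name__ == "__main__":
    mbit = gib_to_mbit(16)  # 131072 Mbit
    for rate in (25_000, 70_000):  # low and high ends of the reported range
        per_hour = errors_per_hour(rate, mbit)
        print(f"{rate:>6} per 1e9 h per Mbit -> {per_hour:.2f}/hour, {per_hour * 24:.0f}/day")
```

With these inputs the script prints about 3.28 errors per hour (79 per day) at the low end and about 9.18 per hour (220 per day) at the high end, which only makes sense as a fleet-wide average, not as what a typical healthy DIMM experiences.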

At the time, they saw no increase in this rate with "newer" RAM technologies, which I think meant DDR3 back then. I wonder whether anything has changed since.

A few years ago, I switched from putting my computer to sleep every night to shutting it down every night. I boot it fresh every day, and the difference has been dramatic. Sleep keeps RAM powered with its contents intact, so bit errors in long-lived data will accumulate if you only ever put your computer to sleep.
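One way to read the "errors accumulate" claim is as a simple Poisson model. The Python sketch below is a toy, not a measurement: it assumes a constant per-machine error rate (the 0.001 errors per hour value is hypothetical, meant to stand for a mostly healthy machine) and that every flip lands in data that stays resident until the next real boot. Under those assumptions, the chance of carrying at least one corrupted bit grows with time since the last cold boot.

```python
import math

# Toy model, not a measurement. Assumptions (hypothetical):
#  - bit flips arrive as a Poisson process with a constant rate per machine
#  - every flip lands in data that stays resident until the next cold boot
#    (sleep keeps DRAM powered, so suspending resets nothing)


def expected_resident_flips(rate_per_hour: float, hours_since_boot: float) -> float:
    """Mean number of flipped bits still sitting in RAM."""
    return rate_per_hour * hours_since_boot


def p_at_least_one_flip(rate_per_hour: float, hours_since_boot: float) -> float:
    """Poisson probability that at least one flip has accumulated since boot."""
    return 1.0 - math.exp(-expected_resident_flips(rate_per_hour, hours_since_boot))


if __name__ == "__main__":
    rate = 0.001  # hypothetical errors/hour for a mostly healthy machine
    for label, hours in (("reboot daily", 24), ("sleep for 30 days", 24 * 30)):
        print(f"{label:>18}: P(>=1 flip) = {p_at_least_one_flip(rate, hours):.3f}")
```

With the hypothetical rate, that works out to about 0.024 after one day versus about 0.513 after 30 days without a real boot. The real picture depends heavily on how much of RAM is long-lived data and on whether the machine has one of the minority of DIMMs that produce most errors.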


Replies

jmalicki · today at 12:46 AM

There is DRAM which is mildly defective but got past QC.

There are power supplies that are mildly defective but got past QC.

There are server designs where the memory is exposed to EMI and voltage fluctuations that push it ever so slightly past the margins it passed QC at.

Hardware isn't simply "good" or "bad"; almost all chips produced probably have undetected mild defects.

There are a ton of causes for bitflips other than cosmic rays.

For instance, the very Google paper you cited found a 3x increase in bitflips as datacenter temperature rose! How confident are you that the average Firefox user's computer is as temperature-controlled as a Google DC?

It also found significantly higher error rates as RAM ages! There are a ton of physical mechanisms that can cause this, especially in hardware running 24/7 at high temperatures.