Surprise, hard drives don’t fail because of heat or load (says Google).
Jed | September 9, 2008
Google probably has more hard drives running at any one time than anyone else out there. Well, they’re at least in the top 10. So it makes sense that they have a ton of experience with hard drives failing (aka dying). In typical Google fashion, they decided to put that experience to good use.
Google has released a research paper entitled “Failure Trends in a Large Disk Drive Population” (warning: pdf). They analyzed failure rates across their drives, which they describe as “high-volume, consumer-grade disk drives” (i.e., the same kind you or I would likely buy when we need a new drive). What they found goes against the common wisdom in the industry, namely that hard drives fail more often when subjected to high temperatures or long periods of heavy activity.
One of our key findings has been the lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels. Such correlations have been repeatedly highlighted by previous studies, but we are unable to confirm them by observing our population. Although our data do not allow us to conclude that there is no such correlation, it provides strong evidence to suggest that other effects may be more prominent in affecting disk drive reliability in the context of a professionally managed data center deployment.
Google was careful not to single out any one drive manufacturer as having higher failure rates than the others. They do admit, though, that their results confirm that some models, manufacturers and “vintages” turn out to be lemons with high failure rates. Still, just because one drive model from one manufacturer is crappy doesn’t mean a different drive from the same manufacturer is bad as well.
They recommend paying attention to the SMART (Self-Monitoring, Analysis and Reporting Technology) monitoring built into most hard drives and suggest that users run periodic scans of their drives. After all, they found that once a diagnostic scan turns up an error, that drive is 39 times more likely to fail within 2 months than a drive with no errors. So run those tests, and if you find an error, you’ll probably want to make sure your stuff is backed up. You ARE backing up all your data, right?
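If you want to automate the "watch SMART" part of that advice, here is a minimal sketch in Python. It assumes a machine with smartmontools installed (which provides the `smartctl` command; reading SMART data usually requires root) and an ATA drive that prints the usual attribute table for `smartctl -A`. The three attribute IDs it watches (reallocated, pending and offline-uncorrectable sectors) are an illustrative choice for this sketch, not an exhaustive or official list, and the helper names are hypothetical. A full surface scan is a separate step; `smartctl -t long /dev/sda` starts one on most drives.

```python
import re
import subprocess

# Illustrative choice of attributes to watch (an assumption, not an official list):
# sectors the drive has remapped, is waiting to remap, or could not read offline.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# A data row of the `smartctl -A` attribute table starts with a numeric ID.
ROW = re.compile(r"^\s*\d+\s+\S+")


def parse_raw_values(smartctl_output: str) -> dict[int, int]:
    """Return {attribute_id: raw_value} for the watched attributes."""
    values = {}
    for line in smartctl_output.splitlines():
        if not ROW.match(line):
            continue  # skip banners, headers and blank lines
        fields = line.split()
        if len(fields) < 10:
            continue  # not a full attribute row
        attr_id = int(fields[0])
        if attr_id not in WATCHED:
            continue
        # RAW_VALUE is the 10th column; keep only its leading integer,
        # since some drives append extra text after the number.
        match = re.match(r"\d+", fields[9])
        if match:
            values[attr_id] = int(match.group())
    return values


def warnings_for(values: dict[int, int]) -> list[str]:
    """One warning per watched attribute with a non-zero raw count, by ID."""
    return [
        f"{WATCHED[attr_id]} = {raw}"
        for attr_id, raw in sorted(values.items())
        if raw > 0
    ]


def check_device(device: str) -> list[str]:
    """Run `smartctl -A` on a device and return any warnings."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    values = parse_raw_values(result.stdout)
    if not values:
        # Missing permissions, a non-ATA device, or SMART disabled.
        raise RuntimeError(f"could not read SMART attributes from {device}")
    return warnings_for(values)
```

Usage, with `/dev/sda` as a placeholder device path: `print(check_device("/dev/sda"))`. An empty list means none of the watched counters are above zero. Any entry is a good cue to check that your backups are current.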
For new purchases, if you’re looking at a bleeding-edge, brand-new drive, you won’t have much historical data to go on and will just have to hope for the best. But if you’re in the market for a mid-level drive, check reviews and user comments to see whether an unusually high number of people complain about drive failures.





