Disk Failure Rates Nowhere Near Manufacturer Claims:


It's a study by some Google employees on the failure rates of disk drives deployed within their infrastructure.
pdf paper (http://labs.google.com/papers/disk_failures.pdf)

The study examines the population of hard drives deployed within Google.

In general, there are around four ways an HDD will fail:
firmware zone corruption, electronic failure, mechanical failure, and logical corruption.

Unfortunately, S.M.A.R.T. only handles a subset of the "mechanical failure" category (mainly media failure and thermal
failure), and does not protect against single-incident catastrophic mechanical errors (head crash, spindle/servo motor
failure, stiction).

There was a lack of correlation between elevated temperature and expected drive reliability.
The graph only went up to 50 C, though, and I would expect a much stronger
correlation once the temperature reached beyond typical max temp specs.


* logical corruption may lead to data loss, but it does not necessarily mean that the HDD has failed.

* Also, after a first scan error, drives are 39x more likely to fail within 60 days. First errors in reallocations, offline reallocations, and probational counts are also strongly correlated to higher failure probabilities.

* A large fraction of our failed drives have shown NO SMART error signals whatsoever.

* There was a lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels.

-> Some of the charts point to drives failing early on, followed by "survival of the fittest": beyond the first 1-6 months, early failure is no longer a factor.
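
To make the "39x" figure above a bit more concrete, here is a minimal sketch of how a relative risk like that is computed. The counts are made up purely for illustration (they are not from the paper), and the function name relative_risk is my own.

```python
def relative_risk(failed_with, total_with, failed_without, total_without):
    """Ratio of failure rates: drives with a first scan error vs. drives without.

    All four counts are hypothetical inputs, not figures from the paper.
    """
    if total_with <= 0 or total_without <= 0:
        raise ValueError("group sizes must be positive")
    if failed_without <= 0:
        raise ValueError("need at least one failure in the comparison group")
    # (failed_with / total_with) / (failed_without / total_without),
    # rearranged so the integer products are formed before the single division.
    return (failed_with * total_without) / (total_with * failed_without)


# Made-up example: 39 of 100 drives with a scan error fail within 60 days,
# versus 10 of 1000 drives without one.
print(relative_risk(39, 100, 10, 1000))  # 39.0
```

Note that a high relative risk says nothing about how many failures come with no warning at all, which is why the "no SMART error signals" point still matters.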