Production Expert

Backblaze Publish Their Hard Drive Reliability Figures

We often get asked which hard drives are the most reliable, but it is hard to give a valid answer because most of us only have a handful of drives, so our experiences, although important to us, are not statistically sound. Backblaze, on the other hand, are in the personal and business backup business, so they have data centres full of hard drives. They keep statistics on when each drive fails, and they have chosen to publish this data.

As of the end of Q3 2015, there were 49,056 hard drives spread across 26 different models, varying from 1.0TB to 8.0TB in size. That count excludes boot drives, drive models with fewer than 45 drives and drives in testing systems. They are publishing data on their 1TB drives for the first time. They have also included “Average Drive Age” for each model and summarised the data by manufacturer and size as well.

At the bottom of this article is the data in a table, which you can study at your leisure. If the table is too much to take in all at once, you can download a ZIP file containing a Microsoft Excel file of the data from the chart. From that spreadsheet you can study the data and extract whatever suits you. But let's look at some observations based on the data...

  • In the chart above are the failure stats for all of the drives in the review, broken down by year. They have calculated that the average failure rate for all periods for all drives is 4.81%.
  • The Western Digital 1TB drives in use are nearly 6 years old on average. There are several drives with nearly 7 years of service. It wasn’t until 2015 that the failure rate rose above the annual average for all drives. This makes sense given the “bathtub” curve of drive failure, where drives over 4 years old start to fail at a higher rate. Still, the WD 1TB drives have performed well for a long time.
  • Nearly all of the 1TB and 1.5TB drives were installed in Storage Pod 1.0 chassis. Yet, these two sizes have very different failure rates.
  • Nearly all of the 2TB and 3TB drives were installed in 2.0 chassis. Yet, these two drive sizes have very different failure rates.
  • Always consider the number of drives (Max # in Service) when looking at the failure rate. For example, the 1.5TB Seagate Barracuda Green drive has a failure rate of 130.9%, but that is based on only 51 drives. They tested these Seagate drives in one Storage Pod in their environment and they were not a good fit. In general, they found it takes at least 6 Storage Pods' worth of drives (270 drives) to get a good sense of how a given drive will perform in their environment.
  • 4TB drives, regardless of their manufacturer, are performing well. The 2.10% overall failure rate means that over the course of a year, they had to replace only one drive in a Storage Pod filled with these drives. In other words, on average, a pod comes down for maintenance once a year due to drive failure. The math: 2% is 1 out of 50. There are 45 drives in a pod, so about once a year, one of those 45 drives, on average, will fail. Yes, the math is approximate, but you get the idea.
  • 6TB drives, especially the Seagate drives, are also performing well, on par with the 4TB drives so far. The 6TB drives give them 270TB Storage Pods, which means 50% more storage at the same overall cost per GB.
  • The 5TB and 8TB drives are performing well, but they only have 45 of each in testing. That is not enough to feel confident in the numbers yet, as can be seen in the confidence interval (low rate/high rate) of these drives.
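
To make the arithmetic in the observations above concrete, here is a rough Python sketch. The 45 drives per pod, the 2.10% rate and the 4TB and 6TB sizes come from the figures quoted above. The function names are our own, and the annualised-rate formula (failures divided by drive-years) is only an assumption about how a rate can exceed 100%; it is not something stated in the data here. The 10 failures and 2,000 drive-days in the last example are made-up numbers for illustration.

```python
DRIVES_PER_POD = 45  # drives in one Storage Pod, as quoted in the article


def expected_failures_per_pod(annual_failure_rate_pct, drives=DRIVES_PER_POD):
    """Expected number of drive failures in one pod over a year."""
    return drives * annual_failure_rate_pct / 100.0


def pod_capacity_tb(drive_size_tb, drives=DRIVES_PER_POD):
    """Raw capacity of a pod filled with drives of a single size."""
    return drive_size_tb * drives


def annualised_failure_rate_pct(failures, drive_days):
    """ASSUMPTION: rate = failures per drive-year, expressed as a percentage.

    With this definition a small group of drives that fails quickly
    can score more than 100%.
    """
    drive_years = drive_days / 365.0
    return failures / drive_years * 100.0


# 4TB drives at 2.10%: a little under one failure per pod per year.
print(round(expected_failures_per_pod(2.10), 3))

# 6TB versus 4TB pods: 270TB against 180TB, i.e. 50% more storage.
print(pod_capacity_tb(6), pod_capacity_tb(4))

# Hypothetical numbers only: 10 failures in 2,000 drive-days.
print(round(annualised_failure_rate_pct(10, 2000), 1))
```

Run as is, it suggests a little under one failure per 4TB pod per year, and it shows how a handful of drives that fail quickly could produce an annualised figure above 100%.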

Thank You Backblaze

This data is really helpful to those of us with just a few drives, as we have no way of telling which drive failures were just bad luck and which point to a less reliable drive. The size of this data pool and the variety of drives, even down to Backblaze saying that 51 drives of one make and model may not give a reliable result, help us to see that we can look at this data with confidence. So if you are looking at a new drive, or you are concerned about whether the drives you have are going to let you down, you might find this data useful.

The Data In More Detail

There’s a lot going on in the chart above, so here are a few things to help explain it a little more...

  • The 2013, 2014, and 2015 failure rates are cumulative for the given year. In the case of 2015 that is through Q3 (September).
  • If the failure rate is listed as 0.00% there were drives in use, but none of the drives failed during that period.
  • If the failure rate is blank, there were no drives in use during that period.
  • The “All Periods” failure rates are cumulative for all data (2013-Q3 2015).
  • The “Max # in Service” column is the maximum number of drives ever in service for the given hard drive model.
  • The “Avg Age (Months)” column is the average age of all the hard drives of the given hard drive model. This is based on SMART 9 data.
  • If the “Avg Age (Months)” data is 0.0, the given drive model was not in service during 2015, making the value difficult to compute. (Backblaze say they will try to figure out a better way to compute this value by the next report.)
  • The HGST (*) model name – Backblaze have been asked to use HGST in place of Hitachi and are honoring that request, but these drives report their model as Hitachi and are listed as such in the data files.
  • The Low Rate and High Rate are the boundaries for the confidence interval for the failure rate listed.
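
The effect of sample size on the Low Rate and High Rate columns can be illustrated with a small sketch. The data does not say how Backblaze compute their interval, so the Wilson score interval used below is purely our assumption, as are the function name and the example counts (1 failure in 45 drives versus 100 in 4,500). It treats each drive as a simple pass/fail trial and ignores how long each drive was in service.

```python
import math


def wilson_interval(failures, drives, z=1.96):
    """Approximate 95% Wilson score interval for a failure proportion.

    ASSUMPTION: this is an illustrative method only, not necessarily
    the one behind the Low Rate / High Rate columns.
    """
    if drives <= 0:
        raise ValueError("drives must be positive")
    p = failures / drives
    denom = 1 + z * z / drives
    centre = (p + z * z / (2 * drives)) / denom
    half = z * math.sqrt(p * (1 - p) / drives + z * z / (4 * drives * drives)) / denom
    # Clamp to the valid range for a proportion.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts with the same underlying proportion (about 2.2%).
small_low, small_high = wilson_interval(1, 45)
large_low, large_high = wilson_interval(100, 4500)

print(f"45 drives:   {small_low:.1%} to {small_high:.1%}")
print(f"4500 drives: {large_low:.1%} to {large_high:.1%}")
```

With only 45 drives the interval comes out far wider than with 4,500, which is the same reason the 5TB and 8TB figures should be treated with caution.
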
See this gallery in the original post