It is not often that I will admit NetApp is right in this blog.
The explanation for this amounts to one of, or a combination of, two factors: 1) NetApp doesn't sell a lot of virtual tape, and they don't offer deduplication with their virtual tape, so our paths rarely intersect; 2) when they do intersect, I find their marketing claims tend to have more style than substance. Try dissecting "tape smart sizing" for example. David Blaine would be proud.
However, when somebody is right, they are right. And in this case, Alex McDonald at NetApp is right!
So don't take this as a sign of the apocalypse!
What does Alex have to say at "The Missing Shade of Blue" in his post "Ray Day Pay"?
Well, he is talking about the advantages of RAID-6 over RAID-5. And he correctly points out that RAID-6 offers greater protection from disk failure than RAID-5, because it keeps two parity blocks per stripe rather than one. In fact, he notes that:
"The advantages of dual parity are in robust data protection; specifically, RAID-6 can sustain two simultaneous drive failures in any RAID group without loss of data."
Good stuff. We are all on the same page so far, Mr. McDonald.
Next, he claims that:
"RAID-5 is dangerous; anyone running RAID-5 on large 1TB drives (and they're getting larger) is running a serious and measurable risk... The likelihood of one drive failing for some reason and, say, a single-bit media error on another during RAID reconstruct has increased to levels that make single-parity systems much more likely to suffer catastrophic data loss in everyday operation."
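For a rough sense of the numbers behind that claim, here is a back-of-the-envelope sketch in Python. It assumes a 6+1 RAID-5 group of 1TB drives, independent bit errors, and an unrecoverable read error rate of 1 per 10^14 bits read. That error rate is my assumption (a figure commonly quoted on SATA drive spec sheets), not something taken from Alex's post, and the function name is made up for illustration.

```python
import math

def p_ure_during_rebuild(surviving_drives, drive_bytes, ure_per_bit):
    """Chance of hitting at least one unrecoverable read error while
    reading every surviving drive in full during a single-parity rebuild.
    Assumes bit errors are independent, which is a simplification."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p does not round away
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Assumed inputs: 6 surviving drives of a 6+1 group, 1 TB each,
# and an assumed spec-sheet error rate of 1 in 10^14 bits.
p = p_ure_during_rebuild(surviving_drives=6,
                         drive_bytes=10**12,
                         ure_per_bit=1e-14)
print(f"Estimated chance of a media error during rebuild: {p:.0%}")
```

With those assumed inputs the estimate comes out a little under 40% per rebuild. The figure moves a lot with the assumed error rate, so treat it as an order-of-magnitude illustration rather than a measurement.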
OK, we are still both in agreement. Here is the kicker:
"Especially worrying is that these kind of drives are being used for archive, backup and recovery, and DR purposes."
The emphasis is mine.
I wonder if, after all this, Alex would be surprised to find out that NetApp's VTL uses RAID-5?
No, they don't call it RAID-5. How could they, when they think that RAID-5 is "dangerous"?
They call it VTL-RAID. Which may be RAID-3, or RAID-4, or RAID-5. But everything I have seen indicates that it is RAID-5.
(And for our purposes here, it doesn't much matter whether it is RAID-3, RAID-4, or RAID-5. They are all single-parity implementations, and therefore all carry the same exposure: once one drive in a group has failed, a second failure or an unreadable sector in that group means lost data. The primary differentiation between them is, or used to be, performance under different workload types--big block, small block, sequential vs. random, and so on.)
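If the single-parity point seems abstract, a toy sketch makes it concrete. This is plain XOR parity over three made-up data blocks, not anybody's actual RAID implementation; the block contents and the helper name are invented for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks plus one parity block (single parity).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One block lost: XOR of the survivors and the parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Two blocks lost: the survivors only yield the XOR of the two missing
# blocks, and there is no second equation to separate them.
leftover = xor_blocks([data[2], parity])
print(leftover == xor_blocks([data[0], data[1]]))
```

Both checks print True. The second one is the whole problem: with one parity block there is one equation, so two unknowns cannot be recovered.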
But wait, it gets worse. A lot worse.
NetApp also uses a "self tuning" feature on their VTL. In short, self tuning ensures that data from a given virtual cartridge is likely to end up on a large number of RAID groups. When I write to a disk array and want more performance, I want to engage as many spindles as possible. NetApp's answer is to "spray" the data from a backup stream--the writes to a virtual cartridge--across all available LUNs (RAID groups). This is simply another way of describing striping. It is a really good idea when it comes to performance (but not such a great idea for availability if you use RAID-5). I would also note that the strategy is really only effective at addressing single-stream performance. When we are dealing with multiple streams (virtual tape drives), there are more effective strategies to improve the performance of the appliance.
The problem with this is that if any one of the RAID-5 groups in the VTL fails due to a double disk failure, you are almost guaranteed to lose data on most (all?) of the virtual cartridges in the entire appliance. A double disk failure in a single group can therefore damage every tape, which means it has the potential for total data loss.
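To illustrate why one lost group touches so many tapes, here is a toy model in Python. It assumes cartridge data is cut into chunks and placed round-robin across the groups. I have not seen a description of how the real placement works, so the round-robin rule, the chunk counts, and the cartridge names are all hypothetical.

```python
def cartridges_hit(num_groups, cartridge_chunks, failed_group):
    """Return the cartridges with at least one chunk on failed_group,
    assuming chunks are sprayed round-robin across all groups."""
    hit = []
    next_group = 0  # where the next chunk will land
    for name, chunks in cartridge_chunks.items():
        groups_used = {(next_group + i) % num_groups for i in range(chunks)}
        next_group = (next_group + chunks) % num_groups
        if failed_group in groups_used:
            hit.append(name)
    return hit

# Hypothetical appliance: 48 RAID groups, three virtual cartridges.
cartridges = {"VT0001": 100, "VT0002": 500, "VT0003": 10}
damaged = cartridges_hit(num_groups=48,
                         cartridge_chunks=cartridges,
                         failed_group=7)
print(damaged)
```

In this model, any cartridge with at least as many chunks as there are groups lands on every group, so it is damaged no matter which group fails. Only a cartridge small enough to fit on a handful of groups (VT0003 here) has a chance of escaping.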
It is as if a double disk failure could cause a total array failure.
If RAID-5 is "dangerous" in the context of a single 6+1 RAID group, how dangerous is it when you have data striped across 48 such groups?
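A quick sketch of the arithmetic behind that question. The per-group loss probability below is a placeholder I picked for illustration, not a measured failure rate, and the calculation assumes the groups fail independently.

```python
def p_any_group_lost(p_group, num_groups):
    """Chance that at least one of num_groups independent RAID groups
    is lost, given each is lost with probability p_group."""
    return 1.0 - (1.0 - p_group) ** num_groups

# Hypothetical: each 6+1 group has a 0.5% chance of an unrecoverable
# double failure over some period. This number is an assumption.
P_GROUP = 0.005

one_group = p_any_group_lost(P_GROUP, 1)
all_48 = p_any_group_lost(P_GROUP, 48)
print(f"1 group: {one_group:.1%}   48 groups: {all_48:.1%}")
```

With that placeholder, the chance of losing at least one group climbs from half a percent to a bit over 20%. And because the data is striped, losing any one group is what matters.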
It doesn't matter if the data is "self describing" or not. If the data is lost, you just lost your backup.
Not just one backup. All of them.
For the record: all virtual tape libraries and flexible infrastructure currently shipping from EMC (the EDL and DL3D product lines) employ RAID-6 for all backup data on disk.