
October 29, 2008




Wow! You are VERY precise in your deductions. You just seem to lack some intellect on the facts. Were you denied a job at NetApp, or have you maybe lived in a cyber closet for the last several years? Either way, your rhetoric reminds me of a politician. Obama, is that you??? So for your list of vendors above, how many of them offer RAID-5 as a preferred or standard RAID level with their offerings? How many are using RAID-6 as best practice? (None, due to poor performance, unlike when implemented as NetApp RAID-DP.) (Just a quick roll call here: EMC. Nope. HP. Not even close. IBM. Not even. Sun. Wishes they could. Dell. Not even close. Data Domain. Ha! Sepaton. Nope. Falconstor. Do they even have a disk subsystem???)

Thanks for the useless, misrepresented and unsatisfying facts, though! What a very skewed view you have from your backup mountaintop. Get some clarity and present an unbiased position with a fair representation of all the facts before spewing such worthless nonsense, please!!!

Scott Waterhouse


The facts are I am using NetApp numbers, and probability theory.

So there are really only two options here, without any wiggle room I can see.

Option 1: NetApp is wrong about the probability of failure of RAID 5. If that is the case, perhaps they can stop criticizing everybody else for using RAID 5 and not using RAID 6.

Option 2: If they are not wrong, then show me where I have made a mistake with the probability theory. If the risk of failure really is 6% then the rest just follows logically.

Actually, let me introduce another option: contrary evidence or reasoning that would disprove my approach.

I would be happy to discuss based on any of those three options. Heck, I will even retract if it turns out that I am mistaken. But given that you didn't even attempt to address me on the basis of any of those points, I am going to have to leave things at that...


John "HP. Not even close"

Really, you should check out the VLS6000 and VLS9000 line from HP: http://h18006.www1.hp.com/storage/disk_storage/disk_to_disk/vls/index.html
Both use RAID 6 and dedupe. The only ones not sporting RAID 6 are the lower-end, entry-level boxes, where price/capacity is key.

I haven't even bothered to look at the other vendors you mentioned. Some are gateway products, so they rely on the underlying array's capabilities, and most of those arrays do support RAID 6. But it looks like you need to do some more research before flinging mud indiscriminately.

Scott Waterhouse

Alright, I now have an update on the math. Thanks to an astute reader (Aaron) with a better understanding of probability theory than me, this is what it should be:

Reliability = (1 - probability of failure of an individual RAID group) ^ # of RAID groups. This means that a single-RAID-group system will be 94% reliable over 5 years (only 6% likely to experience catastrophic data loss), and a 2-RAID-group system will be 88.36% reliable. The full list, up to 20 RAID groups, is as follows, where the first number is the number of RAID groups, and the second is the probability, as a percentage, of total failure (total data loss on the NearStore VTL), i.e. 1 - reliability:

1 6.00%
2 11.64%
3 16.94%
4 21.93%
5 26.61%
6 31.01%
7 35.15%
8 39.04%
9 42.70%
10 46.14%
11 49.37%
12 52.41%
13 55.26%
14 57.95%
15 60.47%
16 62.84%
17 65.07%
18 67.17%
19 69.14%
20 70.99%
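For anyone who wants to check the table, here is a minimal Python sketch of the corrected formula. It assumes, as the post does, a 6% five-year failure probability for each RAID 5 group, that RAID groups fail independently of one another, and that losing any single group counts as total data loss on the VTL. The function name `prob_total_loss` is a hypothetical helper for illustration, not anything from NetApp.

```python
def prob_total_loss(p_group: float, n_groups: int) -> float:
    """Probability that at least one of n independent RAID groups fails.

    Assumes (hypothetically) independent failures and that any single
    group failure means total data loss for the whole system.
    """
    # System reliability: every group must survive, so (1 - p)^n.
    reliability = (1.0 - p_group) ** n_groups
    # Probability of total failure is the complement of reliability.
    return 1.0 - reliability


if __name__ == "__main__":
    P_GROUP = 0.06  # per-group 5-year failure probability quoted in the post
    for n in range(1, 21):
        # Prints the RAID group count and failure probability as a percentage.
        print(f"{n:2d} {prob_total_loss(P_GROUP, n):6.2%}")
```

Running it reproduces the 20-row list above; for example, two groups give 1 - 0.94^2 = 11.64%.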


John, your lack of disk knowledge is painful. I work for SEPATON and can confirm that we ship RAID 6. Most other dedupe vendors do as well, with the obvious exception of NetApp.

Scott Waterhouse


I was giving John the benefit of the doubt (and not trying to further antagonize him) and assuming that he was talking about primary storage, and not VTL/backup systems. For backup systems, the following vendors offer RAID 6:

Data Domain

The ones that DO NOT offer RAID 6 are:

Sun (they use ZFS, and I am not about to go into a full analysis of that vs. RAID 5 here!)
Falconstor (because they don't sell hardware)

And the one I don't know for sure:

Dell (not indicated on the specs for the PowerVault DL2000)

W. Curtis Preston

I agree with you that NetApp's blogs (and common sense) dictate that they should be using RAID 6. I also agree that it's disappointing that they released their product without RAID 6 or replication. (I’m sure they’re working on it furiously.) It really takes the wind out of the sails of their launch.

I don’t agree with the “last major vendor” comment. EMC, HDS, HP, & Sun are simply putting their own storage behind somebody else’s hard work. That’s hardly the same as building your own target dedupe system from scratch, or (in the case of IBM) having the commitment to acquire the company whose technology you’re going to use for target dedupe.

It’s true that NetApp was the last major OEM disk vendor to ship target dedupe for their VTL, but they were the first major OEM disk storage vendor (i.e., EMC, IBM, HDS, HP, NetApp & Sun) to have their own (not OEMd) VTL, and the first vendor of any kind to ship dedupe for primary storage (which they developed themselves). Before that, they supported dedupe for NetBackup with that same product. (EMC was the first, and so far the only, such vendor to acquire and ship a source dedupe product.) NetApp is the second such vendor to ship their own (not OEMd) target dedupe product (IBM beat them by a few months via their acquisition of Diligent), and the first such vendor to develop it themselves. So while they may look like the last ones to the party, they’re the first to come with a monogamous wife, versus others who are coming with a friend (EMC/Quantum), or caught in a love triangle (IBM/HDS/Diligent or Sun/COPAN/Falconstor).

Have I given my NetApp contacts serious grief for them not shipping dedupe until now? Of course I have. Am I now giving them more grief for shipping it with RAID5 and no replication? Of course I am. But I also think it’s important to give them props for having the commitment to do it themselves. I think that counts for something, and I just blogged about why:

Scott Waterhouse

I think there is a lot we can agree on here. However, I would challenge your supposition that EMC is "simply putting their own storage behind somebody else's hard work." That is very far from the truth, in two ways.

First, there is real, significant incremental value add in terms of customer integration and code work that goes on. My favorite example of this is the Write Cache Coalescing that happens on a DL4000 (Clariion storage, Falconstor code base). This is unique to EMC, and it can make a very significant performance difference. This is just one example of dozens on our EDL line. There absolutely is unique integration going on--we don't just grab the code, stuff it on an Intel server, and call it a day!

Second, there is an issue of quality. When EMC puts its name on something, there is a standard of quality, support for multi-vendor environments, and supportability. From testing, to QA, to bug fixes, product support, and our interoperability lab, there is a vast amount of work that goes into a product release. There is a very good reason why EMC EDL code versions have no relationship to Falconstor VTL code versions! And this applies irrespective of whether it is an internally developed product, or has an OEM component to it. The standards are the same.

So OEM'ing code is not necessarily bad, and doing it yourself is not necessarily good. Your relationship metaphors made me smile however.

W. Curtis Preston

There's no doubt that the 4000 has some unique code to make it work. It's a unique architecture. But I'll judge that system when it ships. When should we expect that?

But other than the testing, the 3000 series is basically Quantum code in an Intel server in front of Clariion storage, right? While I don't want to minimize the amount of effort testing takes, there's a big difference between testing someone else's product and building your own product and testing IT.

You're right. Being homegrown doesn't necessarily make something better. BUT, all other things being equal, I'll take a homegrown solution over an OEMd solution any day.

I also found it amusing that, of all the vendors you listed, you didn't list the one that you're using (Quantum): "(Just a quick roll call here: EMC. Check. HP. Yep. IBM. You bet. Sun. Check. Dell. Roger that. Data Domain. Naturally. Sepaton. Yes. Falconstor. Yes. So NetApp, welcome, at last, to the game.)"

My point of contention was simply that they AREN'T the last vendor to the game. They have actually been doing it for a while, just not in their VTL. And their delay is simply because they wanted to do it themselves versus what EMC did, and I think that choice (while it caused a significant delay during one of the hottest technology trends in recent years) is admirable.

As to my relationship metaphors, I had a laugh typing them myself. I'm glad you can have a sense of humor about this. I forgot one, though:

...or coming to the party with TWO dates, BOTH of which are in a love triangle (EMC/Falconstor/Quantum). Sorry, it was just too good not to include. ;)

Scott Waterhouse

The 4000 has been shipping for 2 years now!

If you mean the 4000 with data deduplication, that has been shipping for 3 months at this point.

With respect to the 4000, we do have a great deal of unique code aside from any integration component with deduplication.

With respect to the 1500/3000, there is a small but growing amount of unique IP there. Over time you will see increasingly greater differentiation between our offering and Quantum's, and that excludes features and functions which we, ah, encourage (!) Quantum to add.

Finally: if both of my dates are supermodels, I know I am pretty happy that I have two of them. ;)


The Backup Blog