Well, I had hoped that after the last exchange of information, the Data Domain folks were going to learn to be happy with their smaller, slower, and less flexible deduplication systems. Heck, denial isn't good for anybody. And as the expression goes: "give me the strength to accept the things I can't change." Unfortunately, Brian Biles is so far up the proverbial Egyptian river that I bet he has an excellent view of the Valley of the Kings from his location.
Brian writes in his blog, Dedupe Matters: "The objective of 'Dedupe Matters' is to provide a clear view into the realities of one of the most talked about and rapidly adopted technologies in the storage market today." Since we are talking about backup, I trust that he is not referencing enterprise flash drives for the DMX! No. Of course, he is talking about deduplication, and since he is asking for clarity and reality, I am only too happy to oblige.
Acceptance and strength are up to you though, Brian!
So what are Brian's claims? Well, let's deal with them in order:
First claim: "EMC is repackaging the Quantum 7500".
Not true. EMC has no fewer than 3 (or 5) different models of target deduplication appliances in our product line: the DL3D 1500, 3000, and 4000. The 4000 is actually a family, however, of which there are 3 models: the 3D 4106, 4206, and 4406. So is that 3 or 5 in total? The marketing department counts 5, but I will let you make your own decision! In 2 of those models we use some pieces of the Quantum software stack. In the 4000 series we merge this with our long-term EDL (virtual tape) offering. In all cases we use EMC storage, specifically the Clariion. The same Clariion that offers over 99.999% availability. As observed in the real world. And the entire package is integrated by EMC (including the servers) and supported exclusively by EMC. Hard to call that "repackaging" I would say...
And by the way, why do I make such a big deal about the whole Clariion thing? Because we are the only target deduplication vendor that builds an appliance on tier 1 midrange disk. The Clariion platform is a tremendous differentiator in terms of features, reliability, availability, support. It is a massive competitive advantage.
Second claim: "The dedupe rate is still less than 150 MB/sec".
Not true either. I will soon post specific discussion of the performance metrics of our 3 (or is that 5?) platforms, and the performance you can expect under the different operating conditions we support: in-line and post-process deduplication. For now, I can abbreviate the discussion by saying the performance number provided by Brian for in-line deduplication is completely incorrect. When I claimed that EMC offers the fastest general purpose deduplication appliance, I meant it.
Third claim: "The effective capacity, the amount you can dedup in a backup day, is something less than 13 TBs... so the only way you could use 150 TB of this disk ... is if your backup window is more than two weeks long."
OK. Wait... What? This makes no sense at all. If somebody from Data Domain could clarify this, I would be more than happy to address it. Because I just can't logically parse the notion that a deduplication appliance should offer no more total capacity than can be processed (ingested) in a single day. Don't deduplication appliances offer the capability to retain backup data for long periods of time? Longer than disk without deduplication?
Even at great deduplication ratios, however, this still requires capacity. Therefore, there is a direct correlation between the capacity of the appliance and the length of time for which you can retain data. More capacity equals longer retention.
Let's deal with this in terms of an example. Suppose I have a 24-hour backup window, and suppose I can ingest data at 400 MB/s, or about 1.44 TB/hr. Suppose I have 35 TB of data to back up, and it doesn't compress at all. It would take all day to back up to my appliance, but I don't care, because management has generously allowed me a 24-hour window. Or maybe they are even more generous, and let me use BCVs to back up from, so I don't impact the performance of my production disk at all. Now further suppose I have a 5% daily change rate on the data. And finally suppose that I am using a Data Domain 690. How many days can I retain? About 3. That is right, 3. For a net deduplication ratio of about 2.8:1. Whoopee. Why so little? Because I just ran out of space. The 690 only holds about 36 TB of useable capacity. If you were to use a DL3D 3000, however, you could hold 64 days, because it offers 148 TB of useable capacity. For a net deduplication ratio of about 15:1.
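For anyone who wants to play with the arithmetic, here is a minimal sketch in Python. It assumes a deliberately simple model that is my own simplification: the first full backup is stored whole, each later day adds only the changed data (full size times the daily change rate), and nothing compresses. The function names are made up for illustration, and units are decimal (1 TB = 1,000,000 MB).

```python
def ingest_tb_per_hour(mb_per_s):
    """Convert an ingest rate from MB/s to TB/hr (decimal units)."""
    return mb_per_s * 3600 / 1_000_000


def retained_increments(capacity_tb, full_tb, daily_change_rate):
    """Daily increments that fit after one full backup is stored.

    Assumed model: the first full consumes full_tb of disk, and each
    later day consumes full_tb * daily_change_rate of new data.
    """
    if capacity_tb < full_tb:
        return 0
    daily_new_tb = full_tb * daily_change_rate
    return int((capacity_tb - full_tb) // daily_new_tb)


def net_dedup_ratio(full_tb, daily_change_rate, increments):
    """Logical data protected divided by physical disk consumed."""
    logical_tb = (increments + 1) * full_tb
    physical_tb = full_tb + increments * full_tb * daily_change_rate
    return logical_tb / physical_tb


if __name__ == "__main__":
    rate = ingest_tb_per_hour(400)
    print(f"400 MB/s is {rate:.2f} TB/hr")
    print(f"35 TB takes {35 / rate:.1f} hours to ingest")

    days = retained_increments(148, 35, 0.05)
    ratio = net_dedup_ratio(35, 0.05, days)
    print(f"148 TB useable holds {days} daily increments after the first full")
    print(f"net deduplication ratio is roughly {ratio:.1f}:1")
```

Under these assumptions, 148 TB of useable capacity works out to 64 daily increments and a net ratio in the neighborhood of 15:1. Results for small capacities are very sensitive to the exact useable figure you plug in, because the first full backup eats nearly all of the space.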
So the notion that the total storage capacity of a deduplication appliance should be limited to its 24-hour ingest capacity is bizarre. A red herring. At best, it is equivalent to saying "scalability is bad". At worst, it is a notion that Data Domain is throwing out there to confuse the marketplace. And worse yet (because it has been repeated by Beth White at Data Domain as well), I suspect that it is deliberate.
So let's clear the air. Scalability matters. Scalability is good. And no amount of Byzantine logic, or reasoning so badly tortured and twisted that you would think the Spanish Inquisition had got hold of it, changes that. And yes, at the end of the day, the DL3D 3000 scales to 148 TB useable, and the DD690 to about 36 TB useable. That would be about 4:1 in favor of EMC if you are keeping track, Brian.
Fourth claim: "Data Domain is still the performance leader."
Untrue. I think we can see by this point that this is an unjustified claim in the purest sense. There is no data whatsoever to justify this assertion (an interview from 6 months ago that references a hardware platform completely different from that used by EMC and software that isn't the shipping version is not useful at all here).
I guess at the end of the day, we can all claim whatever we want. It is up to you, the readers, to weigh the evidence, and make up your own minds. But I know it makes me a little crazy to see an assertion like this made without any credible basis and offered as fact. Let's call this what it is: a little chest thumping cooked up by the marketing folks over at Data Domain. The real problem is, they are wrong. Personally, my policy is: if I don't know for sure, I won't make the claim! I can only suppose they differ in opinion on this too...
Fifth and final claim: "EMC is still selling massive ... disk systems."
True! And yes, some text was deliberately omitted. Because getting five out of five wrong is just embarrassing. So for Data Domain's sake we will just make the change and give them one. (I would also point out that we are selling standard virtual tape as well. And backup software. And archive software. And source deduplication. And a dedicated archiving platform. And we offer backup consulting and residency services. None of which Data Domain can offer.)
Data Domain's final score: 20%
Not exactly a passing grade, is it?