Disclaimer: everything that follows is my opinion. In no way should it be mistaken for any sort of official EMC position. Any resemblance to that is purely coincidental.
Amidst the chaos and confusion of the EMC offer to buy Data Domain, I think the biggest unanswered question has been: why would EMC want to have three or four or five different deduplication technologies?
Truthfully, I think the question has a profoundly simple answer: because backup sucks.
Mmmm, irony.
Don't believe me? Show of hands: how many people actually like their existing backup application?
Given the name of this blog, and that I have spent the last 15 years plus on backup and recovery, I think I can appreciate the irony as well as any.
Even more ironic when we remember that Data Domain's original slogan was: tape sucks.
And by the way, there is a backhanded answer in there as to why it makes sense for Data Domain to be acquired by somebody. If you claim that tape sucks, and you try to "fix" this with another piece of hardware, all you are really doing is building a better tape drive. You are trying to be StorageTek, only better. Where better means faster and cheaper. But does a faster and cheaper tape drive make backup suck less? Maybe. But probably not nearly as much as it could--if you had a broader perspective and reach with your technology roadmap, one that includes CDP, primary storage, a backup application, and some virtualization capability. More on what I would do with all that later. But one final question: if your objective is to fix backup, completely, and you think that you need access to all those components to do that, who is going to be in a better position to do this? EMC? Or NetApp?
Having said that, the biggest obstacle to fixing backup is not technology. It is inertia. It is cultural. It is fear of change. It is ingrained process. It is the fact that we have done things one way for so long that the reason we are doing things has been forgotten.
(Another aside: if you want to fix backup, and I mean really fix it, then the first thing you should ask is: what am I trying to achieve? When, where, and why do I need backup images of my data?)
An example. Many customers with defined practices say they need tape off-site. My belief is that a long time ago, the only way to get a copy of your backup data off site safely, securely, and reliably was to put it on tape. However, it is easier to say "I need a tape off site" than it is to say "I need a secure, safe, reliable image of my data off site." Unfortunately, the words became the practice, and it is frequently the case now that even though deduplication can safely, securely, reliably (and cheaply) get an image off site, it is not on tape, so it is not good enough.
My conclusion is this: as long as the primary barriers to fixing backup are NOT technological, customers will require data deduplication at multiple places. Primary storage. Backup source. Backup target. Replication. And some backup is still best done to disk/virtual tape without deduplication. There is no one size fits all.
And even if you remove the cultural and procedural barriers to change, you still need access to all those technologies to fix backup.
You still need primary storage deduplication. At EMC that is provided natively on the Celerra platform.
You still want source deduplication for (some) backup. This is Avamar. And despite the contentions of virtually all commentaries on the value of a target deduplication technology acquisition for EMC, there is a very significant set of use cases for source deduplication. It is a uniquely powerful and useful technology that will continue to have a role for the short, medium and long term. No target deduplication solution will ever be able to make the same powerful value statement that a source deduplication solution does so long as there is anything remotely resembling a traditional backup application in the mix. (I feel the need to qualify this in case somebody realizes just how good a thing EDM was--but that is also a different story!) Only source deduplication offers massive bandwidth savings, massive reductions in time to complete backup jobs, and the ability to increase the density of server consolidation. The more clouds you see, the more virtualization becomes the prevalent deployment model for servers, the more source deduplication makes sense.
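To make the bandwidth argument concrete, here is a minimal sketch of the general idea behind source deduplication: the client hashes its data in chunks and sends only the chunks the server has not seen before. This is a toy illustration, not a description of how Avamar actually works. The fixed chunk size, the `BackupServer` class, and the function names are all assumptions made for the example, and it ignores the (small) cost of exchanging hashes.

```python
import hashlib

# Hypothetical fixed chunk size; real products often use variable-size chunks.
CHUNK_SIZE = 4096


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


class BackupServer:
    """Toy chunk store keyed by SHA-256 digest (hypothetical, in-memory)."""

    def __init__(self):
        self.store = {}

    def has(self, digest: str) -> bool:
        return digest in self.store

    def put(self, digest: str, chunk: bytes) -> None:
        self.store[digest] = chunk


def source_dedup_backup(data: bytes, server: BackupServer):
    """Hash chunks on the client and send only those the server lacks.

    Returns (recipe, bytes_sent): the ordered list of digests needed to
    rebuild the data, and how many bytes of chunk data crossed the wire.
    """
    recipe = []
    bytes_sent = 0
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if not server.has(digest):
            server.put(digest, chunk)
            bytes_sent += len(chunk)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, server: BackupServer) -> bytes:
    """Rebuild the original data from its recipe of digests."""
    return b"".join(server.store[d] for d in recipe)


# Day 1: ten distinct chunks. Day 2: same data with one chunk changed.
server = BackupServer()
day1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
day2 = day1[:3 * CHUNK_SIZE] + bytes([200]) * CHUNK_SIZE + day1[4 * CHUNK_SIZE:]

recipe1, sent1 = source_dedup_backup(day1, server)
recipe2, sent2 = source_dedup_backup(day2, server)
print(sent1, sent2)  # 40960 4096
```

In this sketch the second backup moves one chunk instead of ten. A target deduplication device would store the same amount in the end, but only after the full data set had already crossed the network, which is the difference the bandwidth claim rests on.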
You still want target deduplication. For those people that can't or won't change their backup application. And for those folks that don't meet the use case of source deduplication (their data set is too big, for example).
And you still want backup without deduplication (or with post-process deduplication). Again, despite the protestations of the few, the hard reality is that there is NO single, general purpose deduplication device that can scale to meet the needs of the enterprise. Nothing that can meet the needs of the very large backup job that must complete within a defined backup window (where the current standard of 1.5 TB/hr per dedup appliance is off by an order of magnitude or more). In EMC terms, this need is met by the DL4x06 line.
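The "order of magnitude" point is simple arithmetic, and a quick sketch shows it. The 1.5 TB/hr figure comes from the paragraph above; the 120 TB job and the 8 hour window are hypothetical numbers chosen only for illustration, and the calculation assumes throughput adds up linearly across appliances, which is generous.

```python
import math


def required_rate_tb_per_hr(dataset_tb: float, window_hr: float) -> float:
    """Sustained ingest rate needed to finish the job inside the window."""
    return dataset_tb / window_hr


def appliances_needed(dataset_tb: float, window_hr: float,
                      per_appliance_tb_per_hr: float = 1.5) -> int:
    """Appliances required, assuming (optimistically) linear scaling."""
    rate = required_rate_tb_per_hr(dataset_tb, window_hr)
    return math.ceil(rate / per_appliance_tb_per_hr)


# Hypothetical job: 120 TB that must complete in an 8 hour window.
rate = required_rate_tb_per_hr(120, 8)
count = appliances_needed(120, 8)
print(rate, count)  # 15.0 10
```

Under these assumptions the job needs ten appliances working in parallel, and in practice each would likely hold its own separate deduplication pool, so the job would have to be split across them by hand.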
So why do we have four different deduplication technologies? Because we need them. And we need them because customers ask for them. And because there is no other good alternative right now.
As we go forward, and the existing processes, procedures, and technology in legacy backup become more obviously broken, all of these pieces will also be required.
The difference is that those vendors that have them all as part of their portfolio will have an extraordinarily powerful way of fixing backup. And more than just backup: data protection more generally (as well as primary storage). Bringing it all together. Unifying the process, procedure, software, and infrastructure in a way that can fundamentally fix things. A fundamental fix that no single point solution can provide.
What if I could radically simplify my software? What if I could deduplicate at the source or the target transparently? What if a single device could be the repository for CDP, source, and target replication? Lots of what ifs there, but at the root is the notion that having some level of ownership in each element is an advantage to the delivery of the final vision.
Of course it also acknowledges that data deduplication will be a very important core capability across storage infrastructure.
And finally, it acknowledges that each of the pieces will have a very important role going into the future.
Scott,
Generally I agree with much of your post. There is definitely a play for source and target de-duplication in this arena. No arguments there at all. I can appreciate that EMC is attempting to corner the market in this fashion. If you have the cash, then why not do that.
What I and perhaps several others are confused about is why you would acquire a target-based deduplication company, Data Domain, when you already have that same technology through your relationship with Quantum.
Either Quantum just isn't buttering your bread on both sides (which I believe is true) and/or you simply feel that a couple of billion dollars is a small price to pay to keep this technology out of the hands of your competitors. In other words, buy it before someone else does. (I think this is also true.)
You probably can't comment on that, which is fair enough. On the bright side, with your recent pay cuts at EMC you can probably afford anything. I say that tongue in cheek as I've had my pay cut already.
What I am surprised about to some extent is that I think your current offering (DL3D) lacks scale (148 TB max capacity, ingest of 1.5 TB/hr on a good day), and so does Data Domain (you've pretty much said so yourself in earlier posts). If you compare that to, say, Diligent (1 PB max, ingest of 3.5 TB/hr), your stuff is really small fry. So, essentially you are purchasing a company with another box that doesn't scale that well. What I will say though is that the Data Domain box does work as advertised and, from the customers I've talked to, appears pretty stable... and that's a good thing. I wouldn't be surprised if, should this acquisition go through, Quantum just gradually fades into the distance.
Oh and no, I don't work for NetApp or Data Domain for that matter..or IBM. I'm JAFO ;-)
Posted by: NK | June 04, 2009 at 05:23 AM
Well, not too much I can comment on there. :)
About the only thing I will point out is that the DL4000 line with deduplication will scale to 972 TB (296 TB of which is in the dedup pool) and 8 TB/hr. Nevertheless the Diligent system scales well--I believe the weakness of their approach is that they require fast (FC) disk, and they can't replicate. They are offering XIV as an alternative to FC, but that has real issues (in my opinion) due to the susceptibility to double disk failure.
Posted by: Scott Waterhouse | June 04, 2009 at 08:32 AM
It may be a good thing that EMC is expanding its portfolio, however it confuses the customer even more with so many redundant solutions - and none of them are complete. I like Avamar, however I do not want to sit and sort through data trying to determine what will be deduplicated well and what will not - I want to use a single solution for the entire organization instead of having to purchase Avamar, Networker, and a DL3D solution and then spend days integrating the three, plus worrying about doubling on investment for all three for replication purposes.
Posted by: Sergei | June 04, 2009 at 08:35 AM
I hear you loud and clear Sergei. Honest I do. And there is nothing I would like more than to be able to deliver that single solution. Nothing. But it doesn't exist.
So, at the moment, you have two choices: 1) pick best of breed point solutions (which you described in your 3 solutions scenario); 2) pick one solution, which inevitably will entail compromises in some use cases.
I think that is a completely valid generalization for all vendor solutions at the moment. Where EMC differs is that we can offer you all those point solutions (no other vendor can) and that there is some integration between them (Avamar can be integrated into NetWorker, should you choose to do so.)
As a quick aside: Avamar will work (or not) largely on the basis of size and use case--it is very quick and easy to determine what to use it for.
Posted by: Scott Waterhouse | June 04, 2009 at 08:42 AM
"as long as the primary barriers to fixing backup are NOT technological, customers will require data deduplication at multiple places"
I couldn't agree more: the problem is so deep that we'd need to change the job description of the Backup Manager to a new one. If the Backup Manager would argue about too many choices in his well-defined / window-closed / do-not-touch-it environment, the newcomer, the Restore Manager, will definitely understand why more choices (and concurrent ones, maybe) are better.
Ah, I almost forgot: Tape Does Suck, check here (a quiz for the sentimentalists: what's that? Right model, please)
http://www.flickr.com/photos/37880054@N03/3598339508/
Ciao
M
Posted by: Maurizio | June 05, 2009 at 09:35 AM
"I want to use a single solution for the entire organization instead of having to purchase Avamar, Networker, and a DL3D solution and then spend days integrating the three"
with the emc portfolio you can use networker for traditional backup, with avamar as a source de-dupe node and edl as a target de-dupe node. cdp is also integrated with recoverpoint cdp.
so you can have traditional backup, source de-dupe, target de-dupe and cdp all managed via a single product.
pretty easy to deploy...
Posted by: hughman | June 25, 2009 at 07:03 AM