I have had a lot of comments, mostly constructive, on some of my posts about performance and the EDL 4000 3D. (The translation of that product name, by the way, is the EMC Disk Library 4000 with deduplication.) W. Curtis Preston has been one of the most vocal critics, and while I think he is dead wrong (nothing personal, Curtis, that is just my opinion of your opinion), I don't think his comments are so trivial that I am going to dismiss them out of hand. I also suspect that they are roughly congruent with the initial reaction some users might have to the device.
So I wanted to take the opportunity to address some of these comments in a more structured fashion than just the back-and-forth in the comments section. Giving Mr. Preston his due, I will quote him directly and then respond to the objections.
First things first: for those unfamiliar with its basic architecture, the DL4000 3D consists of an EDL 4106, 4206, or 4406 with a deduplication engine (or engines, in the case of the 4206 and 4406). The 4x06 family is EMC's very successful virtual tape solution. It is a simple-to-manage, Fibre Channel-attached appliance that offers very high performance (up to 2,200 MB/s on the 4406), emulates a very wide variety of tape drives and libraries, offers a very robust set of virtual tape library functions, and can expand to very large capacities: up to 670 TB of usable virtual tape capacity (excluding compression).
A DL4000 3D simply adds a deduplication engine and a dedicated pool of deduplicated storage to the EDL 4x06 platform. The deduplication engine is "invisible" to backup applications. From a user and application perspective, data is always stored on a virtual tape library associated with the EDL 4x06. It may or may not be deduplicated; that is a matter of policy. But from an application perspective, all data resides in one location. From a software perspective, the increment over a standard EDL 4x06 is a very small piece of code that acts as a policy manager for deduplication. Basically, it asks two questions for each virtual library (technically, for each cartridge): When do you want to deduplicate it? And how long do you want to leave a fully hydrated image in the standard VTL storage pool?
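To make those two questions concrete, here is a minimal sketch of what a per-cartridge policy might look like. The names (`CartridgePolicy`, `dedup_after_hours`, `retain_hydrated_hours`, `copies_present`) are hypothetical and are not the DL Console's actual settings or any EMC API; the sketch also assumes, as a safety rule of our own, that the hydrated image is only released once a deduplicated copy exists.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CartridgePolicy:
    """Hypothetical per-cartridge policy: the two questions the policy manager asks."""
    dedup_after_hours: Optional[float]  # when to deduplicate (None = never deduplicate)
    retain_hydrated_hours: float        # how long the full image stays in the VTL pool


def copies_present(policy: CartridgePolicy, hours_since_backup: float) -> Tuple[bool, bool]:
    """Return (hydrated_copy_on_vtl, deduplicated_copy_exists) at a given cartridge age."""
    deduplicated = (
        policy.dedup_after_hours is not None
        and hours_since_backup >= policy.dedup_after_hours
    )
    # Assumed rule: the hydrated image is released only after its retention
    # period expires AND a deduplicated copy exists, so data always has a copy.
    expired = hours_since_backup >= policy.retain_hydrated_hours
    hydrated = not (deduplicated and expired)
    return hydrated, deduplicated


# Example: deduplicate 2 hours after the backup, keep the full image for 3 days.
nightly = CartridgePolicy(dedup_after_hours=2, retain_hydrated_hours=72)
for age in (1, 10, 100):
    print(age, copies_present(nightly, age))
```

With this example policy, a cartridge is hydrated only at 1 hour, both hydrated and deduplicated at 10 hours, and deduplicated only at 100 hours. A cartridge whose policy never deduplicates simply stays on the VTL pool.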
That is the basic approach, and Mr. Preston hates it. That is his prerogative, but it is my opinion that his dislike is founded largely on misconception. So let's deal with some of those misconceptions, shall we?
W. Curtis Preston: "The DL4000 can ingest at 2200 MB/s, but ... the DL3D engine cannot keep up with an ingest rate that fast. And restore performance will not be maintained unless you keep everything in the cache."
There are actually two important questions here, and the first hinges on a very important concept: Service Levels. Or the other side of the same coin: Recovery Time Objective (RTO). Let me put that somewhat more crassly: do you care about your backup window? If you have a backup window that you care about, and want to stick to, then you care about ingest speed. And the DL4406 (with or without deduplication) gives you the ability to ingest roughly 8 TB per hour (2,200 MB/s works out to about 7.9 TB/hr). That is approximately 5 to 6 times faster than a DL3000 can deduplicate data in-line. So if you have an 8-hour window, you can back up roughly 64 TB to a DL4406 in that period of time. That is well beyond what any deduplication box currently available can ingest in the same window. From anybody.
So if you care about making your window and you have a lot of data, the best way to do that is with a DL4406.
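The window arithmetic above is easy to check. This is a small sketch assuming decimal units (1 TB = 1,000,000 MB) and using only the rates quoted in this post: 2,200 MB/s peak ingest for the DL4406 and roughly 1.5 TB/hr in-line for a DL3000. Real sustained rates depend on the backup application, the number of streams, and the data itself.

```python
WINDOW_HOURS = 8
DL4406_INGEST_MB_S = 2200        # peak ingest quoted for the 4406
DL3000_INLINE_TB_HR = 1.5        # in-line dedup rate quoted for a DL3000


def mb_s_to_tb_hr(mb_per_s: float) -> float:
    """Convert MB/s to TB/hr, assuming decimal units (1 TB = 1,000,000 MB)."""
    return mb_per_s * 3600 / 1_000_000


ingest_tb_hr = mb_s_to_tb_hr(DL4406_INGEST_MB_S)
print(f"ingest rate:        {ingest_tb_hr:.2f} TB/hr")
print(f"one 8-hour window:  {ingest_tb_hr * WINDOW_HOURS:.1f} TB")
print(f"vs. DL3000 in-line: {ingest_tb_hr / DL3000_INLINE_TB_HR:.1f}x")
```

This prints about 7.92 TB/hr, 63.4 TB per 8-hour window, and 5.3x, which is where the rounded "roughly 8 TB", "64 TB", and "5 to 6 times" figures come from.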
The second question is: how fast can you deduplicate? We can deduplicate at approximately 1.5 TB an hour per deduplication engine, and you can have one deduplication engine per DL Engine (a DL4106 has one DL Engine; a DL4206 or DL4406 has two). So, since we are talking big numbers, we can deduplicate about 48 TB a day on a DL4406 3D.
Not quite as much as you can stuff into a DL4406 in that 8-hour window, but still not bad. The point is that you have the rest of the day to deduplicate, if you want. What you care about is making that window. Once you have done that, you can deduplicate at your leisure, at least until the window rolls around again the next day.
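A quick way to see what "the rest of the day" buys you is to model the deduplication backlog. This is a rough sketch under our own assumptions: the 48 TB figure is read as two engines at 1.5 TB/hr running for the 16 hours outside an 8-hour window (1.5 × 2 × 16), every day ingests the same amount, and retention expiry is ignored. The function names are illustrative only.

```python
DEDUP_TB_HR_PER_ENGINE = 1.5  # per-engine rate quoted in the text


def dedup_capacity_tb(engines: int, hours: float) -> float:
    """TB a set of deduplication engines can process in `hours`."""
    return DEDUP_TB_HR_PER_ENGINE * engines * hours


def backlog_after_days(ingest_tb_per_day: float, dedup_tb_per_day: float, days: int) -> float:
    """TB still waiting for deduplication after `days` identical days (never negative)."""
    backlog = 0.0
    for _ in range(days):
        backlog = max(0.0, backlog + ingest_tb_per_day - dedup_tb_per_day)
    return backlog


daily_dedup = dedup_capacity_tb(engines=2, hours=24 - 8)   # 16 hours outside the window
print(daily_dedup)                                   # 48.0
print(backlog_after_days(40, daily_dedup, days=5))   # 0.0: backlog stays clear
print(backlog_after_days(64, daily_dedup, days=5))   # 80.0: grows 16 TB per day
```

In other words, a site that backs up 48 TB or less per night keeps up comfortably, while one that fills the full 8-hour window every night would accumulate an undeduplicated backlog on the VTL pool and would need to size that pool accordingly.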
And here we have a pretty clear justification for the DL4000 3D: if you have a lot of data, and you care about your backup window, you have a reason to look at the DL4000 3D instead of either our DL1500 or DL3000 systems. If finishing your backup is your priority, the DL4000 3D is an excellent choice.
As for the restore performance component of the quote above, the choice is yours: by policy, it is up to you how long to retain a backup in the VTL (non-deduplicated) storage pool. Restores from this pool can be accomplished at up to 1,600 MB/s, which is far faster than pretty much any other solution available today, from anybody. At roughly 6 TB an hour, that is certainly much faster than any deduplication solution.
Which turns out to be the appropriate complement to fast backup: fast restore. So the DL4000 is the right choice if you care about your restore speeds as well as your backup speeds (and you have a lot of data).
As to how much remains in cache: that is up to you. More accurately, it is up to your SLAs. If you have no SLA around restore, if you don't care how long it takes to restore something, then it will be tough to justify a DL4000 3D. If you do care, however, there is no better choice than a DL4000.
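Tying retention to an SLA starts with knowing how long a restore takes. The sketch below assumes decimal units and uses the 1,600 MB/s peak quoted above for the VTL pool; real restores are usually limited by the backup server, the network, and the number of parallel streams. The restore rate from deduplicated storage is not given in this post, so it is left as a parameter you would have to measure. The function names and the 10 TB and 20 TB image sizes are hypothetical.

```python
VTL_RESTORE_MB_S = 1600  # peak restore rate quoted for the VTL (non-deduplicated) pool


def restore_hours(image_tb: float, rate_mb_s: float = VTL_RESTORE_MB_S) -> float:
    """Hours to stream an image of `image_tb` TB at `rate_mb_s` (decimal units)."""
    return image_tb * 1_000_000 / rate_mb_s / 3600


def meets_rto(image_tb: float, rto_hours: float, rate_mb_s: float = VTL_RESTORE_MB_S) -> bool:
    """True if streaming the whole image fits inside the recovery time objective."""
    return restore_hours(image_tb, rate_mb_s) <= rto_hours


print(f"{restore_hours(10):.2f} h")     # a hypothetical 10 TB image: about 1.74 h
print(meets_rto(10, rto_hours=2))       # True
print(meets_rto(20, rto_hours=2))       # False
```

If a given backup set must meet an RTO that only the VTL pool's rate can satisfy, that is a reasonable signal to keep its hydrated image on the VTL pool for as long as that RTO applies.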
W. Curtis Preston: "The Falconstor [DL4000] piece adds no value that additional storage on the Quantum part of it [the deduplication engine] wouldn't add."
This is just categorically untrue. In addition to the performance issues just discussed, a DL4000 offers several functions that a DL1500 or DL3000 does not. Not all of these will appeal to everybody, and that is OK. But they do appeal to enough folks that we think it is important to offer them and support them. Kind of like an iSeries server: they don't appeal to everybody, but those that want them really want them. So what can a DL4000 do that a DL1500 or 3000 can't? Well, the following list isn't comprehensive, but it includes:
- ACSLS compatibility. Simple: the DL4000 is ACSLS compatible; the DL1500/3000 isn't. If this matters, it matters.
- Tape Caching. And I am talking physical tape here. If you want to cache your virtual tapes to physical tapes, and you want to do it in a way that is transparent to the backup application, then you need a DL4000, not a 1500/3000.
- Embedded Media Server/Storage Node. Do you care about creating replicas, copies, or physical tapes in a way that is managed by, and consistent with, the catalog of the backup application? If so, this matters. Some of this functionality may now be available on the DL3D via OST, and via Path To Tape and replica support for NetWorker, but neither of those choices is as mature or as full-featured as running a Media Server or Storage Node right on the DL4000.
- Wide range of tape and library emulations. If you need to emulate older or unusual tape formats (anything other than LTO, really), or if you need to emulate an odd library type, then the choices offered by a DL4000 matter. This may be a relatively small set of users, but again: if it matters, it matters.
- iSeries connectivity. Not available on a 1500/3000. Available on a DL4000. 'Nuff said.
W. Curtis Preston: "The 4200 & [4400 sic] may offer a faster ingest rate, but it will still be gated by the ingest rate of the 3DL/Quantum box on the back end, which is approximately that of the 4100."
Well, this is an argument based on a false premise: the ingest rate of the DL3D back end is approximately 1.5 TB/hr, while the ingest rate of the DL4106 (per DL Engine) is 4 TB/hr. Now I am sure Mr. Preston is getting giddy with excitement (just kidding, Curtis; I know you were in the Navy, and sailors a) don't get giddy and b) can kick my butt!), because he imagines that his argument just got stronger. However, I think we have discussed backup windows and SLAs adequately by now to dismiss this argument. False premise or no, it is wrong.
If you care about these things, the DL4000 3D offers you unmatched flexibility and performance to meet your windows and stick to your SLAs. If you don't (or your backup data set is just not that big), then buy a DL1500 or DL3000. This is a choice, and it is a legitimate choice, and there are perfectly comprehensible, perfectly justifiable reasons why you might make either one. But I don't see how offering a range of systems that lets you make that choice, and pick a technology appropriate to it, is a bad thing. The same goes for offering both in-line and post-process deduplication: these choices make for a more flexible, and ultimately more useful, device.
Saying that more flexibility and more choice are bad (as long as they don't come at some huge expense in ease of use or manageability) is just silly. Note the caveat, and note this, Mr. Preston: these choices are trivially easy to make, and trivially easy to implement, on our DL product line. That is true of the DL1500 and DL3000, and it is equally true of the DL4000 3D.
W. Curtis Preston: "The Falconstor piece on the front of the Quantum piece adds less value than simply purchasing a second Quantum-based box. If the Quantum-based 3DL engine can do 4 TB/hr, and a customer needs 8 TB/hr, the EMC answer would be to upgrade from the 4100 to the 4200 or [4400 sic]." And further: "It's GOT to cost more than just buying a deduped box from any vendor that isn't front-ending their box with another vendor's box."
Again, we need to get the facts straight: a DL3000 can do roughly 1.5 TB/hr. So if a customer needs 8 TB/hr (which, let's be honest, is a lot), the EMC answer is to take a good hard look at a DL4406 with deduplication. The real alternative is not purchasing a second DL3000 but five more, for a total of six, because it takes five or six DL3000s to equal the ingest performance of a single DL4406.
So for an accurate cost comparison, we would have to look at six DL3000s vs. one DL4406 with one or two deduplication engines. And given the performance of other vendors' deduplication boxes, the choice is usually five or six of their appliances, too. Or one DL4406. Now the proposition seems a little more sensible, doesn't it?
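The appliance count above is just a ceiling division, and it shows why the per-box rate matters so much to the comparison. This sketch assumes throughput scales linearly across independent appliances and that the backup load can be split evenly among them, which real deployments only approximate. `appliances_needed` is an illustrative name, not a sizing tool.

```python
import math


def appliances_needed(target_tb_hr: float, per_appliance_tb_hr: float) -> int:
    """Smallest whole number of appliances whose combined rate reaches the target."""
    return math.ceil(target_tb_hr / per_appliance_tb_hr)


print(appliances_needed(8, 1.5))   # 6  (DL3000 at ~1.5 TB/hr, as stated above)
print(appliances_needed(8, 4))     # 2  (the 4 TB/hr figure assumed in the quote)
```

With the 1.5 TB/hr figure, 8 TB/hr takes six boxes; with the 4 TB/hr figure from the quote, it would take only two, which is why getting that premise right changes the cost comparison.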
W. Curtis Preston: "Again, I don't care how you put a product together. I care how much it costs and how easy it is use (performance, manageability, etc) vs competing solutions and the EMC 3DL 4000 loses out on both counts."
At last, something that Curtis, I, and the rest of EMC (if I can presume to put words in their mouths) can agree upon: there is nothing more important than balancing cost with performance, ease of use, and manageability.
With respect to price, all I will offer is this: if you are still with me and are interested in doing a comparison, call your sales representative. The price will be whatever it is; I am not terribly interested in debating whether it is x or x + 2 dollars. We think we are competitive, and based on sales, customers seem to agree.
More importantly, I think that the DL4000 platform offers very powerful management features. These features are accessed via the extremely easy-to-use DL Console management interface. The DL Console has been deployed on well over 5,000 systems. I have used it myself, and if I can create and deploy virtual resources to a backup server in 5 minutes, well, anybody can! From deployment, to management, to reporting, to policy creation, the DL4000 is extremely easy to use. In terms of the deduplication features, here is all you need to take advantage of them:
That's it. One screen. Pretty painless.
And just as importantly, it is flexible. When you deduplicate, how much performance you get, how long you retain data on the VTL, and how you leverage replication and physical tape are all choices that make real differences in operational deployment and use.
So we have a performance advantage, coupled with ease of use and flexibility. And costs are, at the very least, competitive.
This sounds like a good package to me.