One of the most powerful features of the EMC DL3D is its ability to do both in-line and post-process deduplication. It is one of the few (the only?) deduplication appliances that offer users this flexibility, and in fact it lets you do both on one appliance, depending on the requirements of the data being backed up.
Certain competitors have done their best to portray this flexibility as a weakness. Huh? Since when is choice a bad thing?
Well, it isn't a bad thing at all. Choice is good. Flexibility is good.
The other criticism that gets leveled against the DL3D on this point is that its in-line deduplication is not really in-line. In particular, Data Domain claims that EMC doesn't do in-line deduplication at all: that we write to a disk cache, and that this lets us be deceptive with our performance numbers.
Nothing could be more misleading.
So I want to take on the question of why we took the design approach we did. Because if we understand what is going on under the covers, we can see just how powerful the DL3D can be, and how much better it can meet the practical day-to-day needs of backup and restore administrators.
So what is really going on when you write the data to a DL3D?
Simple: there are two basic modes of operation.
The first is in-line, or immediate, deduplication. Immediate deduplication buffers 250 MB of data (about 3-4 seconds' worth at Gigabit Ethernet speeds) before deduplication begins. Data is written in its "native" format and deduplicated in-line. When this mode is enabled, we can sustain the performance metrics I described here. No smoke and mirrors. Immediate deduplication can sustain speeds faster than a Data Domain DD690. We don't use a cache to beat their benchmark. We don't need to.
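To make the idea concrete, here is a toy sketch in Python of what "buffer a little, then deduplicate in-line" looks like. This is not our actual implementation; the chunk size, the index, and all the names here are my own illustrative inventions, and only the 250 MB figure comes from the product.

```python
import hashlib

BUFFER_THRESHOLD = 250 * 1024 * 1024  # dedup kicks in once 250 MB has arrived
CHUNK_SIZE = 8 * 1024                 # hypothetical chunk size

class InlineDeduper:
    """Toy sketch of immediate (in-line) dedup with a small staging buffer."""

    def __init__(self):
        self.index = {}               # chunk fingerprint -> stored chunk
        self.buffer = bytearray()

    def write(self, data: bytes):
        # Stage incoming backup data. At GigE speeds, 250 MB is only a few
        # seconds' worth, so this is staging, not a disk cache.
        self.buffer.extend(data)
        if len(self.buffer) >= BUFFER_THRESHOLD:
            self._dedupe()

    def flush(self):
        if self.buffer:
            self._dedupe()

    def _dedupe(self):
        # Deduplicate the staged data chunk by chunk, then release the buffer.
        for i in range(0, len(self.buffer), CHUNK_SIZE):
            chunk = bytes(self.buffer[i:i + CHUNK_SIZE])
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in self.index:  # store only chunks we haven't seen before
                self.index[fp] = chunk
        self.buffer.clear()
```

The point of the sketch: the buffer is a few seconds deep and is consumed continuously, which is a very different thing from landing entire backup jobs on disk and deduplicating them later.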
The second approach is delayed deduplication. You can also call it deferred, or out-of-band. Whatever you call it, the approach is simple: all backup data is written in its native format, and it stays that way until your DL3D policy determines it is time to deduplicate it. Why do we do this? Again, pretty simple: because a native write can happen a lot faster. We get the CPU out of the data path, and deduplication no longer bottlenecks throughput. Data Domain typically doesn't have a lot to say about this mode because, well, they don't offer it. For reference, though, you can read here about some of the strengths and weaknesses of this approach relative to in-line deduplication.
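Conceptually, deferred mode splits the work into a fast ingest path and a scheduled job. Again, a toy sketch rather than our real code, and the policy knob (an age threshold) is invented purely for illustration:

```python
import time

DEDUPE_AFTER_SECONDS = 4 * 3600   # made-up policy: deduplicate 4 hours after backup

native_store = []                 # (arrival_time, data) landed straight on disk

def write_native(data: bytes):
    # Fast path: no hashing, no index lookups, no CPU in the data path.
    native_store.append((time.time(), data))

def scheduled_dedupe(deduplicate):
    # Policy-driven job that runs out-of-band, leaving the ingest path alone.
    now = time.time()
    for arrived, data in native_store:
        if now - arrived >= DEDUPE_AFTER_SECONDS:
            deduplicate(data)
```

Notice that `write_native` does no computation at all; that is exactly why the native write path is so much faster.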
Now here is the kicker: regardless of which approach you take, the DL3D will have data written in native format. And it will not immediately delete this data, even after it has been deduplicated. This process, called "truncation," happens on a scheduled basis. And there is a very good reason for keeping the original, native-format data around: fast restores.
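In sketch form, truncation is just a scheduled pass that reclaims a native copy once it is old enough and safely deduplicated. The retention window below is a made-up example, not a product default:

```python
import time

TRUNCATE_AFTER_SECONDS = 7 * 24 * 3600   # made-up retention: keep native copies a week

def truncate(native_store, now=None):
    # Scheduled "truncation" pass over (arrival_time, data, already_deduped)
    # entries: reclaim a native copy only once it is past the retention
    # window AND already safely deduplicated. Until then, restores come
    # straight from the fast native copy.
    now = now if now is not None else time.time()
    native_store[:] = [
        (arrived, data, deduped)
        for arrived, data, deduped in native_store
        if not (deduped and now - arrived >= TRUNCATE_AFTER_SECONDS)
    ]
```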
Again, Data Domain misses the point, because they never have native data on disk. So when it comes time to restore data from a DD box, you are always restoring deduplicated data, which is relatively slow compared to the backup speed. Since Data Domain doesn't publish restore speeds, I have to rely on what customers tell me: data reads from a DD system are typically 25-33% of the speed of data writes. If you have a DD690, this means you can't expect to restore data faster than about 100-120 MB/s.
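To make the back-of-envelope arithmetic explicit (the write speed below is my assumption, picked only because it roughly reproduces the customer-reported range):

```python
# Customers report DD reads at 25-33% of write speed. Assume ~400 MB/s writes.
write_mb_s = 400
low, high = 0.25 * write_mb_s, 0.33 * write_mb_s
print(f"expected DD690 restore: {low:.0f}-{high:.0f} MB/s")  # 100-132 MB/s
```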
But this is where they get it wrong when it comes to the DL3D. In their haste to paint the DL3D as all bark and no bite, they assume that whenever EMC discusses restore performance, we are quoting restores from native data to make ourselves look better than we actually are. This is, of course, untrue.
The restore performance of a DL3D has two key metrics. The first is the restore of data that has been truncated: data that must be reconstructed from its metadata, which means you are restoring data that has been deduplicated. In this case, the DL3D performs very similarly to a Data Domain box. This is our worst case: the slowest restores on a DL3D come from deduplicated data, and even then a DL3D 3000 will be every bit as fast as a DD690.
The second key metric is the restore of native data. If the native data has not been truncated (and I will go into the circumstances in which that is the case in a follow-up post), then the restore is nearly as fast as the backup. In other words, there is virtually no penalty to restoring this data. Put another way: it is 4 or more times faster than any restore from a DD690.
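Side by side, with the same assumed numbers (the 500 MB/s native restore figure is illustrative, standing in for "nearly as fast as the backup"):

```python
dd690_restore_mb_s = 120        # top of the customer-reported range above
dl3d_native_restore_mb_s = 500  # illustrative stand-in for native restore speed
print(f"native restore advantage: ~{dl3d_native_restore_mb_s / dd690_restore_mb_s:.1f}x")
# -> ~4.2x, which is where the "4 or more times faster" claim comes from
```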
And now we see why we "cache" data even during immediate deduplication: by doing so, we can do restores 4 times faster than otherwise, and 4 times faster than the pure in-line vendors who never keep a native copy of the data at all.