In the previous post, DL3D Deduplication, we saw that the DL3D offers three deduplication modes: on, which is the equivalent of in-line; schedule, which deduplicates data after it has been written by the backup server; and off, which never deduplicates data.
It will probably come as no surprise, then, that there are three different performance metrics that we will be interested in. Basically, the question becomes: under each of the three deduplication schemes, how fast is the DL3D?
However, there is an extra layer to this. Because the DL3D offers both IP and FC connectivity (for NAS connectivity via NFS/CIFS, and VTL emulation respectively), and each of those has different performance attributes, we can expect to see further performance differentiation based on how you connect to the device.
Further, for clarity, I will be focusing on two of the three family members that EMC offers in the DL3D product line: the DL3D 1500 and DL3D 3000. The DL3D 4000 versions are worthy of an entire conversation themselves. We can do that at some other point!
Also, let's level-set things by noting that the DL3D 1500 has six Gigabit Ethernet connections and two 4 Gb FC connections. The DL3D 3000 has eight Gigabit Ethernet connections and four 4 Gb FC connections. (The Ethernet ports are bonded by default, too, in both models.)
With all that said, we can represent the performance metrics as follows for the DL3D 1500:
| Deduplication | NAS (IP) | VTL (FC) |
|---|---|---|
| On | 0.76 TB/hr | 0.78 TB/hr |
| Scheduled | 0.84 TB/hr | 1.34 TB/hr |
| Off | 0.84 TB/hr | 1.8 TB/hr |
The performance metrics for the DL3D 3000 are:
| Deduplication | NAS (IP) | VTL (FC) |
|---|---|---|
| On | 1.44 TB/hr | 1.6 TB/hr |
| Scheduled | 1.65 TB/hr | 2.5 TB/hr |
| Off | 1.65 TB/hr | 4.0 TB/hr |
A couple of interesting things are going on here:
- VTL performance, via FC, is significantly faster than NAS performance, via IP. The difference is less remarkable on the DL3D 1500, which is a slower system, but on the DL3D 3000 the difference is quite pronounced.
- One can deduce the CPU overhead of deduplication fairly quickly here by comparing the "On" deduplication performance with the "Off" performance for the DL3D 3000 with FC connectivity.
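To make that comparison concrete, here is a minimal Python sketch that computes how much throughput is given up when deduplication is on rather than off. The figures are the ones quoted in this post; the column order (NAS over IP first, VTL over FC second) and the names `THROUGHPUT`, `inline_penalty`, and `fc_speedup` are my own assumptions for illustration. Note that a throughput drop is only a rough proxy for CPU overhead, not a direct measurement of it.

```python
# Throughput figures in TB/hr, as quoted in this post.
# Assumed column order: (NAS over IP, VTL over FC).
THROUGHPUT = {
    "DL3D 1500": {"on": (0.76, 0.78), "scheduled": (0.84, 1.34), "off": (0.84, 1.8)},
    "DL3D 3000": {"on": (1.44, 1.6), "scheduled": (1.65, 2.5), "off": (1.65, 4.0)},
}


def inline_penalty(model: str, fc: bool = True) -> float:
    """Fraction of the 'off' throughput lost when deduplication is 'on'."""
    col = 1 if fc else 0
    on = THROUGHPUT[model]["on"][col]
    off = THROUGHPUT[model]["off"][col]
    return 1 - on / off


def fc_speedup(model: str, mode: str) -> float:
    """How many times faster VTL over FC is than NAS over IP for one mode."""
    nas, vtl = THROUGHPUT[model][mode]
    return vtl / nas


if __name__ == "__main__":
    for model in THROUGHPUT:
        print(f"{model}: inline penalty over FC = {inline_penalty(model):.0%}, "
              f"FC vs IP with dedup off = {fc_speedup(model, 'off'):.2f}x")
```

For the DL3D 3000 over FC, this works out to a 60% drop (1.6 TB/hr versus 4.0 TB/hr).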
Also, let's make some of the inevitable comparisons to Data Domain. I know that almost everybody will do this anyway, and I also think that I am more or less obliged to do so at this point since I was the one who made the claim of "fastest"!
First, I will note that the DL3D 1500 (EMC's current entry system) is "almost" as fast as the DD690 on scheduled deduplication. The DD690 is rated by Data Domain at 1.44 TB/hr, and the DL3D 1500 is capable of 1.34 TB/hr with FC connectivity. I would also note that the scalability of the 1500 is almost identical to the DD690's: the DD690 scales to 35.5 TB usable, the DL3D 1500 to 36 TB.
The DL3D 3000 is unquestionably faster than the DD690. When connected via IP they are on par, but as soon as you take FC connectivity into consideration, the EMC device performs better. When you schedule deduplication, the performance gap is significant: at 2.5 TB/hr, the DL3D 3000 is about 1.7 times as fast as the DD690. I would further note that there is considerably more scalability in the DL3D 3000: at 148 TB, it offers more than 4 times the capacity of a DD690.
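If you want to check the ratios yourself, a short Python sketch does it. It uses only the rated numbers quoted in this post (DD690 at 1.44 TB/hr and 35.5 TB usable; the DL3D figures are scheduled deduplication over FC); the names `DL3D` and `versus_dd690` are hypothetical, chosen just for this example.

```python
# Rated figures as quoted in this post (not independently measured).
DD690_TB_PER_HR = 1.44
DD690_USABLE_TB = 35.5

DL3D = {
    "DL3D 1500": {"scheduled_fc": 1.34, "usable_tb": 36},
    "DL3D 3000": {"scheduled_fc": 2.5, "usable_tb": 148},
}


def versus_dd690(model: str) -> tuple[float, float]:
    """Return (throughput ratio, capacity ratio) of a DL3D model to the DD690."""
    spec = DL3D[model]
    return (spec["scheduled_fc"] / DD690_TB_PER_HR,
            spec["usable_tb"] / DD690_USABLE_TB)


if __name__ == "__main__":
    for model in DL3D:
        speed, capacity = versus_dd690(model)
        print(f"{model}: {speed:.2f}x throughput, {capacity:.2f}x capacity")
    # DL3D 1500: 0.93x throughput, 1.01x capacity
    # DL3D 3000: 1.74x throughput, 4.17x capacity
```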
(And no, I don't consider the Data Domain DDX worthy of consideration in the performance game: as mentioned before, it is an interesting marketing contrivance, nothing more. It has no shared anything, no common data pool, no shared physical components, no failover, etc. It is simply 16 DD690s with a single part number slapped on top by the marketing folks. My previous remark still applies: if that is a legitimate way of describing performance and scalability, then let me introduce the EMC DL3D 5000: 16 DL3D 3000s, total deduplication performance of 40 TB/hr, total usable capacity of 2,368 TB, which is about 47 PB at 20:1 deduplication! No, such a model doesn't actually exist. Well, no more than the DDX does, really.)
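The arithmetic behind that tongue-in-cheek "model" is just multiplication, as this small Python sketch shows. The function name `aggregate` is made up for the example, and I am assuming decimal units (1 PB = 1,000 TB).

```python
def aggregate(units: int, tb_per_hr: float, usable_tb: float, dedup_ratio: float):
    """Sum independent boxes the way a single marketing part number would."""
    total_tb_per_hr = units * tb_per_hr
    total_usable_tb = units * usable_tb
    # Logical capacity after deduplication, assuming 1 PB = 1000 TB.
    logical_pb = total_usable_tb * dedup_ratio / 1000
    return total_tb_per_hr, total_usable_tb, logical_pb


if __name__ == "__main__":
    # 16 DL3D 3000s at 2.5 TB/hr and 148 TB usable each, 20:1 deduplication.
    rate, usable, logical = aggregate(16, 2.5, 148, 20)
    print(f"{rate:g} TB/hr, {usable:,} TB usable, about {logical:.2f} PB logical")
```

Nothing in the sum reflects shared pools or failover, which is the whole point: the boxes are simply added up.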
So, stripped of ambiguity, those are the performance metrics. I think there are at least two or three other important components to the overall conversation here, however: deduplication scheduling and replication, restore performance (which is, after all, the only reason that we are doing backup in the first place!), and the existence of native data on the DL3D.
Naturally, you can expect those to be the subject of my next two or three posts.
See you then.