There are a lot of different deduplication choices these days. And lots of ways to shove these technologies into places they weren't meant to go (or shouldn't have been). Even if we focus just on deduplication for backup--and this is the backup blog!--then there are still a couple big choices to make.
The biggest choice, of course, is do you want to deduplicate at the source or the target?
In EMC product terms, do you want to use Avamar or DL3D? Just for fun, let's note that there are no really significant alternatives in the source deduplication space these days (as it is my perception that Symantec's PureDisk is retreating to a target-based model). On the target side, Data Domain retains a strong position but continues to lose significant market share to EMC.
I think it is a common assumption that these choices can be decided on the basis of lots of factors: manageability, cost, physical footprint, expandability, and so on. You can make your own list. But I do get the sense that the efficiency of the deduplication method is not often considered.
But what if one method is more effective than others?
That might make a pretty big difference, I think. Especially if your cost calculations are based on $/TB, not just outright expense.
Now I have been the first to say that after a point, deduplication ratios don't matter a lot. And I still think that is true.
Nevertheless, as the following table illustrates, there are some interesting differences here.
A quick note on assumptions. My assumptions are a full backup every day, with a 3-month retention (actually, full or incremental, you get pretty much the same result on the target side). File data compresses at 1.5:1. Database data compresses at 3:1. Target deduplication sees a 2% daily change rate for file data; source deduplication reduces that to 0.5%. Both source and target see a 5% daily change rate for database data.
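If you want to play with these assumptions yourself, here is a rough back-of-the-envelope sketch in Python. It is not the model behind the table, just one simple way to turn the numbers above into an estimate: the first full backup is stored once, each later day adds only the changed fraction, and everything stored is then compressed. The function names `stored_tb` and `dedup_ratio`, the 90-day reading of "3-month retention", and the 1 TB of primary data are all my own assumptions for illustration. It also does not model any vendor-specific behavior, so it will not show differences between two target products that see the same change rate.

```python
# Back-of-the-envelope dedup estimate. Assumptions (hypothetical, not from
# any vendor sizing tool): daily fulls, the first full is stored once, each
# later day adds only the changed fraction, and stored data is compressed.

def stored_tb(primary_tb, daily_change, compression, retention_days=90):
    """Estimated physical capacity (TB) needed to hold the retention window."""
    unique_tb = primary_tb * (1 + daily_change * (retention_days - 1))
    return unique_tb / compression


def dedup_ratio(primary_tb, daily_change, compression, retention_days=90):
    """Logical data protected divided by physical capacity stored."""
    logical_tb = primary_tb * retention_days  # one full backup per day
    return logical_tb / stored_tb(primary_tb, daily_change, compression,
                                  retention_days)


if __name__ == "__main__":
    # (label, daily change rate, compression ratio) from the assumptions above
    scenarios = [
        ("file data, target dedup", 0.02, 1.5),
        ("file data, source dedup", 0.005, 1.5),
        ("database data, source or target", 0.05, 3.0),
    ]
    for label, change, compression in scenarios:
        tb = stored_tb(1.0, change, compression)
        ratio = dedup_ratio(1.0, change, compression)
        print(f"{label}: {tb:.2f} TB stored per TB primary, {ratio:.1f}:1")
```

Change the retention, change rates, or compression ratios to match your own environment. The point is only that a lower daily change rate translates directly into less stored capacity over the retention window.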
So two interesting things, I think:
- For unstructured data, source deduplication is significantly more efficient. If you have access to the data in its native format (before it has been wrapped up and packaged by a backup application), that is very beneficial to the deduplication process. The result? Less storage.
- For structured data, we are seeing real differences between Data Domain's results and EMC's. And in fact, we are seeing a significant efficiency advantage on the EMC DL3D over the Data Domain systems in these circumstances. In this particular case, the difference would be almost 2:1 in EMC's favor.
As noted, I often say that deduplication ratios should not be the deciding factor (or even a factor?) in the decision-making process. As it turns out, maybe I am just being too charitable to the competition. So go ahead, make it a factor. Or not. Either way, I am very confident in the architecture, approach, and total cost of ownership advantages EMC solutions can bring to your backup environment.