I saw this little quote on TechTarget today: "File-level deduplication will save a relatively small amount of space on your disk/tape archive. Block-level deduplication will save more space on your disk/tape archive, and variable block-level deduplication will save even more space on your disk/tape archive." For those of you interested in reading the whole article, it can be found here (registration required).
While the statement is true, it is light on detail and downplays the importance and impact of the different technology choices. To set expectations, these are the deduplication ratios we could expect when backing up the same data set with the same retention policy:
- File-level deduplication: 3:1 to 5:1
- Fixed block-level deduplication: 5:1 to 10:1
- Variable block-level deduplication: 30:1 or better
So there is a pretty substantial difference here. Capacity savings should not be the be-all and end-all of a deduplication technology choice, but differences of this magnitude will certainly come into play. Bear in mind that as the dedup ratios increase, the incremental capacity savings decrease: going from 5:1 to 10:1 on a given data set frees far more disk than going from 10:1 to 30:1. Given a 1 PB backup data set:
- File-level deduplication would require 200 to 330 TB of disk.
- Fixed-block deduplication would require 100 to 200 TB of disk.
- Variable-block deduplication would require about 33 TB of disk or less.
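The arithmetic behind these figures is simply the data set size divided by the deduplication ratio. Here is a minimal Python sketch of it, assuming decimal units (1 PB = 1000 TB); the function names are mine and purely illustrative. It also shows why the incremental savings shrink as the ratio climbs.

```python
def disk_required_tb(dataset_tb: float, ratio: float) -> float:
    """Disk needed to hold dataset_tb of backup data at a ratio:1 dedup ratio."""
    return dataset_tb / ratio


def savings_percent(ratio: float) -> float:
    """Share of the original capacity that a ratio:1 dedup ratio saves."""
    return (1 - 1 / ratio) * 100


DATASET_TB = 1000  # 1 PB, assuming 1 PB = 1000 TB

# The ratios quoted above: 3:1 and 5:1 (file level), 5:1 and 10:1 (fixed
# block), 30:1 (variable block).
for ratio in (3, 5, 10, 30):
    disk = disk_required_tb(DATASET_TB, ratio)
    saved = savings_percent(ratio)
    print(f"{ratio:>2}:1 -> {disk:6.1f} TB on disk, {saved:4.1f}% saved")
```

Moving from 5:1 to 10:1 frees another 100 TB on this data set, while moving from 10:1 to 30:1 frees roughly 67 TB more, even though the ratio tripled.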
This is a significant differentiator. Fixed-block schemes are much less efficient than variable-block schemes, and you should absolutely take care to understand whether your vendor offers file-level, fixed-block, or variable-block deduplication.
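To give a feel for why fixed-block schemes fall behind, here is a toy Python sketch. It is not how Avamar or any other product implements deduplication. It is only an illustration under my own assumptions: random test data, 4096-byte fixed blocks, and variable chunk boundaries placed wherever a CRC32 of the last 16 bytes has its low 8 bits equal to zero. All function names are hypothetical. The window hash is recomputed at every position for clarity; a real implementation would use a rolling hash.

```python
import random
import zlib


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Cut data into blocks of a fixed size, counted from offset 0."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, mask: int = 0xFF) -> list[bytes]:
    """Cut data wherever the hash of the trailing window matches a pattern.

    Boundaries depend only on nearby content, not on absolute offsets.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def reuse_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of the new backup's chunks that are already stored."""
    stored = set(old)
    return sum(chunk in stored for chunk in new) / len(new)


rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(65536))  # first backup
edited = b"X" + base  # second backup: one byte inserted at the front

fixed = reuse_fraction(fixed_chunks(base), fixed_chunks(edited))
variable = reuse_fraction(variable_chunks(base), variable_chunks(edited))
print(f"fixed-block reuse:    {fixed:.0%}")
print(f"variable-block reuse: {variable:.0%}")
```

A single inserted byte shifts every fixed block that follows it, so almost nothing matches the first backup. The content-defined boundaries fall back into step right after the insertion point, so nearly every chunk is found again.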
As an endnote, I would say that based on my personal observations, these differences are very real. I have encountered several situations in the last few months where Avamar has demonstrated much higher deduplication savings than products that use only fixed-block deduplication. There is a good reason why EMC has implemented variable-block deduplication in both products in our deduplication portfolio.