
January 28, 2010




While it is true that you can see very high dedupe levels with variable block-level deduplication, I find that you're shifting some of the expense of the operation to another area. If you're just deduplicating a file server, it's not such a big deal. But if you're trying to deduplicate Oracle with Avamar, you end up with a more expensive operation on the database than just dumping the backups to disk. I have to add the caveat that my sample size is small, so I'm completely open to your response and would be ecstatic to be proven wrong.

I still like Avamar. I just think that some of the work you have to do to see the high level of dedupe may not help medium-sized companies as much as it does Fortune 500 companies, where servers sit mostly idle and pushing an Oracle box harder isn't such a big deal.

My opinions are my own and do not represent my company.

Scott Waterhouse


The situation you speak of may well be the case, and this is an ideal use case for target deduplication. Avamar may still be appropriate, but there are a host of issues to consider.

As an interesting aside, most database backups with deduplication default to fixed-block deduplication at 8 KB, because that is the size of a database block in most cases anyway, so it turns out to be more efficient to do it this way. On the other hand, we still achieve net deduplication ratios similar to the variable-length dedupe that I discussed above (in part because of how well databases compress, and assuming that we are talking about a database with an average change rate).
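To make the fixed-block idea concrete, here is a minimal Python sketch. It is not how Avamar or any other product implements it. The 100-block "database image", the two in-place block updates, and the helper names `fixed_blocks` and `dedupe_ratio` are all assumptions made up for illustration, and compression is ignored.

```python
import hashlib
import os

BLOCK_SIZE = 8 * 1024  # 8 KB, the fixed block size discussed above


def fixed_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedupe_ratio(backups, block_size=BLOCK_SIZE):
    """Return logical bytes divided by unique stored bytes across all backups."""
    store = {}   # block digest -> block length (each unique block stored once)
    logical = 0  # total bytes presented for backup
    for backup in backups:
        for block in fixed_blocks(backup, block_size):
            logical += len(block)
            store.setdefault(hashlib.sha256(block).digest(), len(block))
    return logical / sum(store.values())


# Hypothetical database image: 100 blocks, of which 2 are updated in place
# between the first and the second backup.
day1 = os.urandom(100 * BLOCK_SIZE)
day2 = bytearray(day1)
for block_index in (10, 57):
    start = block_index * BLOCK_SIZE
    day2[start:start + BLOCK_SIZE] = os.urandom(BLOCK_SIZE)

ratio = dedupe_ratio([day1, bytes(day2)])
print(f"dedupe ratio after two backups: {ratio:.2f}:1")
```

Because the updates land on block boundaries, only the two changed blocks are stored again on the second day. That is why fixed blocks can work well for data that is rewritten in place rather than shifted.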


Hi Scott, good topic. The way I describe file/fixed/variable dedupe is that it's a tradeoff between space savings and performance impact. File dedupe (SIS) will get you some small savings, but it's very easy to do. Variable-block dedupe will get you the most savings, but it takes the longest to do. Fixed block-level dedupe is somewhere in the middle.

One is not better than the other; the choice depends on the user's tolerance for performance overhead versus the desired space savings.
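A rough way to see this tradeoff is to chunk the same data three ways and count how many bytes would have to be stored again after a small edit. The Python sketch below is illustrative only: the rolling hash is a simplified gear-style hash, not any vendor's algorithm, and the chunk sizes, the random test data, and the names `whole_file`, `fixed_chunks`, `variable_chunks` and `new_bytes` are all assumptions.

```python
import hashlib
import random

rng = random.Random(0)
# One random 32-bit value per byte value, used by the rolling hash below.
GEAR = [rng.getrandbits(32) for _ in range(256)]


def whole_file(data):
    """File-level dedupe (SIS): the whole file is a single chunk."""
    return [data]


def fixed_chunks(data, size=2048):
    """Fixed-block dedupe: cut every `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, bits=11):
    """Variable-block dedupe: cut wherever the top `bits` bits of a rolling
    hash are zero, giving chunks of about 2**bits bytes on average.
    The hash depends only on the last 32 bytes, so the cut points follow
    the content, not the byte offset. Every byte is hashed, which is the
    extra work this approach pays for its savings."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def new_bytes(chunker, old, new):
    """Bytes of `new` whose chunks were not already stored for `old`."""
    known = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(len(c) for c in chunker(new)
               if hashlib.sha256(c).digest() not in known)


original = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
modified = b"HELLO" + original  # a small insertion at the front shifts everything

results = {
    "file": new_bytes(whole_file, original, modified),
    "fixed": new_bytes(fixed_chunks, original, modified),
    "variable": new_bytes(variable_chunks, original, modified),
}
for name, stored in results.items():
    print(f"{name:>8}: {stored} new bytes to store")
```

In this contrived case the insertion shifts every fixed block, so file-level and fixed-block dedupe both store the whole file again, while the variable-block chunker resynchronizes shortly after the insertion and stores only the chunks near the edit. Data that changes in place, as in the database case above, would narrow that gap.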



