
September 24, 2008




I would add to your blog that, because backup retention practices are hard to generalize and other key factors affect the de-duplication of data, it falls to us, the vendors (software or hardware), to explain those factors clearly to our clients. We should guide them, through white papers or best-practice guidelines, to change their behavior so they can evaluate the viability of de-duplication in their environment and take full advantage of this new paradigm. There is a bit of a chasm between the traditional and the new, after all. Just because a vendor touts de-duplication doesn't make its product(s) a panacea. You need a product that offers de-duplication as a solid feature among other solid features, to give clients the flexibility to optimize their backup and recovery environments.

That's my $.02 worth :-)


Bill Andrews - ExaGrid

De-duplication ratios have everything to do with the type of data and the number of copies kept, that is, the retention period. If the data is Microsoft Office, database, and email, and you keep a longer retention such as 18 weeks, you can reach as high as 50 to 1. However, if the data is pre-compressed the ratio will be poor, and if the retention period is only 4 weeks the ratio will be lower. ExaGrid has hundreds of installations of disk-based backup with de-duplication behind existing backup servers, and we see two things. The first is that about 2% of the data changes from backup to backup, so once you have the first copy, each subsequent copy takes only about 2% more space. The second is that across our customer base we see ratios from 10 to 1 all the way to 50 to 1, depending on the type of data and the length of retention (number of nights and weeks kept). The ratio can therefore range greatly, because the variations of data types and retention periods are endless. Hope this helps.
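The arithmetic behind the 2% figure can be sketched roughly as follows. This is only an illustrative model, not ExaGrid's method: it assumes every retained backup is a full of the same size, that the first copy is stored whole, and that each later copy adds only the changed fraction. Compression is ignored, and the function name `dedupe_ratio` and the example copy counts (5 fulls per week) are assumptions made for the illustration.

```python
def dedupe_ratio(num_backups: int, change_rate: float) -> float:
    """Estimate a de-duplication ratio for num_backups retained full backups.

    Assumptions (hypothetical, for illustration only):
      - every backup is a full of the same logical size (normalized to 1.0)
      - the first backup is stored whole
      - each later backup stores only change_rate of its size as new data
      - compression is ignored
    """
    if num_backups < 1:
        raise ValueError("num_backups must be at least 1")
    logical = float(num_backups)                      # data sent to the target
    physical = 1.0 + (num_backups - 1) * change_rate  # data actually stored
    return logical / physical


# 18 weeks of retention at 5 fulls per week = 90 copies
long_retention = dedupe_ratio(90, 0.02)   # about 32.4
# 4 weeks of retention at 5 fulls per week = 20 copies
short_retention = dedupe_ratio(20, 0.02)  # about 14.5

print(f"18 weeks: {long_retention:.1f} to 1")
print(f" 4 weeks: {short_retention:.1f} to 1")
```

Under these assumptions the ratio can never exceed 1 / change_rate (50 to 1 at a 2% change rate), and shorter retention keeps it well below that ceiling.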

W. Curtis Preston

I've read your post and completely agree that you cannot know dedupe ratio until you know some things about the environment. IMHO, the big ones are frequency of full backups and retention period. Compression ratio isn't usually an issue, but I know that it can be with some data types.

I don't see why you think that SEPATON's guarantee is "ridiculous," or why you feel the need to put "guarantee" in quotes. If you read the fine print of the guarantee (available at http://tinyurl.com/3kyehz), you would know that it addresses the things you listed as requirements.

It is only for "NetBackup v5.1, v6.0, and v6.5 with Microsoft Exchange 2003 Agent (Windows 2003) data," using "full backups of Microsoft Exchange 2003 data five times per week" and "thirty days" of retention.

So they address everything you said except for compression ratio, which I'm sure they just used a conservative number on. (Exchange compresses quite well in comparison to other data types.)

Isn't it possible that they've done enough deduped backups of customers' Exchange data to know what their dedupe minimum is for a given set of conditions, and offer that as a guarantee, given those conditions?

Wouldn't it be quite stupid of them to be making this up, given that the guarantee says that they'll give them $50,000 of free disk if it's not met?

Joe Walsh

I think you're forgetting about the impact of the data growth rate on the de-dupe ratio. By growth I mean the introduction of new, unique data to the dataset. If the growth of data is low (e.g. system backups), then the de-dupe ratio will be very high from one full backup to the next and will continue to increase. If the growth rate is high (e.g. unstructured data), then too much new data will be introduced at each backup event and the de-dupe ratio will level off very quickly.
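A small simulation makes the leveling-off effect easier to see. It is a sketch under stated assumptions, not a measurement: each full backup adds a fixed fraction of new, unique data to the dataset, nothing is ever deleted, and everything that is not new de-duplicates perfectly. The function name `ratio_over_time` and the growth rates of 2% and 20% are hypothetical choices for the illustration.

```python
def ratio_over_time(growth_rate: float, num_backups: int) -> list[float]:
    """Return the cumulative de-dupe ratio after each successive full backup.

    Assumptions (hypothetical): the dataset starts at size 1.0, each later
    backup sees growth_rate * current size of new unique data, and all
    previously seen data de-duplicates completely.
    """
    dataset = 1.0    # current logical size of one full backup
    logical = 0.0    # total data sent across all backups so far
    physical = 0.0   # total unique data stored so far
    ratios = []
    for i in range(num_backups):
        if i == 0:
            new_unique = dataset              # first full is stored whole
        else:
            new_unique = dataset * growth_rate
            dataset += new_unique             # the dataset grows by the new data
        logical += dataset
        physical += new_unique
        ratios.append(logical / physical)
    return ratios


low = ratio_over_time(0.02, 20)   # low growth, e.g. system backups
high = ratio_over_time(0.20, 20)  # high growth, e.g. unstructured data

for n in (5, 10, 20):
    print(f"after {n:2d} fulls: low growth {low[n - 1]:5.1f}, "
          f"high growth {high[n - 1]:4.1f}")
```

In this model the ratio approaches (1 + growth_rate) / growth_rate, so the high-growth case flattens out near 6 to 1 while the low-growth case is still climbing after 20 backups.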

Scott Waterhouse

Excellent point. You are absolutely correct.

Scott Waterhouse

With respect to the "ridiculous" guarantee, Curtis: it is ridiculous because it makes an assumption and then details it only in the fine print.

But the assumption is that you are doing full backups every day (well, 5 days a week). That is exactly the assumption that is so misplaced and that I was trying to highlight in my post on deception.
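To show how much the daily-full assumption matters, here is a rough comparison of two schedules over the same four weeks. It is a back-of-the-envelope sketch, not SEPATON's calculation: a full is normalized to size 1.0, an incremental sends only the changed data, the first full is stored whole, and every later backup (full or incremental) adds only the changed fraction to disk. The function name `ratio_for_schedule` and the 2% change rate are assumptions.

```python
def ratio_for_schedule(fulls: int, incrementals: int, change_rate: float) -> float:
    """Estimate the de-dupe ratio for a mix of full and incremental backups.

    Assumptions (hypothetical): a full has logical size 1.0, an incremental
    has logical size change_rate, and each backup after the first stores
    change_rate of new unique data.
    """
    if fulls < 1:
        raise ValueError("at least one full backup is required")
    logical = fulls + incrementals * change_rate
    total_backups = fulls + incrementals
    physical = 1.0 + (total_backups - 1) * change_rate
    return logical / physical


# Four weeks, 5 backups per week, 2% change between backups
daily_fulls = ratio_for_schedule(fulls=20, incrementals=0, change_rate=0.02)
weekly_fulls = ratio_for_schedule(fulls=4, incrementals=16, change_rate=0.02)

print(f"daily fulls:               {daily_fulls:.1f} to 1")   # about 14.5
print(f"weekly full + incrementals: {weekly_fulls:.1f} to 1")  # about 3.1
```

The disk consumed is identical in both cases; only the amount of redundant data sent changes, which is why the headline ratio depends so heavily on the backup schedule.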

In other respects, I know you are right: Exchange does compress well, and there is very little risk as a result. (Unless, of course, the customer has very small inboxes, which will lead to very high rates of change and therefore very low dedupe ratios.)

Bottom line: the claim of any particular ratio bugs me. Especially when the assumptions are in the fine print and not stated up front.

