The always interesting and informative Preston de Guise has a discussion of source and target deduplication on his blog "The NetWorker Blog". Which is just about as imaginatively named as this blog. Well, he may not be a marketing mad man, but Preston is one of the most informative and most informed people I know writing about backup and recovery.
However, he does raise a couple of concerns about source deduplication that I would like to address. Now source deduplication for EMC means Avamar. Actually, for most everybody it means Avamar, because there really is no other credible source deduplication alternative today. There may be software deduplication solutions--but most of them tend to be target solutions in one way or the other. And not terribly technically or architecturally credible. (Yes, I am thinking of you, TSM.) With vanishingly small market share to boot.
So Avamar it is.
Preston mentions a few points about Avamar:
In regular backups while there may be some benefit to reducing the amount of data transmitted, what you’re often not told is that this reduction comes at a cost – that being increased processor and/or memory load on the clients. Source based deduplication naturally has to shift some of the processing load back across to the client – otherwise the data will be transmitted and thrown away.
This is true, but there is a very significant counter-point. That is: although source deduplication has a higher processing load on the client than traditional backup, it is for a much shorter period of time. More succinctly: your backups use more CPU, but don't take as long.
Experience, anecdotal reports, and testing we have done at EMC suggest that, on average, Avamar backups may consume 25-50% more CPU than a traditional backup. However, the duration of the backup may be 75% less with Avamar than with other backup products. (Even those, like TSM, which utilize a "progressive incremental" approach.) In some cases, Avamar may complete a backup 90% faster than traditional backup. Particularly in large, dense file systems and with filers, you will see significant improvements in the time to complete a backup job.
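To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It is only an illustration, not a measurement: it assumes the CPU figure is an average load over the whole backup window, so that total CPU work is simply load multiplied by duration. The function name relative_cpu_seconds is my own invention for this example.

```python
def relative_cpu_seconds(cpu_increase, duration_reduction):
    """Total CPU work of a source-dedup backup relative to a traditional one.

    Assumption (not from any product documentation): CPU work is modeled as
    average CPU load multiplied by backup duration, with the traditional
    backup normalized to 1.0 on both axes.
    """
    relative_load = 1.0 + cpu_increase            # e.g. 25% more CPU -> 1.25
    relative_duration = 1.0 - duration_reduction  # e.g. 75% shorter  -> 0.25
    return relative_load * relative_duration


# The ranges quoted above: 25-50% more CPU, backup window 75% shorter.
for cpu_increase in (0.25, 0.50):
    ratio = relative_cpu_seconds(cpu_increase, 0.75)
    print(f"+{cpu_increase:.0%} CPU, 75% shorter window: "
          f"{ratio:.1%} of the traditional CPU-seconds")
```

Under that simple model, the client spends about 31% to 38% of the CPU-seconds it would spend on a traditional backup, even though the load is higher while the backup is actually running. Whether the short, sharper spike matters more than the long, gentler one depends on what else the client is doing at the time.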
Secondly, Preston writes that:
Onto the second proposed advantage of source based deduplication – faster WAN based backups. Undoubtedly, this is true, since we don’t have to ship anywhere near as much data across the network. However, consider that we backup in order to recover. You may be able to reduce the amount of data you send across the WAN to backup, but unless you plan very carefully you may put yourself into a situation where recoveries aren’t all that useful. That is – you need to be careful to avoid trickle based recoveries. This often means that it’s necessary to put a source based deduplication node in each WAN connected site, with those nodes replicating to a central location. What’s the problem with this? Well, none from a recovery perspective – but it can considerably blow out the cost. Again, informed decisions are very important to counter-balance source based deduplication hyperbole.
Again, Preston is just right. However, there is an alternative with Avamar that can be employed to mitigate this concern: Avamar Virtual Edition. Briefly, Avamar Virtual Edition (or AVE) is an Avamar server in a virtual machine. So rather than having to deploy an entire physical node in each WAN connected site, you can simply deploy an AVE. An AVE is typically more modest in size than a physical Avamar node, and hosts anywhere from a very small amount of storage up to 2 TB--normally the capacity is just that required for local recoveries at the remote site. Storage can reside on the same storage system where your ESX server holds its data--it does not need to be dedicated.
So again, the issue Preston raises is true, but there is a practical and easy way to mitigate the financial impact. In practice, both source and target deduplication solutions have small footprint products that are reasonably priced and appropriate for placing at a small, remote site.
Finally, Preston says that:
Now let’s look at a couple of other factors of source based deduplication that aren’t always discussed:
- Depending on the product you choose, you may get less OS and database support than you’re getting from your current backup product.
- The backup processes and clients will change. Sometimes quite considerably, depending on whether your vendor supports integration of deduplication backup with your current backup environment, or whether you need to change the product entirely.
The current support matrix for Avamar can be found on PowerLink (for EMC customers and partners). Without listing it in its entirety here, suffice it to say that Avamar supports a full range of databases, applications, host OS types and versions. Although it may have been true several years ago that support was somewhat more constrained than a typical backup application, that tends to no longer be the case. With the exception of one or two idiosyncrasies in the support matrix, Avamar supports as wide a range of applications and data types as any other backup application.
As for the backup processes changing? Well yes! They do. Thankfully. I will be the first to admit that switching backup applications can be a non-trivial process. Not something that should be lightly undertaken, or that most people will do without a second thought. However, here is the other side of the coin: lots of current backup environments are broken. If not irretrievably so, then very seriously.
VMware backup. Filer backup. Remote backup. Backup of dense file systems. All of these are things which many traditional backup applications simply do a very poor job of. And if your backup application is broken, then changing it may be a very good thing. I would encourage you to weigh the costs and the benefits of switching applications very judiciously. Don't just do it because somebody told you it would be a good idea. (Even if that somebody is me!)
But... be honest about the strengths and weaknesses of your current approach. If the way that NetBackup or TSM does something right now is just really messed up, if it takes way too long to finish a backup, if your failure rate is way too high, if you just can't back up that ESX server in any reasonable time frame, then maybe changing backup applications isn't such a bad idea.