Some time ago I wrote a post, "Peace, Love and Revolution", about the issues that replicating deduplication appliances cause with backup applications. Some six months later, I thought it would be good to update the state of the nation and see what progress has been made.
To quickly refresh, the issue is this: how do you restore data from a deduplication appliance that is the target for replication--when it is the appliance that is managing the replication process and not the backup application? (And to be fair, this is not an issue unique to deduplication: the exact same problems apply if you are just using virtual tape. The issue just happens to be more pressing with deduplication because if it is "just" virtual tape, I can use the backup application to manage replication and not cost myself any bandwidth; with deduplication, if I use the backup application I miss out on the 95% bandwidth savings that deduplication replication promises.)
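To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The 95% figure is the saving quoted above; the 10 TB nightly backup size and the function name are assumptions made up purely for illustration.

```python
def replicated_tb(backup_tb: float, dedup_savings: float) -> float:
    """Data that must cross the WAN, in TB, for a given bandwidth saving.

    dedup_savings is a fraction between 0 and 1 (0.95 means 95% saved).
    """
    if not 0.0 <= dedup_savings <= 1.0:
        raise ValueError("dedup_savings must be between 0 and 1")
    return backup_tb * (1.0 - dedup_savings)


nightly_tb = 10.0  # assumed nightly backup size, for illustration only

# Replication managed by the appliance: only the deduplicated data is sent.
appliance_managed = replicated_tb(nightly_tb, 0.95)

# Replication managed by the backup application: full images are sent.
application_managed = replicated_tb(nightly_tb, 0.0)

print(f"appliance-managed:   {appliance_managed:.2f} TB")
print(f"application-managed: {application_managed:.2f} TB")
```

Under those assumed numbers, letting the backup application drive replication means moving twenty times as much data between sites every night.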
I would have loved to be able to title this post "Tears of Joy" because things improved so much. Or "Little Rock Star" if I could single out even one vendor for having made any significant improvement. But I have no such luck. Lucinda Williams is not quoted in the title because, well, the sum total of progress made in the last six months is... basically zero.
At which point I expect the Symantec folks to chime in: "but we do this!!" So let's get that out of the way: they don't. The problem is every bit as acute with NetBackup as it is with any other backup application. OST is a nice trick, but it is not broadly useful. (It does not work with virtual tape, only with disk backups. That makes it pretty uninteresting to a lot of larger customers that prefer to work with virtual tapes. And even with disk, it still has a few other key limitations. And I still can't get past an abiding suspicion that the real agenda here is more control for Symantec, not providing needed functionality to users. If you look at what the limitations are, and why they don't support virtual tape, it is tough to see the reasons except that they want to keep as much stuff under Symantec control as possible, and leave as little room for value or function as possible for other vendors, including hardware vendors, in the deduplication space. Just my $0.02!)
So, let's look at what is required to do replication between two sites, and coordinate that with the backup application, and potentially produce a tape image (for very long term retention) at the second site. I am going to use NetBackup terminology for the software, and EMC terminology for the hardware, but that is just for ease of reference. The issues here apply no matter if I am using NetBackup, TSM, NetWorker, or any other backup application. Some are marginally better. Some marginally worse. But they all have a big problem with this. Likewise, any vendor that sells you a deduplication appliance has the same fundamental issues: EMC, IBM, Data Domain, Sepaton, and so on all face this issue. (I excluded NetApp because they don't have a deduplication appliance for backup...)
So, what happens? The first site is my production site, where my backup master server and media server sit, as well as the data that I am backing up. I will call it Site A. I also have a deduplication appliance there that all my backup goes to, but no tape. The second site is the disaster recovery site. It has a deduplication appliance, a media server, and some tape drives. This is Site B.
For the backup of site A:
- All systems local to Site A get backed up by media server(s) at Site A to the EDL3D. The retention and expiration of data on the DL3D are managed by NetBackup (NBU), just like they would be with physical tape. All data on the DL3D is replicated to the DL3D at Site B by the DL3D itself.
So far, so good. But here is the kicker: how do I access the data at the replica target? How do I use it for restores? How do I move it to tape, if need be?
Generally, I can't. Because the virtual tape volumes at Site A and at Site B are identical. And they are "owned" by the media server at Site A. I can't just mount them in a media server at B and Vault them to tape. And in the event of a genuine disaster at site A, how do I get the catalog (database) to site B?
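The ownership problem can be shown with a toy model. This is only a sketch of the idea, not how NetBackup or any other product stores its catalog; the volume IDs, server names, and the `mount` function are all hypothetical.

```python
class VolumeOwnershipError(Exception):
    """Raised when a media server asks for a volume it does not own."""


# Toy catalog: exactly one owner per volume ID. Because replication produces
# an identical volume ID at Site B, the replica has no entry of its own.
catalog = {"VT0001": "media-server-a", "VT0002": "media-server-a"}


def mount(volume_id: str, media_server: str) -> str:
    """Mount a volume, but only for the media server that owns it."""
    owner = catalog.get(volume_id)
    if owner is None:
        raise KeyError(f"{volume_id} is not in the catalog")
    if owner != media_server:
        raise VolumeOwnershipError(
            f"{volume_id} is owned by {owner}, not {media_server}"
        )
    return f"{volume_id} mounted on {media_server}"


print(mount("VT0001", "media-server-a"))
try:
    mount("VT0001", "media-server-b")
except VolumeOwnershipError as err:
    print(f"refused: {err}")
```

The replica at Site B is bit-for-bit usable, but as long as the catalog only knows one volume with that ID and one owner for it, the media server at Site B has no legitimate way to ask for it.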
So what needs to be done? A couple things:
- A disaster recovery plan is required so that, in the event of a complete outage at A, the replicated virtual tapes from A can be mounted by a media server at Site B (this would essentially be equivalent to "checking in" the media at Site B if they were physical tapes). This is actually the easier of the two things to do (the harder one is below). Basically, all I need to do is replicate the NBU catalog, either by putting it on a disk array and using array-level replication, or by backing it up to a virtual tape on the deduplication appliance and letting it replicate. If I take the virtual tape approach, I need to be able to identify this virtual tape at Site B and restore it to a new master server at Site B before I can begin general restores. Clearly, array-level replication of the catalog offers me a better SLA!
- For situations other than a disaster, when I just want to access the virtual tapes at Site B (either to restore or to Vault to tape), the only real alternative is a kludge. It works like this: after the backup at Site A, an NBU Vault job is run that creates a second copy of the backup media on the local EDL3D at Site A. We now have two copies of our backups--each copy has a different virtual cartridge volume #. One copy will reside on a virtual library that has no replication policy, and the other copy resides on a virtual library that is replicated to Site B. Now we "just" need a script that would identify the replicated volumes, unmount them (check them out) at Site A, and mount them (check them in) at Site B--where they could then be Vaulted to physical tape, or used to do a restore (for QA or dev purposes, for example).
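The hand-off script described in the second item could start from something like the minimal sketch below. It only builds the plan of which volumes to check out and check in; the actual commands depend on the backup application and are deliberately left out. The `VirtualVolume` type, the library names, the site labels, and the volume numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualVolume:
    volume_id: str   # virtual cartridge volume number
    library: str     # virtual library that holds this copy


def plan_handoff(volumes, replicated_libraries):
    """Return ordered (action, site, volume_id) steps for replicated copies.

    Copies on non-replicated libraries are left alone, so Site A keeps a
    usable local copy for day-to-day restores.
    """
    steps = []
    for vol in sorted(volumes, key=lambda v: v.volume_id):
        if vol.library not in replicated_libraries:
            continue
        # Check out at Site A first so only one site ever claims the volume.
        steps.append(("check_out", "site-a", vol.volume_id))
        steps.append(("check_in", "site-b", vol.volume_id))
    return steps


volumes = [
    VirtualVolume("V00001", "vlib-local"),       # original backup copy
    VirtualVolume("V10001", "vlib-replicated"),  # Vault duplicate
]
for step in plan_handoff(volumes, {"vlib-replicated"}):
    print(step)
```

Keeping the planning step separate from the execution step means the plan can be reviewed or logged before anything is actually unmounted at Site A.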
Seems like kind of an ugly process, right? I think it is. This isn't just me trying to make something seem more complicated just to make a point. In fact, if anybody has a better way of doing this, please let me know and I will be happy to post it up here with a new post all of its own. This is a big enough issue that if there is a better way it should be shared as widely as possible. And let me repeat just for emphasis: this same issue applies irrespective of whether you are a NBU user, a NetWorker user, or a TSM user (TSM is a little worse, if anything...)
Until that time, what do I want? I want the software and hardware vendors to agree to a system that would include:
- Backup catalog replication
- Replication of virtual volumes with unique volume IDs at the source and target
- Works with either backup to disk or virtual tape
- Lets me restore/copy/vault backup images at either source or target site independently and simultaneously
- Works within a single domain, or across backup domains (zones)
- Is automated, simple, and easy to administer
- Is deterministic in behaviour and scheduling--if I tell it to do replication or vaulting at a particular time of the day, I want it to happen at that time of day. I don't want to assign it a service tier and let the backup application decide when to service that tier.
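To make the last item concrete, this is roughly what "deterministic" means to me: given a configured time of day, the next run is fully determined by the clock and nothing else. A minimal sketch, assuming naive local datetimes and ignoring daylight saving changes; the `next_run` name and the example times are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, run_at: time) -> datetime:
    """Return the next occurrence of run_at strictly after now."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so the job runs tomorrow.
        candidate += timedelta(days=1)
    return candidate


vault_time = time(22, 0)  # assumed: vaulting is configured for 22:00
print(next_run(datetime(2024, 1, 1, 20, 0), vault_time))
print(next_run(datetime(2024, 1, 1, 23, 0), vault_time))
```

There is no tier, priority, or queue depth in that function, which is exactly the point.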
Wishful thinking? For now, maybe. But it should be what we are all striving for.
Folks, you have your work cut out for you.