Barry Burke may want to riff on Britney Spears in his latest post, but I think I will channel Lucinda Williams instead. Besides being two or three decades older than Ms. Spears, Lucinda is also gifted with wisdom and passion, qualities sadly lacking in little Miss Train-Wreck-In-Slow-Motion-Spears. So, it is Lucinda's words that title this post.
And, I like to think that it is time for a little Peace, Love and Revolution in the world of backup and recovery. Mostly revolution though. The peace and love can come as we, as an industry, work to make things better.
The revolution we need is a re-think of our backup applications.
Why?
Because backup applications are not keeping up with what the hardware is capable of. I think that backup applications just haven't done a good job of maximizing the potential of virtual tape or deduplication appliances. Actually, let's be honest: they have done a terrible job of it.
I could go on and on about how virtual tape is better than physical tape, and aside from the issue of cost, I don't think I would get much argument from anybody. It is more reliable, faster to back up to, much faster to recover from, and more flexible.
Deduplication offers some of these benefits too. But it does two more things: it gives us an extremely cost-effective way of storing very large amounts of backup data, and it gives us an extremely cost-effective and low-bandwidth way of replicating data to a second site.
However, virtual tape is used in almost exactly the same way as physical tape is. Same processes and procedures. Same concept. I would argue that this is one of the reasons it has been so successful: if you don't want to, you don't have to change much of anything when you move to virtual tape, and your backup environment will just be better.
Fine. I guess I can forgive the application vendors if the applications don't treat virtual tape differently than physical tape, in the main.
But if this happens with deduplication systems, it will be really unfortunate.
Why do we have to work with the same processes and procedures, the same old operational same old, the same architectures, with deduplication systems?
Because backup applications insist on treating them this way.
Why can't I replicate data via the appliance and have my backup application recognize what just happened?
Why can't I create a second copy of the backup data with a different retention period that is strictly pointer-based? (Instead, I have to "dupe" or "vault" data which involves reading it from the appliance, through a media server, and back out to the same or a different device.)
Why can't appliances write a physical tape if required without a lot of intervention or intermediation from the application, and then let the backup application know when it is done?
Let me pick on the first one here in detail. At the moment, when I use deduplication, I have two choices on how to replicate my data to a second site.
And both of them are bad.
The first choice is that I let my backup application do the replication. This is good because it means my application now "knows" that I have a second copy of the data at a second location. But it is bad, because as soon as I replicate data with the application, I have to rehydrate the data before I transmit it to the second site. This means that I lose all the bandwidth savings offered by deduplication. That is really, really bad.
The second choice is that I use my deduplication appliance to replicate the data. So far, so good. However, when I do that, I end up with a second set of virtual tapes at the target site with the same virtual bar codes as the tapes at the primary site. And, to make matters worse, my backup application doesn't even know the second copy exists. That makes it almost impossible to use in an ongoing operational process (such as doing restores for QA or test/dev databases) and very difficult to use in a disaster recovery process. This is really, really bad too.
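To make the bandwidth difference between those two choices concrete, here is a minimal sketch in Python. It is a toy model, not how any particular appliance works: the fixed 4 KB chunk size, the SHA-256 fingerprints, and the function names are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real appliances differ


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split a backup stream into fixed-size chunks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def rehydrated_bytes(backup: bytes) -> int:
    """Application-driven replication: the full, rehydrated stream is sent."""
    return len(backup)


def deduplicated_bytes(backup: bytes, target_hashes: set) -> int:
    """Appliance-driven replication: only chunks the target lacks are sent."""
    seen = set(target_hashes)  # copy, so the caller's set is not modified
    sent = 0
    for chunk in chunks(backup):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in seen:
            seen.add(digest)
            sent += len(chunk)
    return sent
```

For a stream made of ten identical 4 KB chunks plus one different chunk, `rehydrated_bytes` reports all eleven chunks' worth of data, while `deduplicated_bytes` with an empty target reports only two. That gap is the saving that is thrown away when the application insists on doing the replication itself.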
Why can't my backup application instruct the appliance to replicate, and then make an entry in its catalog of retained data? Or, failing that, why can't my appliance notify the backup application of the second, remote copy?
And how do I coordinate this with the database used by the backup application to hold its catalog of data protected, virtual and physical tapes, and so on?
Answer: at the moment, I can't.
More precise answer: I can, but it is an enormous pain, and it means constructing something like consistency groups that involve both the disk which stores the backup application database, and the disk which holds either the virtual tape, or the deduplicated data.
Which is crazy difficult, expensive, and explains why almost nobody does this.
There has to be a better way. There is a better way. But we need the backup application vendors to step up. We need them to admit that the hardware is now capable of much more than the backup applications. We need them to realize that we can do things better.
Can you imagine how easy backup would be if I could back up to a virtual tape appliance? Deduplicate the data. Have my backup application instruct the appliance to replicate. Transmit the deduplicated data (only). And instruct a remote backup server that it now has a copy of the data, to do with as it wishes. And inform it of the appropriate database and catalog entries for that data. And checkpoint them with the replicated data.
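As a thought experiment, that workflow might look something like the sketch below. To be clear, nothing here corresponds to a real product or API: `Appliance`, `Catalog`, and `replicate_and_catalog` are hypothetical names, and the in-memory dictionaries stand in for the appliance's chunk store and the backup application's catalog database.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Appliance:
    """Hypothetical deduplicating appliance at one site."""
    site: str
    chunks: dict = field(default_factory=dict)  # chunk hash -> chunk bytes
    images: dict = field(default_factory=dict)  # image id -> ordered chunk hashes

    def ingest(self, image_id: str, pieces: list) -> None:
        """Store a backup image, keeping each unique chunk only once."""
        recipe = []
        for piece in pieces:
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)
            recipe.append(digest)
        self.images[image_id] = recipe

    def replicate_to(self, image_id: str, target: "Appliance") -> int:
        """Send only the chunks the target lacks; return how many were sent."""
        recipe = self.images[image_id]
        sent = 0
        for digest in recipe:
            if digest not in target.chunks:
                target.chunks[digest] = self.chunks[digest]
                sent += 1
        target.images[image_id] = list(recipe)
        return sent


@dataclass
class Catalog:
    """Hypothetical backup application catalog of known copies."""
    copies: dict = field(default_factory=dict)  # image id -> [(site, retention_days)]

    def record_copy(self, image_id: str, site: str, retention_days: int) -> None:
        self.copies.setdefault(image_id, []).append((site, retention_days))


def replicate_and_catalog(catalog, source, target, image_id, retention_days):
    """The application drives replication, then records the remote copy."""
    sent = source.replicate_to(image_id, target)
    catalog.record_copy(image_id, target.site, retention_days)
    return sent
```

The point of the design is the last function: the application asks the appliance to move the deduplicated data, and the catalog learns about the remote copy (with its own retention period) in the same step, so the second copy is never invisible. A real implementation would also have to checkpoint the catalog together with the replicated data, which this sketch does not attempt.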
There is a synergy in making deduplication and virtual tape work in harmony with the backup application. We, as an industry, and we, as backup administrators, operators, and architects, NEED that synergy.
We need the revolution. We need things to be better.
It is not every day that a real opportunity to make things better comes along. Virtual tape and deduplication have given backup application developers that opportunity.
To waste the opportunity would be a tremendous shame.
So it is time for a little out-of-the-box thinking. It is time to make the world of backup a better place.
It is time for a little peace, love and revolution.
P.S. This is as vendor agnostic a post as I will ever make. My plea goes out to each and every backup application vendor: EMC, IBM, Symantec, and all the rest of them. As far as I know, none of them comes remotely close to getting it right. And it is about time that somebody did.
You might want to look at OpenStorage (NetBackup). That is pretty close to what you are looking for...
Posted by: Jenny Smythe | April 17, 2008 at 09:51 AM
Jenny;
This is true, to a certain (limited) extent. However, without wanting to be too contrarian, it is still more of an idea than a final stable implementation. I am also a little unhappy with the additional "layer" in the architecture that it introduces, and feel that there still needs to be greater integration of deduplication appliances (which we could argue is at least partly the responsibility of the appliance vendors). There are also big issues with the approach when more than one NBU Master server is involved.
At the end of the day, I will make one other observation: innovation has been happening much faster in backup hardware and appliances than in software. I am hoping that the backup application vendors recognize this, and step up! (Innovation in this context means useful, practical features that make our lives either easier, cheaper, or both!)
Posted by: Scott | April 17, 2008 at 11:06 AM
I had a comment, but it seemed kind of long, so I posted it on the NetBackup Blog.
https://forums.symantec.com/blog?blog.id=NetBackup
Posted by: tim | April 17, 2008 at 11:46 AM
Tim, appreciate the comments, and the spirit of the coopetition. Since you brought it up (!), I do think OpenStorage is a step in the right direction. However, as I indicated to Jenny, I have concerns:
- As I understand it, it is for disk (as NAS/CIFS) only? Correct me if I am wrong. I would love to see equivalent functionality for VTL. For a lot of reasons, some of which I have written about here, I prefer VTL to disk, all things being equal, and think VTL has significant advantages over straight disk in both the SMB space and the enterprise. It is not black and white, but there are some pretty compelling reasons that VTLs work better for most people most of the time.
- It doesn't resolve the issue (again, as far as I know) if I have more than one master, or a different master at each site--which is often the case.
- I struggle with the architectural complexity of multiple layers of Media Servers, in the sense that I think this is a step in the wrong direction (adding complexity to the environment).
- EMC is a member (participant? Not sure of the right word here) of the OpenStorage program. Clearly we do see the value in it, and intend to deliver products leveraging it. However, my reaction, which is in no way representative of any official EMC line, is that it is only a first step. To be interesting, it needs broader participation from both vendors (i.e. more than just Data Domain delivering product) and customers (I personally have observed a very low uptake on it in the NetBackup install base). And equivalent or near-equivalent functionality is offered by NetWorker with its file type and advanced file type devices.
Call me cranky, but none of it is good enough. We could all push innovation much further. More automation, more support for different sorts of devices, more flexibility, and more support for the way business really does backup in terms of process, procedures, and architectures.
The future is a fun place to work, but if we don't spend enough time in the present, then we are just dreaming. The point of revolution is not just to think big, but do big.
Posted by: Scott | April 17, 2008 at 12:44 PM