One of the comments that I used to be able to make fairly universally two or three years ago (before virtual tape adoption was widespread) was that your backup is your archive. Typically, unless you had a strong legal, regulatory, or compliance requirement, your tape backup was your archive. And your archive was your tape backup.
This is not a good thing.
Backup and archiving have two wildly different use cases. They store different types of data. They require different retention periods. They have very different characteristics.
One of the good things about backup to disk, irrespective of whether it is virtual tape or disk directly addressed by the backup application, is that we get to rethink backup and archiving. It prompts organizations to look at how the two differ, and it is an opportunity to re-architect: to have two different systems for the two different requirements. After all, when I am down-sizing my tape environment (see footnote) it is a really good time to have the discussion about what I really need to replace it, what my technical and business requirements are, and what I should do to address them.
Unfortunately, somebody over at Data Domain seems to have come to the conclusion that it makes sense to replace a single, monolithic, tape-based backup and archiving strategy with a single, monolithic, disk-based backup and archiving strategy. You can see the details here. Basically, do exactly the same thing as you were doing before, but do it to disk--all enabled by the addition of one feature: retention locking.
There are two basic reasons why I think this is probably a bad idea.
First, however, let me explicitly state some of the differences between backup and archiving just to give my comments some context.
Backup is all about operational recovery. It is about the data we need if something goes sideways, through accident or malice, to get us back in production. It is dynamic data. It is always changing. (And we get a copy of that change through backup at least once a day.) And it is a copy. The primary instance of the data still sits somewhere else--probably on a production storage array.
Archive is not that at all. Archive data is, by nature, static data. It doesn't change. It is (hopefully) the primary instance of that data--when I archive something and leave a stub behind, I am moving the primary copy of that data to some other place, some place off of my production storage array. Archive data is accessed less often. And it may or may not carry some pretty heavy regulatory and compliance issues with it.
Most importantly however, once I archive a piece of data, I shouldn't have to back it up. For everybody that doesn't have heavy compliance issues around their archive, this is a major component of the ROI on archive: if I archive something to the right kind of storage, then I don't have to back it up. That saves me on infrastructure, backup windows, backup storage, administrative effort, restore times, and generally results in a bunch of good things for my backup environment.
What is the right kind of storage?
It should offer guaranteed authenticity. The object you get out is guaranteed to be the object you put in. It should offer low or no administrative effort. It should offer replication. It should allow different retention rules for different types of data. It should support a high level of data integrity. It shouldn't allow you to delete something no matter who asks you to.
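To make a few of those requirements concrete, here is a minimal sketch in Python of a store that addresses objects by a hash of their content (so what you read back can be checked against what you wrote), applies a retention period per class of data, and refuses deletes until that period expires, with no override path. This is an illustration only, not a description of how Centera or any other product is implemented; the class names, retention classes, and periods are all invented for the example.

```python
import hashlib
import time


class RetentionError(Exception):
    """Raised when a delete is attempted before the retention period expires."""


class ArchiveStore:
    # Hypothetical retention classes, in seconds. The values are invented
    # for illustration and do not reflect any regulation or product default.
    RETENTION_SECONDS = {"email": 7 * 365 * 86400, "scratch": 0}

    def __init__(self, clock=time.time):
        self._objects = {}   # content address -> (data, expires_at)
        self._clock = clock  # injectable so the example can be tested

    def put(self, data: bytes, retention_class: str) -> str:
        # The address is derived from the content itself.
        address = hashlib.sha256(data).hexdigest()
        expires_at = self._clock() + self.RETENTION_SECONDS[retention_class]
        existing = self._objects.get(address)
        if existing is not None:
            # Re-archiving the same object can only extend retention.
            expires_at = max(expires_at, existing[1])
        self._objects[address] = (data, expires_at)
        return address

    def get(self, address: str) -> bytes:
        data, _ = self._objects[address]
        # Authenticity check: the object out must hash to the address it went in under.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object no longer matches its address")
        return data

    def delete(self, address: str) -> None:
        _, expires_at = self._objects[address]
        # There is deliberately no administrator flag or override here.
        if self._clock() < expires_at:
            raise RetentionError("retention period has not expired")
        del self._objects[address]
```

The point of the design is in `delete`: there is no parameter anyone can pass, and no password anyone can hold, that skips the retention check.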
And not just incidentally, it should support XAM as well. Chuck has a lot to say about why this is important and cool. (Unfortunately, Data Domain doesn't do this either.)
It should come as no surprise that the EMC Centera fits all these requirements.
Nor, that Data Domain misses the mark on all of these. Most surprising to me was the notion that even with the Data Domain retention locking "a properly authorized administrator can over-ride Retention Lock settings." So why bother? Do you think the guys who destroyed data at Enron would have hesitated to sign a big check over to the administrator here to give them the authorization to destroy data? No. Data retention shouldn't be an option, something that somebody, anybody, can dispense with just by having the right password. Data retention should be enforced.
What else do they get wrong? Well, it is still held on a DD filer, which means the data is still held in a file system, and subject to corruption and damage at the file system level. The data is still only RAID6. Centera protects data across nodes via parity or mirroring (and usually via replication as well), and doesn't have an exposed file system. Due to this, as well as a few other "under the covers" management routines, we refer to the Centera as self-healing and self-managing.
These are some pretty significant and pretty basic differences. In combination, they are the first reason I think the retention lock that Data Domain offers is a bit of a miss.
The second big reason revolves around my idea that once you archive something, you shouldn't back it up.
This means, when you archive something, you should single-instance the object at this point. Most good archiving software does this: EmailXtender and DiskXtender, for example, both single-instance files and attachments as they archive them. This capability is in the software, and has nothing to do with deduplication or hardware level single-instancing. In other words: I will only ever store one (logical) copy of this data anyway. Deduplication gains me nothing (there is no second, third, or fourth copy to deduplicate!). In fact it costs me much more than it gains me: not only am I storing my one instance of the data on deduplicated storage, which is more expensive than general purpose storage, but I get no benefit from the deduplication. I also get none of the other things that I want (like guaranteed authenticity, enforced retention, legal hold capability, etc.). I am paying, literally, something for nothing.
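A small Python sketch shows the idea. It assumes single-instancing is done by hashing whole objects, which is a simplification and not a description of how EmailXtender or DiskXtender actually work. It also models the downstream device as deduplicating whole objects only, so it says nothing about sub-file or segment-level deduplication. All names here are hypothetical.

```python
import hashlib


class SingleInstanceArchive:
    """Archive software that keeps one stored copy per distinct object."""

    def __init__(self):
        self.blobs = {}       # digest -> bytes actually stored
        self.references = {}  # (owner, name) -> digest

    def archive(self, owner: str, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        self.references[(owner, name)] = digest
        return digest

    def logical_bytes(self) -> int:
        # What users believe they have archived.
        return sum(len(self.blobs[d]) for d in self.references.values())

    def stored_bytes(self) -> int:
        # What the archive actually writes to its storage.
        return sum(len(b) for b in self.blobs.values())


def whole_object_dedup_savings(objects) -> int:
    """Bytes a whole-object deduplicating device would save on these objects."""
    objects = list(objects)
    unique = {hashlib.sha256(o).digest(): len(o) for o in objects}
    return sum(len(o) for o in objects) - sum(unique.values())


archive = SingleInstanceArchive()
attachment = b"x" * 1000
for user in ("alice", "bob", "carol"):
    archive.archive(user, "budget.xls", attachment)

print(archive.logical_bytes(), archive.stored_bytes())
print(whole_object_dedup_savings(archive.blobs.values()))
```

Three users archive the same attachment, the archive writes it once, and the device underneath is handed a set of objects with no duplicates left to find.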
So, archive and backup: two different things. Two different sets of requirements. And there is very little point in trying to use your backup device as your archive as well.
Footnote: We had lunch with a customer last week. He had a different term for "downsizing" which was "career development day". As in this is a good opportunity for your career to develop in a different direction. I thought it was kind of funny, in a grim sort of way. The connection is that virtual tape is a lot like career development day for physical tape. Tape won't be dead, but it won't be still doing the same old same old either.