Mark (aka Storagezilla) Twomey wrote on his blog last week about copies and backups, and concluded that point-in-time copies are backups.
Hrmm. As I have said before, I am not sure that I agree with this.
I have discussed the issue here with W. Curtis Preston too, and I think there is a disconnect happening somewhere. I am fundamentally uneasy with the idea of calling a copy a backup. It seems to me that a copy is a necessary part of a backup, but not sufficient.
Would a tar ball be a backup? A gzip?
Again, I think the answer is, at best, "sort of".
And why do I differentiate between a copy and the tar ball?
Because I think there is not only a difference between copies and backups, but I think there is a difference between a backup and a backup and recovery system.
So let's start with fundamentals.
SNIA has this to say regarding the definition of a backup:
backup 1. [Data Recovery] A collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible; also called a backup copy. 2. [Data Recovery] The act of creating a backup. See archive.
(By way of an aside: does this mean a "backup" to flash drives is not a backup?)
SNIA also notes that: "To be useful for recovery, a backup must be made by copying the source data image when it is in a consistent state."
That is a pretty important qualifier. Does a copy always meet it? Well, no, because there are many ways of making a copy that do not ensure the consistency of the data set. So if, and only if, the copy is made in a way that captures a consistent state of (uncorrupted) data, you may have a backup.
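To make the distinction concrete, here is a minimal sketch, assuming a single file and a crude notion of consistency: the copy is kept only if the source did not change while it was being copied. The helper names `sha256_of` and `copy_if_stable` are hypothetical. Note that this check is far weaker than real application consistency (a database file can be unchanged during the copy and still be mid-transaction on disk), which is exactly why a plain copy does not automatically qualify as a backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_if_stable(src: Path, dst: Path) -> bool:
    """Copy src to dst, keeping the copy only if src did not change mid-copy.

    Hypothetical helper: returns True if dst matches one unchanged source
    image; otherwise deletes dst and returns False. This only detects
    changes to the file's bytes, not application-level inconsistency.
    """
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(src)
    if before != after or sha256_of(dst) != before:
        dst.unlink(missing_ok=True)  # discard a copy of a moving target
        return False
    return True
```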
But that backup may or may not be "good" in the qualitative sense. If it is not generated as part of a repeatable, supportable, manageable, scalable process, then its usefulness is limited. What is fine for one system is not so good when it must be done for 1000 systems. What works when one sys admin does it may not work when that sys admin is away and somebody else has to do it. What is acceptable to that sys admin may not be acceptable to the audit committee that wants to see an audit trail, reports, and proof of backup. And so on.
So a tar ball may be a backup, but it is not an acceptable part of a backup and recovery system, in my opinion.
And I think this is where some amount of subjectivity creeps into the conversation, because we have moved from a very concrete definition of backup to a more subjective discussion of a backup and recovery system. Both of the last two words necessitate this: recovery and system. Recovery, because we then have to ask some harder questions. Should the backup be on the same disk array (presumably not, or not always)? Should it have the same permissions and access levels as the source data (again, presumably not)? And what is the likelihood that the data has been captured in a way that ensures recovery without corruption?
"System" also implies subjectivity, around the quality and repeatability of the process. It means the data must be retrievable by the right, authorized people in a timely fashion (as specified in an SLA, probably) and we must have some means of verifying that the backup of a given system or data set is complete.
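One way a system might verify that the backup of a data set is complete is to record a manifest of the source (each file and its checksum) and compare the backup against it. The sketch below assumes a plain directory tree on both sides; `build_manifest` and `verify_backup` are hypothetical names, not part of any particular backup product.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(p.read_bytes()).hexdigest()
    return manifest


def verify_backup(source_manifest: dict[str, str], backup_root: Path) -> list[str]:
    """Compare a backup tree to a source manifest.

    Returns a list of problems (missing or mismatched files); an empty
    list means every source file is present with identical contents.
    """
    backup_manifest = build_manifest(backup_root)
    problems = []
    for rel, digest in source_manifest.items():
        if rel not in backup_manifest:
            problems.append(f"missing: {rel}")
        elif backup_manifest[rel] != digest:
            problems.append(f"mismatch: {rel}")
    return problems
```

A report like this is the kind of "proof of backup" an audit committee might ask for; a real system would also store the manifest somewhere the backup target cannot alter it.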
In the end, I think we end up with:
- A definition for backup
- A definition for backup and recovery process or system
- A set of mandatory requirements for a backup (included in the definition) and a set of desirable requirements (not included in the definition)
- A set of mandatory requirements for a backup and recovery system (included in the definition) and a set of desirable requirements (not included in the definition)
And it would be nice if we could define a backup and recovery system to everybody's satisfaction as we have defined backup. Pending further discussion here, that may be the subject of a future post!
Regarding: "(By way of an aside: does this mean a "backup" to flash drives is not a backup?)"
It's orthogonal. The definition doesn't restrict the media type (tape, disk, flash, ...). It just notes that the media is USUALLY removable. Also note that definitions may change as industry consensus evolves. At some point, the use of B2D may exceed B2T and the parenthetical comment may need revision or deletion.
Posted by: Mike Dutch | September 25, 2009 at 12:40 PM
Scott, this is something that I spend most of my book either implicitly or explicitly discussing. In the intro I'm fairly blunt about it: too many companies think that by installing some backup software and hardware they've installed a backup system. That's rarely the case, since the actual IT software/hardware is almost the least important part of a fully functional backup system. It's the human aspect (both within IT and without), the understanding of how the business maps to IT, the agreement on SLAs, etc., that all have to meld together (with that equipment/software) in order to produce a backup system.
Posted by: Preston de Guise | September 25, 2009 at 02:34 PM
You suggest that you are going to discuss what about a copy makes it not a backup, but then you didn't actually make any points in that regard.
CDP is a copy with a log. Snapshots that are replicated are copies. I would consider both of these components of a backup system, and just as valid a backup as a backup tape, if not more so.
I do think that the backup system should be protected from hackers, but that's true of both old-style and new-style backup systems. People incorrectly assume that if it's a "copy", someone will be able to delete it more easily than something that was on a tape. Someone with intimate knowledge of the backup system could destroy either, which is why you have to protect against that person.
One final note. What NBU stores on EMC/Data Domain (via NFS/CIFS/OST) is essentially a tar ball. So I'd say that a tar ball is also an essential component of a backup system.
Preston is right. It's the system that makes it a backup, not the technology that got it there.
Posted by: W. Curtis Preston | September 28, 2009 at 05:47 PM
Curtis... "It's the system that makes it a backup..." Isn't that pretty much what I said? :)
A copy is a necessary but not sufficient component of a backup system.
Posted by: Scott Waterhouse | September 29, 2009 at 07:59 AM
I think the confusion lies in "backup" as the broadest, most general term versus "backup" as the specific "instantiation of protection from a backup system".
I would suggest that narrowing the focus to just a single CDP instance would not satisfy the requirements of a "backup system", but yes, does satisfy a broad definition of the term "backup".
The real crux of the matter (IMHO) is educating companies and people to understand that a collection of random "backups" does not create a backup system.
I'd also suggest that the haziness around "backup" vs "backup system" lends weight to my thoughts on pulling data protection activities out of ILM and defining ILP, Information Lifecycle Protection (http://bit.ly/bo1E4). Then it will at least be possible to agree that both CDP and backup form part of a total information lifecycle protection strategy.
Posted by: Preston de Guise | September 29, 2009 at 03:32 PM
Honestly, Scott, I wasn't sure what you were saying in this post. So I assumed that you were continuing your previous stance that was essentially (from memory) that a snapshot-only system (no matter how well managed and systemized) is not a valid backup, even if it's replicated.
Posted by: W. Curtis Preston | October 01, 2009 at 12:58 PM