Chris Mellor wrote an interesting piece over at El Reg recently, introducing the proposition that CDP could render backup obsolete. First, welcome to the new digs Chris, I am sure Blocks and Files will miss you. Now that you are at The Register, I am expecting a bigger serving than ever of wit and irony from your keyboard.
The premise of the article was that "Continuous data protection could render dedupe, virtual tape libraries (VTL) and backup software redundant." And just for accuracy, I will note that it was really Alexander Delcayre of Falconstor who was making the claim. Irrespective of the source of the claim, however, we can ask: is it true? Can we really throw away (in some cases, with a considerable sigh of relief) all that tape, all those traditional backup applications, and all the headaches and stress that accompany backup?
Well, perhaps... But probably not.
First, what is CDP? Chris' definition is pretty good: it is a journal of every write made by a system to disk. CDP has granularity down to the block level when capturing changes, and because it captures all writes, at any point in time, it can permit recovery to any point as well (often to a consistent state, often with no requirement to reference application logs).
Generally, it is pretty cool technology. However, it hasn't exactly experienced overwhelming acceptance in the marketplace. But how can it replace tape and traditional backup?
Well, the first issue is: how much storage do I need? The article implies that CDP requires no more storage than backup with deduplication, and that is just not the case. CDP requires more. Sometimes much more. Because CDP captures every write, it will capture much more than something that only captures changes once every 24 hours, like a standard backup.
For example (and I am going to use highly unrealistic numbers just to make the point), let's say that we have a 1 TB database composed of 1000 chunks of 1 GB each. 100 of these chunks change once per day. 10 of these chunks change 100 times per day.
In this scheme:
- A traditional backup will capture 1 TB once per day.
- A deduplicated backup will capture 110 GB once per day.
- A CDP system will capture 1.1 TB per day.
Now that is a lot of change--so you may not end up with your CDP system being that large relative to the other two approaches. But it will certainly be larger than the deduplicated backup. In fact, we can make the following generalization: things which cause deduplication to degrade (smaller storage savings) are the same things which cause CDP storage to inflate.
So, if you have very high data change rates your CDP will be bigger. If you retain data for a very long period of time, your CDP storage will be bigger. And if you have a high degree of commonality within a given backup (at the segment level--as might be common with Avamar, for example) this will be a good thing that means your deduplication storage pool can be smaller, but it will do nothing for CDP.
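The arithmetic behind the chunk example above can be sketched in a few lines of Python. This is only a rough model under the same unrealistic assumptions: every change rewrites a whole 1 GB chunk, the deduplicated backup stores each changed chunk exactly once per day, and the CDP journal stores every write. The function and variable names are made up for illustration.

```python
# Hypothetical model of the example: a 1 TB database in 1000 chunks of 1 GB.
CHUNK_GB = 1
TOTAL_CHUNKS = 1000

# (number of chunks, writes per chunk per day), taken from the example above.
change_profile = [(100, 1), (10, 100)]


def full_backup_gb():
    # A traditional full backup copies every chunk, changed or not.
    return TOTAL_CHUNKS * CHUNK_GB


def dedup_backup_gb(profile):
    # Assumption: one stored copy per chunk that changed at least once that day.
    return sum(count for count, writes in profile if writes > 0) * CHUNK_GB


def cdp_journal_gb(profile):
    # Assumption: the journal records every write, so each rewrite is stored.
    return sum(count * writes for count, writes in profile) * CHUNK_GB


print(full_backup_gb(), dedup_backup_gb(change_profile), cdp_journal_gb(change_profile))
```

Changing the second entry of `change_profile` from 100 writes per day to 2 drops the CDP figure to 120 GB, which is why the gap between CDP and deduplicated backup depends so heavily on the change rate.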
Having said all that, there is a critical component to this discussion which can render all the above irrelevant: what is your RPO and RTO? Because backup with deduplication will have no greater frequency of backup than backup to tape (no deduplication), it still has the same RPO as backup to tape. If I back up once per day, I can't expect to be able to recover to any of several times during that day. My best case RPO is always my last backup (barring replay of application transaction and recovery logs). With CDP, my RPO is effectively anything I want--any second of the day, any transaction point.
So we can really make another generalization:
- Highest cost and best (shortest) RPO/RTO: Tier 1 storage with SRDF (or equivalent).
- Mid-point cost and mid-point RPO/RTO: CDP (RPO is just as good as SRDF, but RTO to full function and performance is not).
- Lowest cost and weakest (longest) RPO/RTO: backup with deduplication.
At the end of the day, it is all about Service Level Agreements. SLAs should dictate the solution, and the budget that will pay for it. Imagine a sweaty Steve Ballmer screaming "SLAs, SLAs, SLAs!" over and over again and you have the right idea. If you don't start any backup conversation from there, particularly a strategic data protection conversation like any discussion of CDP and backup, then you are off on the wrong foot!
And by the way, as an editorial footnote, there is nothing in principle that would prevent CDP and traditional backup and dedup from merging. All it would take:
- Deduplication across different CDP protection pools.
- Stronger protection of CDP data (meaning, for example, not just RAID-6, but parity protection across RAID-6 groups--look at how Avamar protects its data for a better idea of what I mean here).
- And the ability to switch from capturing/protecting every transaction to only significant points in time or significant transactions as the data ages. For the first month I might capture every transaction; for the next 5 months I might only have a daily image, which would discard 99 of the 100 changes to those 10 chunks that change so often in my example above.
And no, this should not be taken as some sort of Maui-like teaser of things to come from EMC. This is just me thinking about what would be required to merge a CDP appliance with a backup appliance to be both useful and cost effective.
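To make the last item in that list concrete, here is a minimal Python sketch of thinning recovery points as they age. It is not how any particular product works; the function name, the 30-day window, and the choice to keep the last point of each older day are all assumptions for illustration. Expiring points entirely after the sixth month is left out to keep it short.

```python
from datetime import datetime, timedelta


def thin_recovery_points(points, now, full_detail_days=30):
    """Keep every recovery point newer than the cutoff, and only the
    last point of each calendar day for anything older (hypothetical policy)."""
    cutoff = now - timedelta(days=full_detail_days)
    recent = [p for p in points if p >= cutoff]
    last_per_day = {}
    for p in sorted(points):
        if p < cutoff:
            # Later points on the same day overwrite earlier ones.
            last_per_day[p.date()] = p
    return sorted(last_per_day.values()) + sorted(recent)


# Usage: 100 points on one old day, 5 points on a recent day.
now = datetime(2008, 9, 17, 12, 0)
old = [datetime(2008, 7, 1) + timedelta(minutes=10 * i) for i in range(100)]
new = [datetime(2008, 9, 16) + timedelta(hours=i) for i in range(5)]
kept = thin_recovery_points(old + new, now)
print(len(kept))  # 1 daily image for the old day plus 5 recent points
```

In this run, 99 of the 100 old points are discarded, which mirrors the "discard 99 of the 100 changes" idea from the chunk example.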
I agree with your overall premise. CDP's not replacing backup anywhere that I know of, nor do I see that happening any time soon -- actually for different reasons than what you stated. My reasons have to do with it just being too different than what people are used to. Dedupe hardware is a nice enhancement to the backup software I'm using, and dedupe software is a different software that works on the surface just like my old backup software, but does cool things under the surface. (It still does nightly backups, etc.) But CDP is just, well, weird. ;) It's very different and a much bigger pill to swallow. I'm not saying it's not better. I'll compare it to teleportation. If such a thing existed, then it would be way cooler than cars or planes. But it would be new and scary. CDP is teleportation to me.
I do not believe that CDP will necessarily be the same or larger than traditional backup. Your math is fine, but it only works if you have records that are being updated 100s of times a day. I'm not sure that's a normal use case. The second issue is that it doesn't take into account the fact that CDP is block based, and backup is typically record and file-based. While only 1% or less of blocks typically change per day, a typical incremental backup is 10% or more. That's because it backs up files and records that have changed, not the blocks. This is where a lot of the dedupe savings come from. I'm not saying that CDP will always be less, but I disagree that there is enough evidence to say that it will always be more. I would need more evidence to believe that.
I also disagree that dedupe can't change your RPO. If your backup system can handle quick backups, then you can absolutely back up more often each day than you did before. If you're replicating (since you're deduping), you can even decrease your DR RPO.
Finally, the CDP product that EMC acquired (Kashya) already has the ability you're describing. You can start out capturing everything, then drop to significant points in time.
Good post.
Posted by: W. Curtis Preston | September 17, 2008 at 03:29 PM
You know, I was chatting internally with someone about why Oracle DBAs are less likely to embrace deduplication than other people. My conclusion was that they are very conservative. Sometimes I wonder if it isn't just teleportation that we (backup people) end up rejecting. I think it is many forms of change. To extend the transportation metaphor: would we even embrace a hybrid? Or is the old gas guzzler "good enough"?
But I couldn't agree more. CDP is just too much of a change of paradigm. I think things need to be more obviously broken before we embrace change. Tape is relatively broken, and that is why virtual tape and deduplication have been so successful.
And your reasoning about capacities is correct. I was just trying to highlight that CDP will require more storage than dedup--how much more depends on the rate of change. It will definitely be less than traditional backup.
Which means that it won't be capacity savings that will incent people to move to CDP. There has to be some other radical or unique value proposition that CDP offers. Which could also well be some dramatic failure on the part of traditional backup applications to offer what users want, or what is necessary. (One of the reasons Avamar has been so successful, in my opinion.)
For those that are interested, information on EMC's CDP product can be found here: http://www.emc.com/products/detail/software/recoverpoint.htm
Posted by: Scott Waterhouse | September 18, 2008 at 08:29 AM
Interesting article, but clearly you have not implemented or thoroughly evaluated a true CDP solution. I have several clients that utilize CDP, and I am currently selling the Falconstor product after internal bake-offs. The key with CDP is that changes are recorded at the block level or, in the case of Falconstor's CDP, the sector level. This minimizes the impact on the disk and, more importantly, on the network over which backup takes place, compared to a traditional backup. So not only is disk space being saved, but backups occupy less production time and require fewer CPU cycles than traditional backup.
Most CIOs would agree that improving backup times, RPO, and RTO are way more critical than saving 10% on disk space especially if that disk can be tier 2 disks. In an optimized CDP environment you can actually journal IO changes to fast disks and sequentially dump them to a second tier disk. This is a great way to save money but also must be considered when looking at a recovery.
Secondly, all of the true CDP products I have evaluated have a scheduling system (some better than others) to span out recovery points. I have customers maintaining multiple years for archival purposes while keeping hourly and daily points in time in place for more immediate recovery needs. I assure you that replacing backup with CDP will not multiply your disk requirements anywhere remotely close to your quickly thought-out demonstration above if deployed correctly.
Lastly, I agree as well that tape is not out. That is another reason we side with one vendor for CDP: the ability to archive backups to tape while still reaping the benefits of immediate recovery, bare metal recovery, and minimized disk occupation.
Thanks,
Jeff
Posted by: Jeff | April 18, 2009 at 12:03 AM