When considering what to do to upgrade your backup system, I have always maintained that one of the most important questions you should be asking is: what is wrong with what I got? (Proving nothing more than that the English language will be the first victim of this post.)
What is wrong can have many answers. It is not reliable. It breaks. It is slow. It costs too much. And so on. But one of the very frequent answers is this: my backup takes too long.
As I have said many times before, there are a lot of things that can cause a backup to bottleneck. And there are a lot of things that you can make infinitely fast and your backup will still be no faster overall (yes, LTO3 and LTO4, I am talking about you.)
And until we understand the big picture, and which components in that big picture are contributing to a slow backup, it is a little irresponsible for any of us on the vendor side of things to come out and claim that disk will "fix" your backup and make it faster. It might. Or it might not.
So when I read this piece on Search Storage I couldn't help but think it didn't make a lot of sense, and that had the claims come from a vendor, I would consider them suspicious at best, and irresponsible at worst.
The article starts out with entirely the right approach: "Buying the right virtual tape library (VTL) depends on a user's existing data backup environment, the amount of data to be backed up, the complexity of the storage environment and, most importantly, what problems they're trying to solve."
It then describes the customer's number one problem: backup windows. He has 21 TB to back up in 2 hours. And the primary criterion for selecting a solution was that it be a single box.
Finally, the article concludes that the customer had no choice but to consider backup to disk with deduplication in the form of a Diligent box.
And that is where the logic wheels came right off the wagon.
Because backup to disk with target deduplication will do *nothing* to help this customer meet their backup window (with a single box). It may be better than tape--depending on how many tape drives they have. It may even offer sufficient performance such that it is no longer the primary bottleneck.
But given the performance of the current generation of IBM/Diligent product (about 3.2 TB/hr for the very best--most expensive--configuration) they simply cannot meet the objective with the solution described. They would need four of them.
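The box count above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the 3.2 TB/hr figure quoted above at face value and assumes throughput scales linearly across boxes; the function name `boxes_needed` is made up for illustration.

```python
import math

def boxes_needed(data_tb: float, window_hours: float, box_tb_per_hour: float) -> int:
    """Smallest number of target appliances whose combined ingest rate
    covers data_tb within window_hours (assumes linear scaling)."""
    required_tb_per_hour = data_tb / window_hours
    return math.ceil(required_tb_per_hour / box_tb_per_hour)

# 21 TB in 2 hours means 10.5 TB/hr of sustained ingest.
print(21 / 2)                     # 10.5
# At 3.2 TB/hr per box, that takes four boxes.
print(boxes_needed(21, 2, 3.2))   # 4
```

Real deployments rarely scale perfectly, so four is a floor rather than a sizing recommendation.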
The only way to meet this particular requirement is to go to a source deduplication solution (or buy more than one target solution). Source deduplication, like Avamar, is generally the only way that you will achieve two critical goals:
- Reduce the time to complete a backup. In our experience, Avamar will reduce the time to run a backup on a given client by as much as 90%. So, assuming the backup client is the bottleneck, this may be a way of shortening your backup window.
- Reduce the amount of data sent over the network and stored on the backup server. Without reducing network utilization you still may not be able to meet an aggressive backup window like that described in the article. If you had a backup server with a 10 Gb network connection, you still would not have sufficient bandwidth to back up 21 TB in 2 hours.
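The network claim in the second point is easy to sanity-check. The sketch below assumes "10 Gb" means 10 gigabits per second, decimal units (1 TB = 10^12 bytes), and a perfectly utilized link with no protocol overhead, so the real ceiling would be lower. The function names are illustrative.

```python
def max_tb_over_link(link_gbps: float, window_hours: float) -> float:
    """Upper bound on data (in TB) a link can carry in the window,
    ignoring protocol overhead and assuming full utilization."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second * window_hours * 3600 / 1e12

def required_gbps(data_tb: float, window_hours: float) -> float:
    """Link speed (in Gb/s) needed to move data_tb within the window."""
    return data_tb * 1e12 * 8 / (window_hours * 3600) / 1e9

print(f"{max_tb_over_link(10, 2):.2f} TB")   # 9.00 TB
print(f"{required_gbps(21, 2):.2f} Gb/s")    # 23.33 Gb/s
```

Even in the best case a single 10 Gb/s link moves about 9 TB in two hours, well short of 21 TB.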
So given the problem, the solution posed simply won't work.
I would even go so far as to make the broader claim that no target solution will work here. Why? Because if we assume that the 21 TB is composed of a typical mix of file, print, database, and application data, I can pretty much guarantee that at least one of the many systems that host this data will not be able to complete its backup in 2 hours. Again, the only real resolution to this issue is to consider a source based deduplication product. However, in this case, and I think this is important, the deduplication is secondary to the fact that Avamar's methodology inherently offers significant reductions in the time required to complete a backup.
Without that reduction, and given the premises of the discussion provided in the article, I see no other alternatives that would allow the client to meet their objective of backing up 21 TB in 2 hours.
Great points Scott. I'd fully agree with you that source based deduplication would have been a far more efficient solution.
At this point he's going to be stuck with a synthetic backup product (like NetBackup) and daily incrementals in combination with the Diligent dedup targets and output to tape.
While it might work within their time window, it's not as clean as Avamar would have been; nor is it likely to work at the block level - at least not without a more expensive client agent.
Posted by: Andrew Storrs | August 25, 2009 at 11:11 PM
Scott, the article doesn't conclude that the user "had no choice but to consider backup to disk with deduplication in the form of a Diligent box." The story simply reports that this particular user chose a Diligent box. Nowhere is it stated that this was his only option.
It also goes on to report that: "[The user] said he can meet his expected 50% annual growth in storage needs by buying another disk array every year and another server every two years. 'If I'm wrong on [the amount of] data growth, I'll just escalate' those expansion plans, he said. Each expansion 'is less than $20,000 a pop, a small enough amount of money so I can double, triple or even quadruple the size' of the VTL if needed, he added."
I think you are misrepresenting what the article says.
Posted by: Andrew Burton | August 26, 2009 at 01:18 PM
No, the article states that the other options didn't meet his objective (21 TB in 2 hours). But neither could the Diligent.
The article also states that "he can meet his expected ... growth ... by buying another disk array".
The second point is simply untrue. Just as 21 TB in 2 hours would require 4 Diligent systems, 32.5 TB in 2 hours would require 6.
So there are all kinds of alternatives here: the article is wrong. The user is wrong. The user had a bunch of requirements or criteria (of which the 21 TB in 2 hours was just one) which aren't described here and which also weigh into the decision. Probably the last is the most likely (at a guess).
But the article pretty clearly leads a reader to believe that the Diligent system let the user meet the only requirement described (21 TB/2 hr). And did so where others did not. Which is simply untrue--it can't. And if that is truly the (only) criterion, the only way I can think of to meet it is with source deduplication.
But as I said, I suspect there were other criteria. But in the absence of those the article is misleading.
Posted by: Scott Waterhouse | August 26, 2009 at 02:27 PM
You had me for MOST of your post. I agree that there's absolutely no way that this customer can meet his requirement with the solution he chose. To back up 21 TB in 2 hours, you need about 2,917 MB/s. There are target dedupe systems that can supply this kind of ingest rate (FalconStor & NEC), but IBM isn't one of them. But that's irrelevant. There's absolutely no way he's going to generate 2,917 MB/s from a single filer, let alone a single volume, which is what he has. If he gets over 100 MB/s, I'll be impressed.
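A quick check of the ingest-rate arithmetic, assuming decimal units (1 TB = 1,000,000 MB): 21 TB over a two-hour window works out to roughly 2,917 MB/s, while 21 TB over a one-hour window would need roughly 5,833 MB/s. The helper name is hypothetical.

```python
def required_mb_per_s(data_tb: float, window_hours: float) -> float:
    """Sustained ingest rate (MB/s) needed to move data_tb in the window.
    Assumes decimal units: 1 TB = 1,000,000 MB."""
    return data_tb * 1_000_000 / (window_hours * 3600)

print(f"{required_mb_per_s(21, 2):.2f} MB/s")  # 2916.67 MB/s
print(f"{required_mb_per_s(21, 1):.2f} MB/s")  # 5833.33 MB/s
```

Either way, the requirement is more than an order of magnitude beyond a 100 MB/s stream.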
In addition, I agree with your latest comment that just adding additional storage isn't going to help. Growth also means he'll need a faster backup, and that's not going to happen, especially since his config is already so far off the track.
Like I said, you had me for most of your post. Your wheels fell off (to use your words) for me when you said, "source deduplication, like Avamar, is generally the only way that you will achieve two critical goals." Then you followed that up with "the only real resolution to this issue is to consider a source based deduplication product."
This user has a single 21 TB volume on a filer. While he didn't specifically say NetApp, most end-users don't use that term unless they're NetApp customers. To use Avamar (or any other source dedupe system) with a filer, you back it up via NDMP, after which the dump stream is processed by Avamar. This means that first you're going to have to perform (and dedupe) the first full backup, and I think that's going to take a really long time with a single stream which will (at best) be running at 100 MB/s -- probably slower. At 100 MB/s, it takes 58.33 hours to get that first full backup. Then every day you're going to need to run an incremental dump on that filer. I've never tested it, but I'm going to suggest that there's no way that an incremental dump on a 21 TB filer is finishing in 2 hours. In fact, I'll say it's not even going to get close. And you have to run that dumb ol' dump command to create the stream for Avamar to process.
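The single-stream numbers can be worked through as follows. This is a rough sketch assuming decimal units and a steady 100 MB/s stream, which is the estimate given above rather than a measured figure; it also ignores the time dump spends scanning the filesystem. The function names are illustrative.

```python
def transfer_hours(data_tb: float, stream_mb_per_s: float) -> float:
    """Hours to move data_tb through one stream at a steady rate.
    Assumes decimal units: 1 TB = 1,000,000 MB."""
    return data_tb * 1_000_000 / stream_mb_per_s / 3600

def tb_in_window(stream_mb_per_s: float, window_hours: float) -> float:
    """Most data (TB) one stream can move within the window."""
    return stream_mb_per_s * window_hours * 3600 / 1_000_000

print(f"{transfer_hours(21, 100):.2f} hours")  # 58.33 hours
print(f"{tb_in_window(100, 2):.2f} TB")        # 0.72 TB
```

Put another way, under these assumptions a two-hour incremental could move at most about 0.72 TB, so the daily change on the 21 TB volume would have to stay under roughly 3.4% for the stream alone to fit the window.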
Source dedupe works much better when it's actually run at the source. In this case, you're not running it at the source; you're processing someone else's dump stream. So I don't see how Avamar (or any source dedupe product) could possibly meet this guy's requirement.
If it's a NetApp (and I'll bet it is), this dude needs to buy SnapMirror or SnapVault. THAT'S the way to go back to the "source" and reduce the amount of data transferred from the very start. Trying to back up a 21 TB volume in ANY way to ANY device via ANY method that involves dump is insanity by definition. You're using a backup product (dump) that was designed for filesystems that were under 4 GB and hasn't changed much since. SnapMirror and SnapVault are da bomb, though.
But, I'll agree with you that the story makes no sense.
Posted by: W. Curtis Preston | August 27, 2009 at 03:15 PM
Curtis;
Great points. You are somewhat off on the speed of Avamar--and that can be augmented further by running multiple streams. And every backup after the first is an NDMP level 1--so it gets faster still.
Having said all that, you are probably right that even with all that 2 hours is still too ambitious. Although you would no doubt be an order of magnitude closer to the goal this way than with target dedup.
With respect to SM and SV--they aren't backup. They are copies. At least that is my perspective. Now a copy may (or may not) be able to meet your data protection requirements. But the difference should be understood carefully first before you decide that a copy is good enough. In general, I dislike labeling them backup because they have such different characteristics.
But, at the end of the day, the whole thing did not pass the sniff test. Something was omitted, or overlooked. Something important. Because as described I can't make sense of it. And if neither of us can... Well then. :)
Posted by: Scott Waterhouse | August 27, 2009 at 03:57 PM
I wasn't talking about the speed of Avamar; I was talking about the speed of the dump, and Avamar can never go faster than that. And, from what I've seen, you're not going to gain much performance by firing off multiple sessions on a single volume. (You'll gain some, but it's not like one session is 100 MB/s and 5 sessions is 500 MB/s. In addition, using multiple sessions requires manual creation of include lists instead of just filer:/vm1. Yuck) I haven't benchmarked it in a while, but I was actually being generous at 100 MB/s. (Anyone care to chime in with up to date numbers on how fast a dump can go on current NetApp hardware?) And if the filer is not going to supply data any faster than 100 MB/s, then Avamar cannot write it any faster than that, either.
As to Avamar being any better in this situation, I don't see it. Anybody using NDMP on this size of a filer is using direct attached tape or virtual tape, and any good virtual drive is going to have no problem keeping up with what the filer is pushing out. So I don't see how Avamar would be any faster/better in this situation. In fact, Avamar could actually slow down the backup because the Avamar backup (I believe) will be sent across IP, and the competing solution will go across FC. Give me FC over IP any day of the week -- especially on a busy filer.
Avamar has some great use cases. I don't agree that this is an area where it can help.
As to SV or SM not being a backup... What makes dump, tar, or Avamar (I rhymed) a backup and SV/SM not a backup? They all meet the operational requirements of backup and restore, so they are backups. CDP is also backup. And SV/SM are basically near-CDP (snapshots and replication). Many NetApp customers went tapeless years ago by using SV/SM as their sole backup solution both on and offsite.
Posted by: W. Curtis Preston | August 27, 2009 at 07:14 PM
What makes Avamar better is you never have to do another full. Now, to be honest, I don't know if that is sufficient (even with all the tricks I mentioned) to meet the objective. But it is better than any other backup.
As to SV and SM not being a backup? I wrote about it a bit here: http://thebackupblog.typepad.com/thebackupblog/2009/04/is-a-copy-a-backup.html
I give that some thought every now and then. There are some things about SV and SM that leave me deeply uneasy as a backup guy (my intuitive feeling is they are "different" and "not backup"). That post was my attempt to wrestle with that and other similar approaches. I am not sure I got it 100%, but I wanted to try to deal with the issues that are the basis for my intuition.
Posted by: Scott Waterhouse | August 28, 2009 at 05:23 PM
I must have missed that article. SnapVault meets all of your requirements. It's stored on a different array. It's managed by a backup app (NetApp's DPM), and is stored in a different format than the original (SnapVault does change the format slightly). AND, of course, you can get it offsite.
Having said that, I don't see how changing the format is part of the definition of a backup. What exactly is accomplished by changing the format? In addition to Mozy, I use xcopy to occasionally copy my hard drive to a local disk drive. Just because I didn't use NTbackup or tar, that's not a backup?
Posted by: W. Curtis Preston | August 31, 2009 at 10:30 AM
Curtis... As I said I am not totally happy with the line of reasoning. Maybe one of the things that is bothering me is the difference between a backup, and good data protection. Perhaps tar is a backup, but it is not (in my opinion) good data protection. I think SV has the same issues (from the outside--never having attempted to manage a large environment's data protection with SV and OSV).
Posted by: Scott Waterhouse | September 10, 2009 at 12:02 PM