August 25, 2009

Listed below are links to weblogs that reference Something Doesn't Add Up (And Never Will):

Comments

Andrew Storrs

Great points, Scott. I'd fully agree with you that source-based deduplication would have been a far more efficient solution.

At this point he's going to be stuck with a synthetic backup product (like NetBackup) and daily incrementals in combination with the Diligent dedup targets and output to tape.

While it might work within their time window, it's not as clean as Avamar would have been; nor is it likely to work at the block level - at least not without a more expensive client agent.

Andrew Burton

Scott, the article doesn't conclude that the user "had no choice but to consider backup to disk with deduplication in the form of a Diligent box." The story simply reports that this particular user chose a Diligent box. Nowhere is it stated that this was his only option.

It also goes on to report that: "[The user] said he can meet his expected 50% annual growth in storage needs by buying another disk array every year and another server every two years. 'If I'm wrong on [the amount of] data growth, I'll just escalate' those expansion plans, he said. Each expansion 'is less than $20,000 a pop, a small enough amount of money so I can double, triple or even quadruple the size' of the VTL if needed, he added."

I think you are misrepresenting what the article says.

Scott Waterhouse

No, the article states that the other options didn't meet his objective (21 TB in 2 hours). But neither could the Diligent.

The article also states that "he can meet his expected ... growth ... by buying another disk array".

The second point is simply untrue. Just as 21 TB in 2 hours would require 4 Diligent systems, 31.5 TB (21 TB plus 50% growth) in 2 hours would require 6.
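To make that scaling argument concrete, here is a rough sketch. It takes the 4-systems-for-21-TB figure as given and assumes that throughput scales linearly with the number of systems and that the 2-hour window stays fixed. The function name and defaults are hypothetical, purely for illustration.

```python
import math

def systems_needed(data_tb, baseline_tb=21.0, baseline_systems=4):
    """Systems required to back up data_tb in the same window.

    Assumes (hypothetically) that each system handles an equal share
    of the baseline and that throughput scales linearly with count.
    """
    per_system_tb = baseline_tb / baseline_systems  # 5.25 TB per system per window
    return math.ceil(data_tb / per_system_tb)

# 50% annual growth on 21 TB, per the figure quoted from the article
for year in range(3):
    size_tb = 21.0 * 1.5 ** year
    print(f"year {year}: {size_tb:.2f} TB -> {systems_needed(size_tb)} systems")
```

Under those assumptions, one year of 50% growth (31.5 TB) already calls for 6 systems, so adding disk capacity alone does not keep the window intact.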

So there are several possible explanations here: the article is wrong; the user is wrong; or the user had a bunch of requirements or criteria (of which the 21 TB in 2 hours was just one) which aren't described here and which also weigh into the decision. The last is probably the most likely (at a guess).

But the article pretty clearly leads a reader to believe that the Diligent system let the user meet the only requirement described (21 TB/2 hr). And did so where others did not. Which is simply untrue--it can't. And if that is truly the (only) criterion, the only way I can think of to meet it is with source deduplication.

But as I said, I suspect there were other criteria. But in the absence of those the article is misleading.

W. Curtis Preston

You had me for MOST of your post. I agree that there's absolutely no way that this customer can meet his requirement with the solution he chose. To back up 21 TB in 2 hours, you need roughly 2,917 MB/s. There are target dedupe systems that can supply this kind of ingest rate (FalconStor & NEC), but IBM isn't one of them. But that's irrelevant. There's absolutely no way he's going to generate 2,917 MB/s from a single filer, let alone a single volume, which is what he has. If he gets over 100 MB/s, I'll be impressed.

In addition, I agree with your latest comment that just adding additional storage isn't going to help. Growth also means he'll need a faster backup, and that's not going to happen, especially since his config is already so far off the track.
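The required ingest rate is easy to check. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained rate for the whole window; the function name is just illustrative. Under those assumptions, 21 TB in 2 hours comes to about 2,917 MB/s, while a one-hour window would need about 5,833 MB/s.

```python
def required_mb_per_s(data_tb, window_hours):
    """Sustained throughput (MB/s) needed to move data_tb within window_hours.

    Assumes decimal units: 1 TB = 1,000,000 MB.
    """
    megabytes = data_tb * 1_000_000
    seconds = window_hours * 3600
    return megabytes / seconds

print(round(required_mb_per_s(21, 2)))  # 2-hour window
print(round(required_mb_per_s(21, 1)))  # 1-hour window, for comparison
```

Either way, the figure is well over an order of magnitude beyond the roughly 100 MB/s a single dump stream is estimated to deliver.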

Like I said, you had me for most of your post. Your wheels fell off (to use your words) for me when you said, "source deduplication, like Avamar, is generally the only way that you will achieve two critical goals." Then you followed that up with "the only real solution to this issue is to consider a source based deduplication product."

This user has a single 21 TB volume on a filer. While he didn't specifically say NetApp, most end-users don't use that term unless they're NetApp customers. To use Avamar (or any other source dedupe system) with a filer, you back it up via NDMP, after which the dump stream is processed by Avamar. This means that first you're going to have to perform (and dedupe) the first full backup, and I think that's going to take a really long time with a single stream which will (at best) be running at 100 MB/s -- probably slower. At 100 MB/s, it takes 58.33 hours to get that first full backup. Then every day you're going to need to run an incremental dump on that filer. I've never tested it, but I'm going to suggest that there's no way that an incremental dump on a 21 TB filer is finishing in 2 hours. In fact, I'll say it's not even going to get close. And you have to run that dumb ol' dump command to create the stream for Avamar to process.
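As a back-of-the-envelope check on those numbers, the sketch below assumes decimal units (1 TB = 1,000,000 MB) and a single stream that holds a steady rate. The function names are hypothetical, and the 100 MB/s rate is the estimate from this comment, not a benchmark.

```python
def hours_to_transfer(data_tb, rate_mb_s):
    """Hours needed to move data_tb at a steady rate_mb_s (decimal units)."""
    return data_tb * 1_000_000 / rate_mb_s / 3600

def tb_in_window(rate_mb_s, window_hours):
    """Terabytes that fit through a stream of rate_mb_s in window_hours."""
    return rate_mb_s * window_hours * 3600 / 1_000_000

print(f"first full: {hours_to_transfer(21, 100):.2f} hours")
print(f"2-hour window at 100 MB/s: {tb_in_window(100, 2):.2f} TB")
```

At that rate a 2-hour window moves only about 0.72 TB, so the daily dump stream would have to stay under roughly 3.4% of the 21 TB volume just to fit, before counting the time the dump spends walking the filesystem.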

Source dedupe works much better when it's actually run at the source. In this case, you're not running it at the source; you're processing someone else's dump stream. So I don't see how Avamar (or any source dedupe product) could possibly meet this guy's requirement.

If it's a NetApp (and I'll bet it is), this dude needs to buy SnapMirror or SnapVault. THAT'S the way to go back to the "source" and reduce the amount of data transferred from the very start. Trying to back up a 21 TB volume in ANY way to ANY device via ANY method that involves dump is insanity by definition. You're using a backup product (dump) that was designed for filesystems that were under 4 GB and hasn't changed much since. SnapMirror and SnapVault are da bomb, though.

But, I'll agree with you that the story makes no sense.

Scott Waterhouse

Curtis;

Great points. You are somewhat off on the speed of Avamar--and that can be augmented further by running multiple streams. And every backup after the first is an NDMP level 1--so it gets faster still.

Having said all that, you are probably right that even with all that 2 hours is still too ambitious. Although you would no doubt be an order of magnitude closer to the goal this way than with target dedup.

With respect to SM and SV--they aren't backup. They are copies. At least that is my perspective. Now a copy may (or may not) be able to meet your data protection requirements. But the difference should be understood carefully first before you decide that a copy is good enough. In general, I dislike labeling them backup because they have such different characteristics.

But, at the end of the day, the whole thing did not pass the sniff test. Something was omitted, or overlooked. Something important. Because as described I can't make sense of it. And if neither of us can... Well then. :)

W. Curtis Preston

I wasn't talking about the speed of Avamar; I was talking about the speed of the dump, and Avamar can never go faster than that. And, from what I've seen, you're not going to gain much performance by firing off multiple sessions on a single volume. (You'll gain some, but it's not like one session is 100 MB/s and 5 sessions is 500 MB/s. In addition, using multiple sessions requires manual creation of include lists instead of just filer:/vm1. Yuck.) I haven't benchmarked it in a while, but I was actually being generous at 100 MB/s. (Anyone care to chime in with up-to-date numbers on how fast a dump can go on current NetApp hardware?) And if the filer is not going to supply data any faster than 100 MB/s, then Avamar cannot write it any faster than that, either.

As to Avamar being any better in this situation, I don't see it. Anybody using NDMP on this size of a filer is using direct attached tape or virtual tape, and any good virtual drive is going to have no problem keeping up with what the filer is pushing out. So I don't see how Avamar would be any faster/better in this situation. In fact, Avamar could actually slow down the backup because the Avamar backup (I believe) will be sent across IP, and the competing solution will go across FC. Give me FC over IP any day of the week -- especially on a busy filer.

Avamar has some great use cases. I don't agree that this is an area where it can help.

As to SV or SM not being a backup... What makes dump, tar, or Avamar (I rhymed) a backup and SV/SM not a backup? They all meet the operational requirements of backup and restore, so they are backups. CDP is also backup. And SV/SM are basically near-CDP (snapshots and replication). Many NetApp customers went tapeless years ago by using SV/SM as their sole backup solution both on and offsite.

Scott Waterhouse

What makes Avamar better is you never have to do another full. Now, to be honest, I don't know if that is sufficient (even with all the tricks I mentioned) to meet the objective. But it is better than any other backup.

As to SV and SM not being a backup? I wrote about it a bit here: http://thebackupblog.typepad.com/thebackupblog/2009/04/is-a-copy-a-backup.html

I give that some thought every now and then. There are some things about SV and SM that leave me deeply uneasy as a backup guy (my intuitive feeling is they are "different" and "not backup"). That post was my attempt to wrestle with that and other similar approaches. I am not sure I got it 100%, but I wanted to try to deal with the issues that are the basis for my intuition.

W. Curtis Preston

I must have missed that article. SnapVault meets all of your requirements. It's stored on a different array. It's managed by a backup app (NetApp's DPM), and is stored in a different format than the original (SnapVault does change the format slightly). AND, of course, you can get it offsite.

Having said that, I don't see how changing the format is part of the definition of a backup. What exactly is accomplished by changing the format? In addition to Mozy, I use xcopy to occasionally copy my hard drive to a local disk drive. Just because I didn't use NTbackup or tar, that's not a backup?

Scott Waterhouse

Curtis... As I said, I am not totally happy with the line of reasoning. Maybe one of the things that is bothering me is the difference between a backup and good data protection. Perhaps tar is a backup, but it is not (in my opinion) good data protection. I think SV has the same issues (from the outside--never having attempted to manage a large environment's data protection with SV and OSV).

Disclaimer and Copyright

  • Copyright Notice
    All material on this blog is copyright Scott Waterhouse. ©2008 - 2009 Scott Waterhouse
  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.