
February 23, 2009



W. Curtis Preston

I'm glad you like the idea. Here are my thoughts on what you asked.

1. The backup app should own it as it always has.

2. It should support both NAS & VTL. Yes, that makes it harder, but the biggest vendors won't participate if you don't.

3. As to additional metadata, tapes created by the process could/should be of two types: a tape that's usable by the backup app, and a tape that is only usable by the dedupe box. The former should contain no more or less metadata than any other tape created by the backup app, and the latter would be up to the disk target vendor.

4. In order for it to work, at least two ISVs and two OEMs have to participate, and it would help if they were big ones.

As to you not agreeing with my history, the only issue I saw you having was that I didn't name Legato as one of the original authors of NDMP. I already explained why I didn't do that, which includes the FACT that Legato actually fought the creation of NDMP until they eventually caved by buying PDC. I was there, dude. I remember trying to get NDMP for NetWorker back in the day, and being told by Legato how evil NDMP was, and how wonderful their Java-based thing was. So forgive my reluctance to name them as one of the authors of the protocol.

Scott Waterhouse

It's OK; where is the love, brother? :) At least we can agree on a few things this morning!

Jered Floyd

I agree some and disagree some. Deduplication is a continually developing technology, and locking vendors into a fixed set of partitioning tools will stifle a growing area -- the ability to squeeze more deduplication out of the same data is a significant advantage for some vendors. Some products, like EMC Centera, can do only single-instancing of full objects. Others, like NetApp's, are bound to 4K fixed disk blocks. Yet others use variable-sized chunking (like Permabit), delta-differencing, or other enhancements. Coming up with a common language for describing all these methods is premature.
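The fixed-block versus variable-chunking distinction above can be sketched in a few lines. This is a toy illustration only: the rolling checksum, mask, and chunk parameters are invented for the example and do not reflect any vendor's actual algorithm.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 4096) -> list:
    """Fixed-block chunking: split at every `size` bytes, the way a
    block-bound deduplicator (e.g. 4K disk blocks) sees the data."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, mask: int = 0x3F, min_len: int = 32) -> list:
    """Content-defined chunking: cut wherever a toy rolling checksum
    lands on a boundary pattern, so an insertion near the front only
    reshapes nearby chunks instead of shifting every block after it."""
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling = ((rolling << 1) ^ b) & 0xFFFF
        if i - start >= min_len and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedupe(chunks: list):
    """Keep one copy of each unique chunk, keyed by SHA-256 fingerprint;
    the recipe of fingerprints is enough to reconstruct the stream."""
    store, recipe = {}, []
    for c in chunks:
        fp = hashlib.sha256(c).hexdigest()
        store.setdefault(fp, c)
        recipe.append(fp)
    return store, recipe
```

The point of the variable scheme is exactly the one that resists standardization: where the cut points fall, and how the fingerprints are computed, is where vendors differentiate.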

On the other hand, asking to be able to back up and restore a deduplicating storage device is perfectly reasonable, and can be done today. Each device can be backed up by expanding the data, but more optimally each device can be backed up with its own internal structures intact. Yes, this means the data can only be restored back to a similar device, but the same could be said about permissions data, or file systems with multiple forks, today. What would a "deduplication API" even look like?
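To make that question concrete, the two export paths described above (rehydrated versus internal-structures-intact) could be the core of such an API. The sketch below is purely hypothetical: a toy in-memory device, with every class and method name invented for illustration; it is not any real vendor interface.

```python
import hashlib
import json
from typing import Dict, List

class ToyDedupeTarget:
    """Toy in-memory stand-in for a deduplicating device, showing the
    difference between an expanded export and a native one."""

    def __init__(self):
        self._store: Dict[str, bytes] = {}        # fingerprint -> chunk
        self._recipes: Dict[str, List[str]] = {}  # object id -> fingerprints

    def write(self, object_id: str, data: bytes, chunk: int = 4096) -> None:
        """Ingest an object, storing each unique chunk only once."""
        recipe = []
        for i in range(0, len(data), chunk):
            c = data[i:i + chunk]
            fp = hashlib.sha256(c).hexdigest()
            self._store.setdefault(fp, c)
            recipe.append(fp)
        self._recipes[object_id] = recipe

    def export_expanded(self, object_id: str) -> bytes:
        """Rehydrated copy: usable by any backup app, but every space
        saving is lost on the wire and on tape."""
        return b"".join(self._store[fp] for fp in self._recipes[object_id])

    def export_native(self) -> bytes:
        """Opaque dump of recipes plus unique chunks: compact, but only
        restorable to a device that understands this format -- the same
        trade-off as forks or permissions that only some filesystems honor."""
        return json.dumps({
            "recipes": self._recipes,
            "store": {fp: c.hex() for fp, c in self._store.items()},
        }).encode()

    @classmethod
    def import_native(cls, blob: bytes) -> "ToyDedupeTarget":
        """Rebuild the store from a native export on a compatible device."""
        t = cls()
        raw = json.loads(blob)
        t._recipes = raw["recipes"]
        t._store = {fp: bytes.fromhex(c) for fp, c in raw["store"].items()}
        return t
```

A backup application that only understood the two export calls could copy the device either way without ever learning how the chunking works, which is the part vendors would want to keep private.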

Of course, the other side of this is how the data is being used, and whether backup is even necessary. With Permabit Enterprise Archive we believe backup is a mistake for archive storage. Archive data is accessed infrequently, but needs to be accessible immediately when needed. A multi-petabyte archive will take far too long to back up and restore by conventional tape means, and the multiple generations of tape are costly. Instead, we integrate replication (preserving deduplication) into the product. For less than the cost of managing backups, you can have a full, live copy of your data available at all times.

For smaller primary storage systems backup is critical, but for the bulk of the data out there, we think there are better ways for enterprises to handle their data.

Jered Floyd
CTO, Permabit

Robert Clark

It should be based on an open standard, without a bunch of licensed IP, so that the FOSS folks can re-implement it.

But this discussion seems a day late and a dollar short. Instead of smarter backup, we need the file system and OS to take care of this sort of thing.
