July 16, 2008

Comments

Mike Ewell

Enjoyed reading your blog entry on July 16th, 2008, entitled "HP's Whac-A-Mole VTL". First, I would like to agree with you that our D2D Backup Systems may be limited in scalability and capacity - that is, depending on one's perspective. For a large Enterprise or data center, this is certainly true. However, these products are positioned for small and mid-range businesses (SMB), as well as for remote/branch office sites (ROBO), that require improved backup and restore capabilities over traditional backup to tape (disk-to-tape) data protection schemes. Rather than take the approach of many other storage vendors, including EMC, which try to "tweak" higher-end solutions to meet the needs of smaller customers (think square peg, round hole), HP has designed a portfolio of disk-based data protection solutions specifically for customers that have smaller budgets and storage requirements. Our new D2D Backup Systems, which you reference, range from 3TB to 9TB, offer performance comparable to (if not better than) other solutions within the same class of disk-based storage products, and are priced starting at $6500 for a complete system (including the deduplication software). Additionally, by using inline, hash-based deduplication (which you fail to mention), we are compatible with a wide range of backup applications that customers already have installed - unlike the EMC Avamar solution, which requires customers to "rip and replace" their current backup applications. HP's D2D Backup Systems are easy to install (typically, in less than an hour) and are just as easy to manage, requiring little, if any, need for expensive installation/service/support contracts (à la EMC, IBM, Data Domain). Also, HP's D2D Backup Systems utilize target-based deduplication (again, which you fail to mention), which is far less dependent on client resources and much less likely to impact the availability and performance of client applications - unlike source-based deduplication solutions such as EMC Avamar.

For years, I have seen press releases and heard EMC executives tout their dedication to providing SMB solutions. EMC purchased Dantz Retrospect, partnered with Dell, and created the Insignia product line - all in the name of garnering market share in the fastest growing IT customer segment - the SMB segment. Yet, EMC continues to build Enterprise solutions, then remove a few hard drives, take away a bit of software, and then claim it has solutions for the SMB segment. Sorry folks, that's not how it's done. You can’t claim to cater to small businesses when your pricing starts at $20K to $50K for a low-end data protection solution. While EMC may sell some volume in the mid-range segment (everybody and their brother competes there!), I think you have missed the boat on smaller businesses, which comprise 80% to 85% of worldwide businesses. Probably time to check your market research...

Lastly, on the topic of product names, as you so thoughtfully pointed out how bad "D2D" was, I think the uninformed IT customer would find EMC's product names quite amusing - Avamar, DL 3D, Centera, Clariion, and Celerra. My bet is that the uninformed IT customer would think that you're either selling pharmaceuticals to middle-aged men (if you get my point) or selling props from an episode of Battlestar Galactica.

Thanks and look forward to our future conversations.

Mike Ewell
Product Manager for the HP StorageWorks D2D Backup System product line http://www.communities.hp.com/online/blogs/datastorage/Default.aspx

Scott Waterhouse

Mike;

Appreciate the comments, and thanks for the dialogue.

Some quick thoughts in reference to all this: narrowing the focus to consumer/SMB (with the emphasis on "S"), I think there is an interesting development going on: users have a choice between SaaS and software approaches (Mozy, Avamar from EMC, and yes there are others too) and hardware-oriented approaches (including the HP product you described, Avamar Virtual Edition and single node products, and so on). I have said it elsewhere on my blog, but one of the things that makes this a really exciting time is that there are all kinds of new choices that weren't available previously. Some are quantitatively different, some qualitatively different. And lacking a crystal ball or the powers of Kreskin, none of us know for sure exactly which (if any) of the current offerings will be the "winner". But there are a lot of new choices, and that is great for everybody, most of all the customers.

For now, my belief is there are some customers that will prefer a subscription or software-only approach (which can be as little as $5/month--much less than $6500 to own something); and there will be another group that prefers to own infrastructure, and that prefers a tangible asset to own and control.

But I do think to call a winner yet is premature.

As for product names, well, we are all just a little guilty aren't we? :)

Finally, as for EMC strategy and small business appeal... I am not sure that I am really the right person to address this, as I am somewhat focused on the enterprise. However, let me offer some perspective: EMC is engaged in a transformational effort. We are adding new capabilities and new products. That makes it exciting for us internally and exciting for customers. As much as I don't (and can't!) speak for our corporate strategists, I think that saying that we don't have an SMB offering is inaccurate. And while it is inaccurate from a product point of view, it is also missing the point of our strategic efforts to bring the functionality that would be traditionally offered in a "product" (with a box) as a service. And these offerings are available now and experiencing tremendous growth today (in areas like backup, security, etc.). I don't think you need to ship a brown cardboard box to meet a customer requirement!

Steve Johnson

This is a response to the EMC blog regarding HP’s recently announced VLS Accelerated Deduplication. It doesn’t take a wizard to see that data deduplication has not made much of a penetration inside the enterprise data center environment. And with good reason: most enterprise customers want fast backup and restore and a truly scalable solution.

Most of the hash-based solutions, including EMC’s recently announced DL3D VTLs, use an inline deduplication scheme where the data is deduplicated before landing it to disk. This has a tendency to slow down the backup window - which is completely unacceptable to any enterprise customer. Since EMC uses the Quantum deduplication engine, they can select either inline or post-process; however, they suffer from the same malady as the rest - slow restores, especially of the most recent backups - the ones that customers restore most of the time.

This is one of many areas where HP’s VLS systems stand out. Unlike the EMC system, where the most recent deduplicated backups are the most scrambled, the HP VLS system keeps the most recent backups intact for lightning fast bulk restores of server environments. The difference between the two can be as much as 70% slower for the EMC restores.

Now let’s talk about scalability. EMC uses the Quantum data deduplication engine, and we all know that Quantum has had some real problems scaling its DXi series. Who else would announce a product a year before they actually could ship it? The problem with hash-based solutions is that you have a global comparison to make each time you read in new data. Your data index and scope of comparison continue to grow. Scaling becomes increasingly difficult.
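To make the "global comparison" point concrete, here is a minimal sketch of inline, hash-based deduplication. It is a toy illustration only, not the design of any product named in this thread: the class name `HashDedupStore`, the fixed-size chunking, and the use of SHA-256 are all assumptions chosen for brevity.

```python
import hashlib


class HashDedupStore:
    """Toy inline, hash-based deduplicating store (hypothetical, for illustration)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.index = {}  # digest -> chunk bytes; one global index for all backups

    def write(self, data):
        """Deduplicate data as it arrives; return the digests needed to restore it."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Global comparison: each incoming chunk is looked up in the whole index.
            if digest not in self.index:
                self.index[digest] = chunk  # only previously unseen chunks are stored
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Reassemble a backup from its list of chunk digests."""
        return b"".join(self.index[digest] for digest in recipe)


store = HashDedupStore(chunk_size=4)
monday = store.write(b"AAAABBBBCCCC")
tuesday = store.write(b"AAAABBBBDDDD")  # shares two chunks with Monday's backup
print(len(store.index))  # number of unique chunks held in the global index
```

The index only ever grows as new unique chunks arrive, which is the scaling concern raised above. Whether that growth becomes a bottleneck in practice depends on how a real product stores and searches its index, which this sketch does not model.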

HP uses object-level differencing because it adds intelligence to the comparison process. You can strip out all the data files and data types so that your maximum comparison never exceeds a single day’s backup. This alone allows you virtually limitless scalability. With hash-based solutions the only way you scale is by adding more devices - now you spend all of your time moving backup jobs across these devices to get your best performance.

With the VLS9000 we can scale this multiple node grid solution over a petabyte (with 2:1 hardware compression) and yet present it as a single virtual library (i.e. one backup target). The system always provides the best possible performance by automatically load balancing the performance across all available nodes - without you getting involved.
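As a quick check on what a capacity figure quoted "with 2:1 hardware compression" implies, the sketch below converts a logical capacity into the physical disk behind it. It assumes decimal units (1 PB = 1000 TB) and that the quoted ratio is actually achieved on the data being backed up; the function name is hypothetical.

```python
TB_PER_PB = 1000  # assumption: decimal units


def physical_capacity_tb(logical_tb, compression_ratio):
    """Physical disk needed to present a given logical capacity at a given ratio."""
    return logical_tb / compression_ratio


# A 1 PB logical figure quoted at 2:1 compression
print(physical_capacity_tb(1 * TB_PER_PB, 2.0))  # 500.0
```

So a petabyte quoted this way corresponds to roughly 500 TB of physical disk; data that compresses worse than 2:1 would fit less.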

So what about your comment on ISV qualification? I’m not sure how you do your math but your 1 million number is way off (i.e. “the number of combinations approaches a million pretty rapidly”). Have you heard of the 80:20 rule? It turns out in the enterprise that 4-5 of the major backup applications make up greater than 80% of the market. The number of databases and data types is also limited to a core few. And guess what - the tape format for these backup applications rarely changes, and if you support a data type with one backup application and OS, most of the work is done. Within six months this will be a non-issue whereas your scalability issues will never go away.

One other big difference that HP has with the VLS is that it was designed from scratch with data deduplication in mind - it was built into the VLS file system, unlike your cobbled-together DL3D4100 data deduplication platform, where you took the Falconstor-based DL4000 and bolted a Quantum dedupe solution onto it. This is like what the US Army in Iraq called hillbilly armour - bolting on armour at the last minute. Bottom line … it don’t work too well.

Steve Johnson
Product Manager for the HP StorageWorks VLS Advanced Features http://www.communities.hp.com/online/blogs/datastorage/Default.aspx

Scott Waterhouse

Steve;

Thanks for the response, but there are quite a few inaccuracies in your description of the EDL3D and DL4000 3D.

Before getting to that however, note that I revised the math in a subsequent post and apologized for the exaggeration. The actual number is 720. And that is with the 80/20 rule in play (oddly, I have heard of it... I guess the rock we live under here at EMC isn't quite *that* big).

So, the inaccuracies:

1) We do not use an exclusively in-line deduplication scheme. We give customers choice: in-line, out-of-band, or "never". As far as I know, we are the only ones (Quantum excepted, of course) to offer this choice, and to do so in a manner that is easy to implement and trivially easy to administer.

2) With respect to restore speeds, you are correct to observe that VLS based on Sepaton code keeps the most recent backup intact. Key here is that I don't think "backup" can be pluralized. If it is 2 days old, it will be deduplicated. Correct? And therefore it will be subject to a restore penalty--unless of course it couldn't be deduplicated because it was not one of the extremely limited data types supported. Or because it was entirely new data. Either way, not much point in wasting money or CPU cycles trying to deduplicate it, no?

3) "The difference between the two can be as much as 70%." Well, there is a marketing statement if ever I heard one! Meaning that under ideal circumstances for HP (yesterday's backup) and what you presume to be worst case circumstances for EMC (backup from one year ago) the restore speed of the HP system will be 70% better? Perhaps. And I am only willing to concede a qualified "perhaps" because neither HP nor Sepaton has been able to articulate backup or restore performance in any insightful, meaningful, or usefully descriptive way. Maybe once (if) you do, we can resume the performance discussion.

4) With respect to your claim regarding scalability: I have no intent nor need to apologize for Quantum's release strategy. However, I can say that the problem you have observed (in theory) with hash-based solutions has not manifested itself, in practice, with our solution. We support 148 TB of physical, useable (not raw) capacity per deduplication engine. How many does Sepaton or Quantum support? Well, it is tough to say because you don't want to admit that the deduplication component does not scale as well as the VTL component: only 4 engines, not 16. So my guess is that it is not 500 TB of disk (1 PB with compression). Or is it? Perhaps HP can break the Sepaton cone of silence and provide some detail here...

5) Finally, nice line: hillbilly armour. Cute. But if you want to test our 4000 3D I think you will find they work just fine. But you will have to get to the back of a long line of customers to get one. And cough up a PO#. :) So far we have been delighted with the reliability, ease of use, functionality, and market acceptance of the platform. Trying to disparage it by calling it names is just kind of silly. Especially when the DL4000 family is *by far* the dominant open systems virtual tape platform by market share.

The comments to this entry are closed.
