Here we are, almost a year and a half after the acquisition of Data Domain by EMC. In this time the Data Domain team has become part of the EMC Backup Recovery Systems division. And that division has been wildly successful in the last 16 months. Now there are a thousand reasons for the success, not least the astounding level of execution in the integration effort and the comparative ease with which the Data Domain and EMC teams joined together. But another key part of the success has been the technology. Simply put, the inline, CPU-bound approach is beginning to demonstrate its real strength.
In the last seven years the performance of the Data Domain appliance has improved nearly seventyfold: from a relatively modest 30 MB/s to a rather extraordinary level of over 2000 MB/s.
I like to think of this performance improvement in terms of the two curves that have determined the course of virtually every storage technology for the last twenty years or more. They are the Intel curve and the Seagate curve. The Intel curve is described by Moore's law as it is popularly stated: the performance of CPUs will double every 12 to 18 months. The Seagate curve says that the capacity of a disk drive will double every 12 to 18 months. Unfortunately it has a rather less desirable corollary: disk performance, as measured in I/Os per second and throughput, will double not every 12 to 18 months but every 5 to 10 years.
Which is why we have monstrous amounts of data, but backing it up and recovering it, with either tape or disk without deduplication, is just as hard today as it was 10 years ago. Harder perhaps. Data capacities have increased by 20 to 50 times in that period, but throughputs have not. As a result, if you have stuck with a traditional backup technology like tape or VTL, you have to keep devoting ever larger amounts of infrastructure, with ever larger amounts of capacity, to backing up your data.
Your data is on an Intel curve.
Backup infrastructure is on a Seagate curve.
Or rather, it was. Until Data Domain came along. And achieved something quite extraordinary.
And that is increasing both performance and capacity on the Intel curve. Performance increases because, fundamentally, the most important (and virtually the only significant) bound on performance in a Data Domain system is the CPU itself. And capacity increases by virtue of the combined impact of the Seagate curve and the multiplier of deduplication. The consequence: both performance and capacity have increased by roughly 70 times in the last 7 years. Which is simply unprecedented.
Now, if you are anything like the vast majority of customers I talk to, if you are anything like the customers that EMC and IDC surveyed recently, your data is probably growing at about 62 percent annually. At that rate it doubles roughly every 17 months, which puts it squarely on the Intel curve.
So now we can see why having an inline CPU-centric approach is so important: because it is the only way to keep up with data growth.
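The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration built from the figures quoted in this post (30 MB/s to 2000 MB/s over seven years, and 62 percent annual data growth); the function name `doubling_time_months` is made up for the example, and smooth compound growth is an assumption.

```python
import math

def doubling_time_months(growth_factor, years):
    """Months needed to double, assuming smooth compound growth.

    growth_factor is the total multiplier observed over `years`.
    """
    return 12 * years * math.log(2) / math.log(growth_factor)

# Appliance throughput quoted in the post: 30 MB/s -> 2000 MB/s in 7 years.
throughput_doubling = doubling_time_months(2000 / 30, 7)

# Data growth quoted in the post: 62 percent per year.
data_doubling = doubling_time_months(1.62, 1)

print(f"throughput doubles about every {throughput_doubling:.0f} months")
print(f"data doubles about every {data_doubling:.0f} months")
```

On these assumptions the appliance doubles its throughput roughly every 14 months, while the data doubles roughly every 17 months, so the appliance stays slightly ahead of the data.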
If you choose a backup and recovery technology that is post-process and disk-centric rather than CPU-centric, you are choosing an infrastructure that inherently cannot grow with your data. An infrastructure that can only grow performance by increasing the number of spindles. And that is an infrastructure that will become increasingly expensive to own and operate as the gap between your data (on an Intel curve, remember) and the inherent capabilities of the technology (on a Seagate curve) increases.
Over time the only way to narrow this gap will be to add more units to the infrastructure. What requires 2 systems this year will require 4 systems in 18 months and 8 systems in 3 years.
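A back-of-the-envelope sketch of that progression follows. It assumes data doubles every 18 months and that the starting point is 2 systems, as in the figures above; the function `systems_needed` and its parameters are hypothetical, and the optional `system_doubling_months` argument is an added assumption that lets each system's own throughput improve over time.

```python
import math

def systems_needed(months, start_systems=2, data_doubling_months=18,
                   system_doubling_months=None):
    """Systems required after `months` to keep pace with data growth.

    system_doubling_months=None means per-system throughput stays flat.
    """
    demand = start_systems * 2 ** (months / data_doubling_months)
    if system_doubling_months is None:
        per_system = 1.0
    else:
        per_system = 2 ** (months / system_doubling_months)
    # The small epsilon keeps floating-point noise from rounding 4.0 up to 5.
    return math.ceil(demand / per_system - 1e-9)

for months in (0, 18, 36):
    flat = systems_needed(months)
    disk_bound = systems_needed(months, system_doubling_months=60)
    cpu_bound = systems_needed(months, system_doubling_months=18)
    print(months, flat, disk_bound, cpu_bound)
```

With flat per-system throughput the count goes 2, 4, 8, matching the figures above. If each system instead doubles its throughput every 5 years, the 3-year figure only drops to 6, whereas a system that doubles every 18 months holds the count at 2.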
From a cost of ownership perspective this is exactly the wrong direction to be proceeding in.
The CPU-centric approach offered by Data Domain systems is unique in that it is the only approach that delivers infrastructure that grows in performance and capacity on an Intel curve. And it is therefore unique in the inherent capability of the technology to deliver infrastructure that can grow as fast as data grows. Which should mean that the approximate cost of Data Domain disk infrastructure dedicated to backup and recovery will not grow at a rate greater than the organic growth in your storage infrastructure. It won't require four times as many systems in three years just to keep pace with organic data growth, as other architectures might.
As a final note: speaking to the total cost of ownership of a backup environment is difficult, because it includes both hard and soft costs: infrastructure, power, cooling, labour, software, maintenance, and so forth. Properly speaking, therefore, my conclusion speaks only to the infrastructure component, and to Data Domain’s ability as an infrastructure component to grow in capacity and performance at a rate approximately equal to the average rate of growth for data. And while infrastructure is an important component, it is not the only component. Generalizing my more limited conclusion about infrastructure to the entire total cost of ownership is therefore not, strictly speaking, valid.