Do you feel nostalgia for the way backup used to be?
I know I do. Although it is very much a double-edged sword.
I feel nostalgia for the simplicity that I thought backup should have when I first started doing backup (almost 20 years ago now). I remember thinking that all you have to do is get the data from the client to the server. Easy, right? Especially because you have a backup application to mediate the process.
I feel nostalgia for the architecture. It was simple. Concise. There were backup clients, and backup servers. And tape drives, and tape. But there weren’t a lot of three-tier applications with media servers, or SAN media servers, or LAN-free clients or dedicated storage nodes.
I feel nostalgia for the simplicity of the protocols. If you wanted a tape drive, you chose SCSI. Because that was your only choice.
Of course it didn’t take me too long to figure out that backup was far more complex than I initially understood.
Complex because it touched every aspect of our IT environment. From the client, with its associated OS and application, through the network with its myriad intricacies, to a server, running a database, a scheduler, policy engines, and massive storage. Agents were often required to get database backups that were useful (anybody remember paying hundreds of thousands of dollars for SQL Backtrack licenses because you really didn’t have any other good options for Oracle backup?).
Complex because the architectures and the software were complex. Complex because backup was operationally intensive. Complex because of the relentless pace of change and data growth in our IT infrastructures.
So while I feel nostalgia, I think it is the nostalgia of innocence lost. Nostalgia for the way I naively thought things should be. Not the way they really were. Really are.
In the last 20 years, backup complexity has nearly overwhelmed us. The applications that require live or hot backup have multiplied in both number and scope. SharePoint and Exchange. DB2 and Oracle. SAP and SQL. The architectural and administrative options most backup applications offer have grown dramatically. There are more protocol choices than ever before: CIFS, NFS, FC, OST, NDMP, and BOOST (and even, still, SCSI). Modern backup applications support a huge variety of operating systems, application clients, specialized application agents, and array integration tools.
Time (and perhaps a grain of hard-won wisdom) has shown that complexity is growing. And I don’t have any reasonable expectation that complexity is going to diminish. All I have is my nostalgia.
With a few notable exceptions that is.
Data Domain systems are one such exception: they are extraordinarily simple and elegant, both in design and in operational utility. (And, as an aside, I believe that this is one of the big reasons for their extraordinary success and acceptance with backup administrators and operators.)
So in the absence of a return to innocence or some revolution in backup, we are left to manage the complexity. To cope with its operational impact. And to minimize the effect it has on our organization.
And to do this, it is almost mandatory to have a tool like Data Protection Advisor (DPA).
Because while DPA may not be able to take the complexity out of backup, it can take some of the complexity out of understanding and managing a complex backup environment. It is not a time machine that will take you back to the relative simplicity of backup as it was 20 years ago. But it can help drastically reduce the operational complexity of your current backup environment.
Now I recently visited a client who had some very direct, first-hand experience of this. We had just finished a proof of concept of DPA in their VMware environment. And it was a big VMware environment: some thousands of VMs for test and dev alone (production systems are slated to find their way into a virtual environment in the coming year). Big enough that the client didn’t know what they didn’t know. And big enough that they had no idea how big it had grown, or how much or little of it they were backing up. They assumed they were doing a pretty good job. But they didn’t know for sure.
So we put in DPA. And we were pretty shocked. They had told us to expect about 1200 VMs. And they thought they were backing up about 800 of them, with a very high degree of success. In fact, we found about 2500. And the success rate was less than 25%.
(Another aside: if a client fails several times in one backup window and then succeeds, DPA counts that client as a success, not as multiple failures. So a 25% success rate across 1000 systems means you only got a backup from 250 of them. It does not mean all 1000 eventually succeeded after a few retries.)
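To make the arithmetic concrete, here is a minimal sketch of that counting rule in Python. The attempt records and client names are hypothetical, and this is not DPA’s internal logic; it simply shows why a 25% success rate means 250 protected systems, not 1000 systems that eventually succeeded.

```python
# A toy illustration of the counting rule described above: a client counts
# as "successful" for a window if at least one attempt in that window
# succeeded. The records below are invented for illustration.
from collections import defaultdict

# (client, succeeded) pairs for one backup window
attempts = [
    ("vm-001", False), ("vm-001", False), ("vm-001", True),  # retries, then success
    ("vm-002", False), ("vm-002", False),                    # never succeeded
    ("vm-003", True),
    ("vm-004", False),
]

succeeded = defaultdict(bool)
for client, ok in attempts:
    succeeded[client] |= ok  # any success in the window counts

clients = len(succeeded)
successes = sum(succeeded.values())
print(f"{successes}/{clients} clients protected "
      f"({100 * successes / clients:.0f}% success rate)")
# -> 2/4 clients protected (50% success rate)
```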
Without DPA, they had no ability to measure this, report on it, or analyze it. The backup application only told them what the backup application knew. But DPA ties into vSphere, so its reports can draw data from the private cloud itself. It knows how many systems you have, how many are protected. And how many are not.
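DPA does this correlation for you, continuously and at scale, but the underlying idea is simple to express. The sketch below is a rough, hypothetical version of the same gap analysis, using the open-source pyVmomi library to pull the VM inventory from vCenter and diffing it against a client list exported from the backup application; the host, credentials, and export file are all placeholders.

```python
# Rough sketch of the gap analysis: compare the vCenter VM inventory
# against the clients the backup application believes it is protecting.
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder host and credentials; lab-style unverified SSL context.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="readonly",
                  pwd="secret", sslContext=ctx)
try:
    content = si.RetrieveContent()
    container = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    inventory = {vm.name for vm in container.view}  # every VM vCenter knows about
finally:
    Disconnect(si)

# Hypothetical export of protected client names from the backup application.
with open("protected_clients.txt") as f:
    protected = {line.strip() for line in f if line.strip()}

unprotected = inventory - protected
print(f"{len(inventory)} VMs in vCenter, "
      f"{len(inventory & protected)} protected, "
      f"{len(unprotected)} unprotected")
```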
If you are running Avamar in your VMware cloud, it can report per Avamar domain (frequently, different domains are used for different tenants in multi-tenant environments). The reporting that Avamar has for intra-domain analysis can now be extended across domains, and across Avamar grids. Auditing and change control are also a lot easier to manage: the standard Avamar change report is extended to include the user that made the change. So when you have 1300 VMs that are not protected, at least you have an audit trail to determine who excluded them from the backup process. Of course, the next question should be why they are excluded!
Another local client is struggling with a different problem: they have half a dozen Data Domain systems. They have deployed them for a variety of uses within their backup environment; Exchange, application servers, and database servers all have their data backed up to the pool of Data Domain storage. But they are out of capacity. So it is time to add more. Who is going to pay for it, though? The Exchange team? Or the database team? With deduplicated storage, this is not a trivial problem to understand. How much of your deduplicated capacity is being used for a given application set? Do you know? The client has elected to use DPA to measure consumption of storage. So when it is time to acquire more, the biggest consumers of capacity can also make the biggest contribution to the acquisition.
As it turns out, this is actually a pretty critical piece of information to understand. In larger organizations that leverage a shared infrastructure like backup, different groups consume different amounts of that shared resource. Who pays for what? What is fair? How do you ensure that politics takes a back seat to facts when it comes time to pay for additional resources? DPA offers a solution: how much is consumed, and by whom, is very easily understood, whether for Data Domain or Avamar systems.
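Once you have the per-group numbers, the chargeback arithmetic itself is trivial. Here is a back-of-the-envelope sketch with made-up figures (DPA supplies the real, measured ones; how shared deduplicated blocks get attributed to a group is a policy decision this sketch takes as given):

```python
# Consumption-based chargeback sketch: split the cost of an expansion in
# proportion to each group's measured share of deduplicated storage.
# All figures below are invented for illustration.
consumed_gib = {             # post-dedupe capacity attributed to each group
    "Databases":   52_000,
    "Exchange":    38_000,
    "App servers": 10_000,
}
expansion_cost = 250_000.00  # hypothetical price of the added capacity

total = sum(consumed_gib.values())
for team, gib in sorted(consumed_gib.items(), key=lambda kv: -kv[1]):
    share = gib / total
    print(f"{team:12s} {share:6.1%} of capacity -> ${expansion_cost * share:,.2f}")
```

The hard part, of course, is producing those per-group figures on deduplicated storage in the first place, and that measurement is exactly what this client is using DPA for.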
Is disk contention causing bottlenecks in backup? Do you know? Your backup application can’t understand this, but DPA can.
Are link drops causing missed backups?
Has the proliferation of VMs meant that entire chunks of the infrastructure are not protected?
Has somebody just added a mass of data that has caused a spike in your deduplicated disk consumption?
DPA permits me to be just a little nostalgic, without being naïve or putting my head in the sand. And it means that we can solve a few of our backup problems quickly, efficiently, and elegantly. And while it may mean, sometimes, that we have more problems than we thought we had, fixing a problem that we understand is always easier than fixing one we don’t.
So while I don’t have a time machine to take me back to a time when backups were simple, DPA can help wind the clock back just a little, and take some of the complexity out of the job.