I am going to revisit the issue of backup, cost, and the total cost of ownership of backup in this post. Tony Pearson and I have been discussing this (hopefully in a fairly constructive fashion) over the last little while, and I see he has responded to some of the points that I made in my previous post on the issue. Tony's response can be found here: "The TCO of TSM for Backup".
Tony continues to maintain (I think) that TSM has a "good" TCO and is efficient relative to other backup applications. Although I don't agree with a lot of what he writes, I do think that there are some pretty good lessons to be learned amidst the information he has posted, so I am going to continue to pursue this a bit in this post.
First, Tony claims that I said "only the cost of new TSM servers should be considered in any comparison." Ummm, jeepers Tony, no I didn't! I would never make any such claim! (What I did say is that TSM is unique in the backup world, in that it scales by adding master servers. Once my TSM master server is at capacity, due to I/O requirements, or database scale, or ability to drive disk or tape, I have to buy another master. That stands in sharp contrast to Networker or Netbackup, where I can just buy a Storage Node or Media Server--which tend to be much smaller systems with much less disk than the master servers. So TSM requires many more master servers than competitive solutions--each of which imposes not just a burden of cost, but a management burden at both the server level and the backup application level. Further, based on my experience, the typical TSM customer will end up with more master servers than they would have combined master and media servers if they were using Netbackup or Networker.)
Servers are but one of many cost considerations when it comes to understanding the TCO of a backup environment. Generally speaking, these costs can be divided into hard and soft costs. Hard costs are costs associated with physical, tangible things (servers, network, disk, tape drives, shipping, etc.). Soft costs are those costs associated with stuff that is less tangible. Soft costs can include labor and administration--though truthfully admin costs straddle the line between hard and soft costs, at least in part because they can be difficult to accurately quantify--but also include things like risk, efficiency, and the benefit of systems uptime.
There is one other really important distinction between hard costs and soft costs. Hard costs are easy to justify and include in a business case. Soft costs are notoriously hard to quantify accurately, hard to justify, and very rarely included in a business case. Let me give an example of this: a business is moving from a tape-based backup system to a disk-based backup system. Tape is actually quite unreliable; if it has been shipped and handled, it is not unusual to see a reliability of 98 or 99%. Which, from an IT and systems management perspective, is just awful. (How would you like it if your email system failed to retrieve one out of every fifty emails you receive?) Disk, on the other hand, is considerably more reliable. It shouldn't be too much of a stretch to say that we can achieve 99.999% availability/reliability from disk backup systems. So we have a gap of roughly 2% in the reliability of the two systems.
Now imagine the restore of a critical business system. If I am restoring from tape, the restore has roughly a 2% chance of failing. And a failure may cost several million dollars--let's say $5m for the sake of discussion. That cost may come from any one of a number of things: are people sitting around unproductive during the restore process? Am I losing revenue because my retail system is not processing transactions? Do I have to re-enter data? How about potential damage to my reputation or to other business partners if the system is unavailable? When I put it that way, $5m starts to look like a very small number indeed. (Realistic estimates depend on industry type, size of business, revenue, etc., but could rapidly stretch to the tens of millions.) Other similar factors may be considered in some industries too.
Finally, we have to understand the likelihood of a restore being required. Generally, we do this by understanding how many systems we have, and what the chances are of a full recovery being required. That usually gives us a metric like: for every 1000 systems, 3 will experience a catastrophic failure per year, and require recovery from backups.
So we multiply the chances of a catastrophic failure per system by the number of systems, then multiply by the increased reliability of disk over tape, and we have a way of quantifying the decrease in risk we can achieve by moving from tape to disk for backup. (We might also factor in the duration of recoveries, and we can further refine this by modifying the model to include tiers of servers, with different values for the cost of a recovery, length of a recovery, and so on, for the various tiers).
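As a rough sketch of that multiplication, here is the model in a few lines of Python. The failure rate (3 per 1000 systems per year), the tape and disk reliability figures, and the $5m cost are the example values used above; the estate size of 12,500 systems and all function and parameter names are hypothetical, chosen purely for illustration. Your own numbers will differ.

```python
def avoided_failed_recoveries(systems, failure_rate, tape_reliability, disk_reliability):
    """Expected number of failed restores avoided per year by using disk instead of tape."""
    # How many full recoveries we expect to perform in a year.
    recoveries_per_year = systems * failure_rate
    # Extra probability that a given restore succeeds on disk versus tape.
    reliability_gap = disk_reliability - tape_reliability
    return recoveries_per_year * reliability_gap


def annual_risk_reduction(systems, failure_rate, tape_reliability,
                          disk_reliability, cost_per_failed_recovery):
    """Expected yearly loss avoided, in the same currency as cost_per_failed_recovery."""
    avoided = avoided_failed_recoveries(
        systems, failure_rate, tape_reliability, disk_reliability
    )
    return avoided * cost_per_failed_recovery


if __name__ == "__main__":
    # Hypothetical estate size; the remaining inputs are the example figures from the text.
    avoided = avoided_failed_recoveries(12_500, 3 / 1000, 0.98, 0.99999)
    saving = annual_risk_reduction(12_500, 3 / 1000, 0.98, 0.99999, 5_000_000)
    print(f"Failed recoveries avoided per year: {avoided:.2f}")
    print(f"Risk reduction per year: ${saving:,.0f}")
```

Multiplying the yearly figure by the number of years in the business case gives the total for the period. Tiers of servers can be handled by calling the function once per tier, each with its own cost and failure rate, and summing the results.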
So at the end of the day, I might say that I am reducing my risk by $11.25m over 3 years (stipulating the math comes out at 0.75 failed critical recoveries per year at a cost of $5m each) by acquiring a new backup system that uses disk rather than tape. So as long as the new system costs less than $11.25m it has a positive ROI, pays for itself, and should be acquired right away! Right?
Well, maybe not.
Isn't all this legitimate, defensible, and accurate? Absolutely yes. No question.
Will it be considered in a business case to justify the acquisition of a new backup infrastructure? No way. Almost never will a business include these considerations in their business case. Admittedly some do, but it is rare. Anecdotally, I would suggest that less than 5% of the business cases I have worked on have included such soft costs.
Business case justifications live and die on the basis of hard costs. Not soft costs.
So, when Tony goes on to discuss a case of a customer switching to TSM from EMC Networker as an example of cost efficiency, I was a little surprised to see that a large number of soft costs appeared to be included in the justification. (As an aside, I am making a few assumptions about what is going on with this particular business case. Tony simply doesn't provide sufficient data to make a really good critical analysis of it--nor would I expect him to necessarily--so I am just doing the best I can with the data that is given.) Let's look at the numbers Tony provides:
- Reduce Business Risks $6,749,796
- Consolidate and Standardize IT Infrastructure $4,975,667
- Reduce IT Infrastructure Costs $2,057,107
- Improve IT System Availability / Service Levels $1,409,431
- Improve IT Staff Efficiency / Productivity $982,919
If we look at these numbers, we can see that the first and second numbers are soft costs, as is the fourth. The fifth is equivocal. Without knowing where the benefit comes from, it is tough to say whether it would normally be included in a business case. Are the gains from using newer systems? Different systems (i.e. TSM rather than Networker)? Bigger systems (if the infrastructure has not been updated in 5 years, then we would well expect new systems to be bigger)?
Being extraordinarily generous and treating the fifth item as a hard cost, we can see that a staggering 81% of the dollar value in the business case appears to come from soft costs. Costs, in short, that would never be included in the vast majority of business cases.
Further, if we exclude these soft costs from this business case, we can see that there is simply no reasonable financial justification for moving from Networker to TSM. The benefits over 3 years would be about $3m, and the costs would be $5.76m.
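For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The five line items and the $5.76m cost are the figures quoted in this post; the hard/soft labels follow my own classification above (with the fifth item generously counted as hard), and the variable and function names are just illustrative.

```python
# (description, 3-year benefit in dollars, classification used in this post)
BENEFITS = [
    ("Reduce Business Risks", 6_749_796, "soft"),
    ("Consolidate and Standardize IT Infrastructure", 4_975_667, "soft"),
    ("Reduce IT Infrastructure Costs", 2_057_107, "hard"),
    ("Improve IT System Availability / Service Levels", 1_409_431, "soft"),
    ("Improve IT Staff Efficiency / Productivity", 982_919, "hard"),  # generous call
]

# 3-year cost quoted for the move to TSM.
COST = 5_760_000


def total(kind=None):
    """Sum the benefits, optionally restricted to 'hard' or 'soft' items."""
    return sum(amount for _, amount, k in BENEFITS if kind is None or k == kind)


soft_share = total("soft") / total()
hard_total = total("hard")
net_on_hard_costs = hard_total - COST

if __name__ == "__main__":
    print(f"Soft share of claimed benefits: {soft_share:.0%}")
    print(f"Hard benefits over 3 years: ${hard_total:,}")
    print(f"Hard benefits minus cost: ${net_on_hard_costs:,}")
```

Changing a single label in the list shows how sensitive the conclusion is to each classification decision, which is the reason to keep hard and soft costs separate in the first place.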
Let's take this further, however, and ask what portion of the infrastructure cost reduction and staff efficiency gains would be achieved by any technology infrastructure upgrade, absolutely irrespective of the underlying backup application itself. If I moved from LTO-1 to LTO-4 or a modern disk deduplication technology, for example, I would fully expect to achieve some pretty considerable savings.
In other words, Tony's conclusion that "IBM Tivoli Storage Manager uses less bandwidth, fewer disk and tape storage resources than EMC Legato. For even a large deployment of this kind, payback period is only NINE MONTHS" is not only unsupported, but actually contradicted by the evidence he himself provides. Without soft costs, TSM appears to be more expensive than the alternative (EMC Networker).
So one of the points that I try to make every time I discuss TCO is that numbers are meaningless unless they are your numbers. How much do you pay for administration? For tapes? Do you include some soft costs? If so, which ones? I think that point is doubly true here. Without some understanding and justification of the numbers, it is very hard to attach any significance or relevance to them.
The next point that is particularly important here is regarding my earlier observation (also quoted by Tony):
"Final point: there is actually a really important secondary point here--what is the TCO of your backup infrastructure. In some ways, TSM is one of the most expensive (number of servers and tape drives, for example), relative to other backup applications. However, I think it would be a really interesting exercise to critically examine the TCO of the various backup applications at different scales to evaluate if there is any genuine cost differentiation between them."
Based on the data provided by Tony, all we can say is that we cannot make any reasonable inference or conclusion with respect to the question. It remains unanswered.
However, I do have lots of anecdotal evidence to suggest that Tony's answers are pretty deeply flawed. For example, I have ample evidence that shows Networker's deduplication option can reduce bandwidth consumption by over 95% when compared to a TSM deployment. (Yet Tony suggests there are bandwidth savings when comparing TSM to Networker.) I have further evidence in the form of business cases that deal only with hard costs that Networker tends to have a considerably lower TCO than TSM. And, contrary to Tony's suppositions, these business cases were built for organizations with 5,000 to 50,000 employees (not the rather trivial "10 employees" that Tony attempts to ascribe to the typical Networker installation).
So what would I take away from this if it were my business case? I would want to see hard and soft costs clearly differentiated. I would want to see a return on investment of less than 24 months (perhaps less than 36 in some cases) based solely on hard costs. I would want the business case to be built with numbers that are the actual numbers for my business--not based on industry averages or general assumptions. I would want to understand what benefits accrue as a result of doing a technology refresh, and what benefits result from any proposed change in application. And I would want to see all the calculations, assumptions, and reasoning clearly described and justified--no voodoo economics please.
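To make the payback test concrete, here is a small sketch in Python. It uses the roughly $3m of hard benefits over 3 years and the $5.76m cost discussed earlier, and it assumes (my simplification, not something stated in the original business case) that benefits accrue evenly month by month and that the whole cost is paid up front. The function name and parameters are illustrative only.

```python
def payback_months(upfront_cost, total_benefit, benefit_period_months):
    """Months needed for evenly accruing benefits to cover an up-front cost."""
    monthly_benefit = total_benefit / benefit_period_months
    return upfront_cost / monthly_benefit


# Hard benefits only: infrastructure cost reduction plus staff efficiency.
HARD_BENEFITS_3_YEARS = 2_057_107 + 982_919
COST = 5_760_000

months = payback_months(COST, HARD_BENEFITS_3_YEARS, 36)
meets_24_month_test = months < 24
meets_36_month_test = months < 36

if __name__ == "__main__":
    print(f"Payback on hard costs alone: {months:.1f} months")
    print(f"Under 24 months: {meets_24_month_test}")
    print(f"Under 36 months: {meets_36_month_test}")
```

Plug in your own hard-cost figures; the point is the shape of the test, not these particular numbers.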