A while back, after reading this post, a reader asked: so how do other backup applications compare to TSM? If TSM runs tape drives at 80% to 90% utilization on average, how busy are tape drives with other applications?
Not only is it a good question, it is also one of the primary differences in practice between TSM and everybody else. (And I say everybody else because, for argument's sake, "everybody" else does a traditional grandfather-father-son backup rotation, which involves full backups on weekends and incremental backups throughout the week. Except, as I never tire of pointing out, for databases and email, which are probably backed up in full every day by most people anyway. And yes, there are a couple of other backup applications which do a progressive incremental like TSM, but their market share is so tiny as to render them insignificant.)
To quickly recap: TSM uses drives pretty consistently throughout the day (and night!) because the server has a number of administrative tasks it needs to get done, in addition to primary backup. These include migration, reclamation, and copy pool processing, as well as restores. On top of that, LAN-free clients make further demands on tape drive time. The result of all this is that for TSM users, tape drives are frequently busy 22 or more hours out of the day.
Further, because reclamation is the only one of these processes which is really "optional" it tends to suffer. It doesn't get run often enough, or long enough, and tape cartridge utilization deteriorates.
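For what it is worth, the arithmetic behind that 22-hour figure is simple. Here is a minimal Python sketch of it; the function name `utilization` is my own, made up for illustration, and is not anything TSM provides.

```python
def utilization(busy_hours, hours_per_day=24.0):
    """Return the fraction of the day a drive is busy (0.0 to 1.0)."""
    if not 0 <= busy_hours <= hours_per_day:
        raise ValueError("busy_hours must be between 0 and hours_per_day")
    return busy_hours / hours_per_day


# A drive that is busy 22 hours out of 24:
print(f"{utilization(22):.0%}")  # prints 92%
```

So a drive that is busy 22 hours a day sits at the top end of the 80% to 90% average mentioned earlier, which leaves very little slack for reclamation to catch up.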
So what happens with NetBackup and Networker?
Well, typically they run a backup at night. Pretty straightforward, right?
But it is what happens next where things get interesting:
- If the organization really appreciates the value of a good backup practice and infrastructure, then the vaulting or cloning operations will begin, to make off site copies of the data backed up last night. After this, there will be 2 copies of all backup data: one on site, and one off site. This is best practice. It is also expensive, because it means that I need a bunch of additional tape drives to do the work: one to read the first backup, another to write the copy. Ideally, however, I will get this done by 10:00 or 11:00 in the morning, in time for the courier to come by and pick up my tapes to take them to my off site storage facility.
- Not everybody can afford, or justify, the number of tape drives this takes, however. If you add up the number of tape drives you need to finish backup by 6:00 am, you will probably need to double or triple that number to do both backup and vaulting by 11:00 am. As a consequence, the next best thing you can do is still make a second copy of the backup data for off site purposes, but take the remainder of the day to finish. Let's say that in this case, you are willing to let the vault operation run until 6:00 pm. The courier will then come by the next business day, meaning that your off site tapes can only provide a Recovery Point Objective (RPO) of 48 hours.
- Unfortunately, there is a whole (huge) group of people that can't justify or afford the number of drives required for the second case. These folks typically back up at night, the majority of the backup will finish early in the morning (hopefully before the courier shows up), and then all the tapes created will go off site. The big problem this creates, however, is that for any restore to happen, a tape has to be recalled from off site. This in turn means that your Recovery Time Objective (RTO) will often be a minimum of 4 hours, unless you are willing to pay your courier or off site storage facility a premium for faster service.
- Technically, there is a fourth group. They are just like the third, except they don't send tapes off site at all. Fortunately, I don't run into organizations that fall into this category very much any more. It is a really bad idea. If you are reading this, and the description fits your organization, have a chat with me, or any one of a number of other good backup and recovery practitioners, and let us explain to you or your management why it is such a bad idea.
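To make the "double or triple" claim in the first two cases concrete, here is a rough Python sketch of the drive arithmetic. Every number in it is an assumption I picked for illustration, not a measurement: 20 TB of nightly backup data, 0.5 TB per hour per drive, an 8-hour backup window ending at 6:00 am, and vault windows of 5 hours (done by 11:00 am) or 12 hours (done by 6:00 pm). It also assumes vaulting starts only after backup finishes, so it can reuse the backup drives, and that each copy stream ties up two drives, one reading and one writing.

```python
import math


def drives_for_backup(data_tb, tb_per_hour_per_drive, window_hours):
    """Drives needed to write data_tb within the backup window."""
    return math.ceil(data_tb / (tb_per_hour_per_drive * window_hours))


def drives_for_vaulting(data_tb, tb_per_hour_per_drive, window_hours):
    """Drives needed to copy data_tb within the vault window.

    Each copy stream uses two drives: one reads the original tape,
    the other writes the off site copy.
    """
    streams = math.ceil(data_tb / (tb_per_hour_per_drive * window_hours))
    return 2 * streams


# Hypothetical figures, chosen only to illustrate the arithmetic.
DATA_TB = 20
TB_PER_HOUR_PER_DRIVE = 0.5

backup = drives_for_backup(DATA_TB, TB_PER_HOUR_PER_DRIVE, window_hours=8)
vault_by_11am = drives_for_vaulting(DATA_TB, TB_PER_HOUR_PER_DRIVE, window_hours=5)
vault_by_6pm = drives_for_vaulting(DATA_TB, TB_PER_HOUR_PER_DRIVE, window_hours=12)

print(backup)         # prints 5
print(vault_by_11am)  # prints 16
print(vault_by_6pm)   # prints 8
```

With these made-up numbers, vaulting by 11:00 am needs roughly three times the drives that backup alone does, while letting the vault run until 6:00 pm brings that down to well under double. Your own figures will differ, but the shape of the trade-off is the same: a shorter vault window and two drives per copy stream is what drives the count up.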
So, going through these use cases, what is the impact on tape drive utilization?
- In the first case, some tape drives are busy at the start of the backup cycle, and all are busy at the end. But the vast majority sit idle during the day until the next cycle begins.
- In the second case, tape drives are busy throughout most of the day (just like TSM). I will offer up the personal observation that the tape drives might be 75% busy in this model, rather than 90% as with TSM, but I don't think I could provide a mountain of evidence to substantiate this.
- The third case is much like the first. Tape drives sit idle for long periods during the day.
- The fourth case is like the first too, although the tape drives may be busy doing backup for a longer duration of time.
So, the conclusion I draw is this: TSM uses tape drives more frequently, and for longer, than most other backup applications. In fact, when I talk to folks about the pros and cons of TSM, this is one that I always highlight: TSM may use fewer cartridges (but it is often closer than you think) due to the progressive incremental paradigm, but any savings on media are typically offset by a greater expenditure on tape drives and robotics. As a consequence, the progressive incremental paradigm, which was designed to save infrastructure, actually results in higher infrastructure utilization and costs for TSM, as compared to other backup applications which use a traditional backup rotation.
One parting comment: it is not just tape drive utilization which is higher. TSM also requires bigger robots. This is due to the fact that with TSM, you are required to store all on-site media in the library so that it is available for automatic reclamation. Other backup applications can be managed so that only the backup tapes created in the last month, or two months, or six months are stored in the library. Older tapes can be moved to a shelf.