I have been getting lots and lots of questions about Avamar lately. And there seems to be a lot of confusion about what Avamar is, what it does, where it fits, and so on. And because I have to wait 5 days to talk about all the really cool stuff that I have been alluding to with my countdown, I thought a quick tour through the world of Avamar might be fun.
So buckle your seatbelts. It is going to be weird at times. You might wonder if you should have chosen the blue pill instead of the red pill. But it will be worth it.
With that said, the first question I usually get is: "What is Avamar? Is it a backup application? Or is it hardware? We know it does deduplication, but what is it?"
And the answer is: both. And more.
Avamar is a backup application. It has a client and a server. Clients send backup data to servers.
Avamar is hardware (sort of) in the sense that EMC sells a product called a Data Store. A Data Store is an appliance that runs the server component of the Avamar application. It offers all the usual benefits that appliances have over the build-it-yourself approach. You can read the marketing; I don't need to repeat it here.
But Avamar is more, because it is also now a component of EMC Networker. Meaning that Networker has now implemented Avamar functionality, and that a given client can do "traditional" backups, Avamar backups, or both. More on that later.
So even though Avamar can be one of three things, let's focus on the application itself for this post. We will set aside the Data Store implementation, and the Networker integration, for another time.
First up: the client. The client is, in a lot of ways, the brains of the operation. The client does all the deduplication. But, as is often the way with things, the how is the interesting part.
- The client does deduplication on a segment basis. Segments are, well, bigger than blocks and smaller than files. They are also variable in length. And this is where some of the "secret sauce" of Avamar is. Variable in length because Avamar is smart enough to know where within a data structure the segment is, and what type of data it is, and size it accordingly. For example, the segment from the start of a PowerPoint will be different in size from a segment in the middle of a .pdf or the start of an Excel spreadsheet.
- Essentially, therefore, the client looks for globally unique segments. However, it is a little like a hippy: it thinks globally and acts locally. (And Jed and Tony just had a minor cardiac event with that analogy, I am sure...) Here is what happens, simplified:
- The client looks for changes in the file system, and then changes in files, and then new segments. (There is some very cool technology here by the way--it is one of the ways in which Avamar can back up big file systems 90% faster, or more, than any competitive backup application.)
- Once it identifies a new segment, it hashes it (twice) to produce a unique fingerprint.
- Once it has collected all the new fingerprints, it queries the server, and asks if the Avamar server already has any of the fingerprints. Quite often, it does. In fact, Avamar can do as much as 50% better at identifying commonality within data on different client systems than other, target-based deduplication solutions.
- Finally, it gets a message back from the server identifying all the segments that are already stored. The client can then do the math, subtract out those segments already on the server, and transmit only truly unique segments. And this is HUGE. The practical implication here is that we only transmit a tiny fraction of the data that a normal backup would generate. When we say that Avamar reduces network requirements by as much as 99.9%, this is why. The Avamar client typically uses less than 1% of the network (LAN and/or WAN) capacity that a normal backup application requires.
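If it helps to see that exchange as code, here is a minimal Python sketch of the idea. To be clear, this is not Avamar's actual implementation. The `FakeServer` class and the `fingerprint` and `backup` helpers are hypothetical names made up for illustration, SHA-1 applied twice is an assumption (all I said above is that the segment is hashed twice), and short hard-coded byte strings stand in for real variable-length segments.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Hash a segment twice to get a fingerprint (SHA-1 is an assumption)."""
    first_pass = hashlib.sha1(segment).digest()
    return hashlib.sha1(first_pass).hexdigest()


class FakeServer:
    """Hypothetical stand-in for the server: segments keyed by fingerprint."""

    def __init__(self):
        self.store = {}

    def known(self, fingerprints):
        """Return the subset of fingerprints the server already holds."""
        return {fp for fp in fingerprints if fp in self.store}

    def put(self, fp, segment):
        self.store[fp] = segment


def backup(segments, server):
    """Send only the segments the server lacks; return the bytes sent."""
    by_fp = {fingerprint(seg): seg for seg in segments}
    already_stored = server.known(by_fp.keys())  # one query for all fingerprints
    sent = 0
    for fp, seg in by_fp.items():
        if fp not in already_stored:
            server.put(fp, seg)
            sent += len(seg)
    return sent


server = FakeServer()
client_a = [b"boot sector", b"shared library", b"report draft"]
client_b = [b"boot sector", b"shared library", b"holiday photo"]

sent_a = backup(client_a, server)  # first client: everything is new
sent_b = backup(client_b, server)  # second client: only one new segment
```

In this toy run the second client transmits only its one unique segment, because the server already holds the other two from the first client. That is the "think globally, act locally" part: the comparison is against everything the server has ever seen, but the work of figuring out what to send happens on the client.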
Finally, for the sake of being complete, two more things we should note:
- Each backup is called a "snap-up". For Avamar, a snap-up is a backup, and vice versa.
- Each snap-up is a full backup. Because all references to segments are pointers anyway (Avamar never stores a full "file" on the server), every day Avamar has a complete pointer reference to everything on the client. So from an Avamar perspective, we only do full backups. This is conceptually similar, sort of, to TSM. But of course it is done at the segment level, rather than the file level. I raise the issue primarily because there has been some comparison between the two methodologies in the media. Strictly speaking, however, TSM only does progressive incrementals (for unstructured data), and Avamar does full snap-ups every day.
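The "every snap-up is a full" idea is easier to see with a toy model. The sketch below is an illustration under assumptions, not Avamar's data format: the `store` dictionary, the `snap_up` and `restore` functions, and the choice of SHA-256 are all hypothetical. Each snap-up records a complete list of pointers (fingerprints), while the store only grows when a segment has never been seen before.

```python
import hashlib

store = {}  # fingerprint -> segment bytes (hypothetical server-side store)


def snap_up(segments):
    """Record a full manifest of pointers; store only unseen segments."""
    manifest = []
    for seg in segments:
        fp = hashlib.sha256(seg).hexdigest()  # hash choice is an assumption
        store.setdefault(fp, seg)  # no-op if the segment is already stored
        manifest.append(fp)
    return manifest


def restore(manifest):
    """Rebuild the client's data by following the pointers in one manifest."""
    return [store[fp] for fp in manifest]


day1 = snap_up([b"alpha", b"beta", b"gamma"])
day2 = snap_up([b"alpha", b"beta-edited", b"gamma"])  # one segment changed
```

Either day can be restored from its own manifest alone, with no chain of incrementals to walk, yet the second day added only one new segment to the store.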
So that is the client, at a high level. Next up, the server.