After reading Chuck's post explaining why he has a bad case of déjà vu, I would like to add one more item to what happens during the initial stage of the "infatuation curve". Chuck writes:
A new technology becomes available. It's seductive in its initial appeal. Vendors rush to claim their stake in the new territory. A variety of approaches become available, accompanied by a heated industry debate.
But I think one more really important thing characterizes early adoption: an absence of critical analysis and of widespread understanding of best practices. In other words, the general message around the technology is that it is magic. Who cares how it works, or what the "gotchas" are? It works, and that is good enough, so let's move on and just implement it.
Deduplication certainly fits this model.
I haven't seen much critical analysis of what makes it work better or worse, or of what you should and shouldn't do once you have it.
Generally, there are a bunch of things that make deduplication ratios degrade:
1. High change rates in the data.
2. Very long-term retention of data (which tends to lead to #1 above).
3. Data that can't be compressed by normal means at all: pictures, scientific data, etc.
4. Encryption or compression on the client side, which randomizes the bits before they can be deduplicated (see the sketch after this list).
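To make #4 concrete, here is a minimal Python sketch. It assumes fixed-size 4 KB chunking and uses a toy XOR keystream as a stand-in for real client-side encryption; both are illustrative choices, not any particular product's behavior. Two plaintext copies of the same data share every chunk, while the same data encrypted under two session keys shares essentially none.

```python
import hashlib
import os

CHUNK = 4096  # fixed-size chunking, a simple model of a dedup engine

def chunk_hashes(data: bytes) -> set:
    """One SHA-256 per fixed-size chunk; the store keeps one copy per unique hash."""
    return {hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)}

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for client-side encryption: XOR with a key-derived stream.
    Any real cipher with a per-session key or IV randomizes chunks the same way."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))

payload = b"all work and no play " * 50_000  # highly redundant data

# Two plaintext backups of the same data share every chunk.
p1, p2 = chunk_hashes(payload), chunk_hashes(payload)
print("plaintext chunk overlap:", len(p1 & p2) / len(p1))    # 1.0

# The same data encrypted under two session keys shares essentially none.
c1 = chunk_hashes(toy_encrypt(payload, os.urandom(16)))
c2 = chunk_hashes(toy_encrypt(payload, os.urandom(16)))
print("ciphertext chunk overlap:", len(c1 & c2) / len(c1))   # ~0.0
```

The same logic covers client-side compression: a one-byte change early in a file shifts the entire compressed stream after it, so downstream chunks stop lining up.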
Today, I want to talk about something else, something that should properly be #5 on the list above: multiplexing, or interleaving, of backup data by the backup application.
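As a preview of why this hurts, here is a sketch using the same fixed-size chunking model as above. The multiplexer below is my own simplified model, not any particular backup product: it interleaves variable-sized records from two client streams into one output stream, the way a backup application keeps a tape drive streaming. Because record sizes and arrival order differ from run to run, two backups of completely unchanged data end up sharing almost no chunks.

```python
import hashlib
import random

CHUNK = 4096  # fixed-size chunking on the dedup side

def chunk_hashes(stream: bytes) -> set:
    """One SHA-256 per fixed-size chunk; the store keeps one copy per unique hash."""
    return {hashlib.sha256(stream[i:i + CHUNK]).digest()
            for i in range(0, len(stream), CHUNK)}

def client_data(tag: bytes, size: int) -> bytes:
    """Deterministic but non-repeating stand-in for one client's backup data."""
    out = bytearray()
    i = 0
    while len(out) < size:
        out += hashlib.sha256(tag + i.to_bytes(4, "big")).digest()
        i += 1
    return bytes(out[:size])

def multiplex(clients, seed: int) -> bytes:
    """Interleave variable-sized records from several clients into one stream.
    Record sizes and arrival order vary per run, modeled here with a seeded RNG."""
    rng = random.Random(seed)
    offsets = [0] * len(clients)
    out = bytearray()
    while any(o < len(c) for o, c in zip(offsets, clients)):
        i = rng.randrange(len(clients))      # whichever client "arrives" next
        if offsets[i] >= len(clients[i]):
            continue                         # this client is already drained
        n = rng.randint(16_384, 65_536)      # record size varies run to run
        out += clients[i][offsets[i]:offsets[i] + n]
        offsets[i] += n
    return bytes(out)

a = client_data(b"client-a", 1_000_000)  # unchanged between backups
b = client_data(b"client-b", 1_000_000)  # unchanged between backups

run1 = chunk_hashes(multiplex([a, b], seed=1))
run2 = chunk_hashes(multiplex([a, b], seed=2))
print("chunk overlap across two backups of identical data:",
      round(len(run1 & run2) / len(run1), 3))  # close to 0.0
```

Neither client changed a byte between the two runs; the interleaving alone destroyed the deduplication ratio.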