Firms should never promise anything unless they are certain they can deliver, but you have to question whether some of the data de-duplication vendors can back up their claims.
Sepaton’s DeltaStor is the latest in a growing line of dedupe solutions that claim to shrink large volumes of information held on enterprise servers to a tiny percentage of their original size, before shoehorning them onto far fewer hard disks than would otherwise be required for backup purposes.
Sepaton’s marketing department reckons DeltaStor can squeeze 50 petabytes (PB) onto a 1PB virtual tape library (VTL), a compression ratio of 50:1. This is feasible because every organisation has carbon copies of file-, block- and byte-sized data needlessly stored in multiple locations, which can be greatly reduced using an efficient method of finding the data, comparing it and discarding the bits that are already safely tucked away somewhere else.
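To make the find, compare and discard idea concrete, here is a minimal sketch of block-level dedupe in Python. It assumes fixed-size blocks and SHA-256 fingerprints, and the names `dedupe`, `restore` and `ratio` are invented for illustration. It is not a description of how DeltaStor or any other product actually works.

```python
import hashlib


def dedupe(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each distinct block."""
    store = {}   # fingerprint -> the single stored copy of that block
    index = []   # ordered fingerprints ("pointers") needed to rebuild the data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = block   # first sighting: keep it
        index.append(fingerprint)        # repeats cost only a pointer
    return store, index


def restore(store, index) -> bytes:
    """Rebuild the original data by following the pointers in order."""
    return b"".join(store[fingerprint] for fingerprint in index)


def ratio(data: bytes, store) -> float:
    """Logical size divided by stored block size (the index itself is not counted)."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored


# Best case: 50 identical 4KB blocks collapse to a single stored block.
data = b"A" * 4096 * 50
store, index = dedupe(data)
print(len(store), len(index), ratio(data, store))
```

The 50:1 figure here comes from feeding the sketch perfectly repetitive input. The ratio also ignores the space taken by the index, which grows with every block written.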
Performance and headroom issues aside (the indexes of pointers that dedupe solutions create to let us know where on the network the master copy is stored can be fairly large and difficult to process themselves), dedupe is a sensible and desperately needed approach to the growing burden of compliance and data retention rules.
Few would question the wisdom of being more selective about the stuff being copied to all those expensive disk arrays, but it is important to remember that, because not every data type is the same, the compression rates that are actually achievable in uncharted storage environments vary.
Some files (or bits of files), like those email messages we all send by clicking the “reply to all with history” button, contain lots of repeated information that does not have to be stored on the hard drive of every individual in the company who received it. A single copy of the original is all you need to save in case the next Enron breaks at your company. But could the same level of compression be achieved with a large relational database?
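The gap between data types can be illustrated with a toy comparison. The sketch below keeps one copy of each distinct item and reports the resulting ratio for two made-up data sets: one message delivered to 20 mailboxes, and 20 database rows that are all different. The function name `dedupe_ratio` and both data sets are assumptions chosen to show the two extremes, and a real database would likely fall somewhere in between.

```python
import hashlib


def dedupe_ratio(items):
    """Logical bytes divided by bytes left after keeping one copy per distinct item."""
    unique = {}  # fingerprint -> size of the single copy kept
    for item in items:
        unique.setdefault(hashlib.sha256(item).digest(), len(item))
    logical = sum(len(item) for item in items)
    stored = sum(unique.values())
    return logical / stored


# Hypothetical data: the same reply-all message sitting in 20 mailboxes...
message = b"Re: Q3 forecast\n> original text quoted in full\n"
mailboxes = [message] * 20

# ...versus 20 database rows, each with a different key.
rows = [f"order-{n:06d},customer-{n * 7919 % 1000:04d}".encode() for n in range(20)]

print(dedupe_ratio(mailboxes))  # identical copies collapse to one
print(dedupe_ratio(rows))       # nothing repeats, so nothing is saved
```

The same routine gives very different answers depending on what it is fed, which is why a headline ratio says little until it has been measured against your own data.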
To prove their worth, many dedupe vendors are happy to provide a “try before you buy” sample of their technology, which prospective customers can install and run over time to see the results that can be achieved using their own unique file sets and everyday company transactions.
Without being put to such a test, those compression claims are a bit like saying an unhealthy, middle-aged journalist with a permanent hangover can run 10km in under 40 minutes – which I can, given a downhill course, a strong tailwind and a chasing pit bull. But, unfortunately, ideal conditions do not always prevail.




