An intro to data deduplication

March 23, 2010
By George

Data deduplication is a data backup technique that eliminates duplicated data. The name means what it suggests: the removal of duplicates. What the name does not reveal is how this is done. The data deduplication mechanism divides data into blocks, or chunks, of bits and then eliminates the redundant pieces within the data.

A typical example that many authors use when explaining data deduplication is the email attachment scenario: the same attachment may be present in many email messages, yet only one copy of it is stored during the backup or archive process. As a concept this example helps readers understand data deduplication, but I would like to point out that data deduplication mechanisms do not generally operate at file level. Storing a single copy of an identical file is a simple task that can be done with a less processing-hungry mechanism than data deduplication.
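To make the attachment scenario concrete, here is a minimal Python sketch of file-level single-copy storage. It is only an illustration under stated assumptions: the function name store_attachments, the message ids and the sample contents are all hypothetical, and SHA-1 is used simply because it is a widely available hash.

```python
import hashlib


def store_attachments(messages):
    """Keep one copy of each distinct attachment (file-level, illustrative only).

    `messages` is a list of (message_id, attachment_bytes) pairs.
    """
    store = {}        # digest of the whole file -> attachment bytes, stored once
    references = []   # (message_id, digest) pairs pointing into the store
    for message_id, attachment in messages:
        digest = hashlib.sha1(attachment).hexdigest()
        store.setdefault(digest, attachment)  # only the first copy is kept
        references.append((message_id, digest))
    return store, references


# Hypothetical sample data: two messages carry the same attachment.
report = b"quarterly report contents"
messages = [
    ("msg-1", report),
    ("msg-2", report),
    ("msg-3", b"a different file"),
]
store, references = store_attachments(messages)
```

Three messages end up referencing only two stored attachments. Note that the whole file is hashed at once, so a one-byte change to the attachment would produce a completely new copy in the store.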

Data deduplication looks within files, splits them into blocks of data and applies a hash function such as MD5 or SHA-1 to each block to generate a number that is, for practical purposes, unique to that block's contents. Each such number is stored in an index, and a block whose number is already in the index is not stored again. When files are updated, only the changed data is saved; that is, only the changed blocks or bytes. In practice, data deduplication is combined with other mechanisms, such as delta differencing, to achieve better and safer use of the storage space.
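The block-and-index idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product works: it assumes fixed-size blocks, a tiny hypothetical BLOCK_SIZE of 4 bytes so the result is easy to follow, and an in-memory dictionary as the index. The names dedup_store and restore are made up for this example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and deliberately tiny; real block sizes are far larger


def dedup_store(data, index):
    """Split `data` into fixed-size blocks and store each new block once.

    `index` maps a block's SHA-1 hex digest to the block's bytes.
    Returns the ordered list of digests needed to rebuild `data`.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint not in index:
            index[fingerprint] = block  # first time this content is seen
        recipe.append(fingerprint)
    return recipe


def restore(recipe, index):
    """Rebuild the original bytes from a list of digests."""
    return b"".join(index[fingerprint] for fingerprint in recipe)


index = {}
version_1 = b"AAAABBBBCCCCAAAA"  # the block AAAA occurs twice
version_2 = b"AAAABBBBDDDDAAAA"  # same file after one block changed

recipe_1 = dedup_store(version_1, index)
blocks_after_v1 = len(index)     # AAAA, BBBB and CCCC are stored

recipe_2 = dedup_store(version_2, index)
blocks_after_v2 = len(index)     # only DDDD is new
```

Backing up the first version stores three blocks even though the file contains four, and backing up the updated version adds just one more block. Both versions can still be rebuilt in full from their lists of digests.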

Some benefits of data deduplication:

  • Data deduplication lowers storage space requirements and hence saves you money on hardware (disks and tape) and/or remote storage costs.
  • If well planned, it may provide better recovery times in disaster recovery situations.
  • Most importantly, data deduplication reduces the amount of data that must be sent across the Internet for remote or online backups.
