From: Bill M. <wm...@po...> - 2007-10-01 17:38:31
In response to Chris Hoogendyk <hoo...@bi...>:

>
> Christopher Derr wrote:
> > Greetings,
> >
> > We're thinking of using Bacula as our disk-to-disk solution for backing
> > user and research data. I'm still reading up on it, but I haven't found
> > the answer to the following question.
> >
> > Called pooling in BackupPC and deduplication by industry, I've been
> > trying to find out if Bacula has it. A search of the site for either
> > word brings up nothing relevant. Does the current version of Bacula
> > have the ability to store backups of the same file as one file with links?
> >
> > For example: If Bob and Joan both have the exact same 2 MB PDF in their
> > home directory, a normal backup would store it twice for a total of 4
> > MB. What deduplication does, is store the file once in a central
> > location, and then store links from the individual backups to the file.
> > If 100 people have this same file, rather than taking up 200 MB of
> > space, it still only takes up 2 MB. Unique, I believe, to disk-to-disk
> > backups.
>
> Nope.
>
> I'm not aware of any open source backup software that does that. Amanda
> doesn't do it either. It's non-trivial and has been discussed on the
> Bacula list a couple of times. Not sure what the key word would be to
> search for it.

BackupPC does it:
http://backuppc.sourceforge.net/

But their architecture was designed from the ground up to support it.

I expect that what happens is when a file with a duplicate filename is
backed up for the first time, a checksum is generated to compare it to
files of the same name already in the system. When incrementals are run,
if the file is recently modified, the checksums are checked again.

I think the first thing that would need to occur for Bacula to do this
is the use of something stronger than MD5. Perhaps SHA256.

--
Bill Moran
http://www.potentialtech.com
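
For anyone who wants to see the store-once-and-link idea in concrete terms, here is a minimal sketch in Python. It is not BackupPC's or Bacula's actual code. The names `sha256_of`, `store_file`, `pool_dir` and `backup_path` are made up for illustration. Unlike the filename-based guess above, this sketch keys the pool on the SHA-256 of the file contents alone, and it assumes the pool and the backup tree live on the same filesystem so that hard links work.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_file(src, pool_dir, backup_path):
    """Back up src to backup_path, keeping one copy of its content in pool_dir.

    Hypothetical helper: returns True if the content was new to the pool,
    False if an identical file had already been pooled.
    """
    pool_dir = Path(pool_dir)
    backup_path = Path(backup_path)
    pooled = pool_dir / sha256_of(src)  # pool entry is named by content hash

    is_new = not pooled.exists()
    if is_new:
        pool_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, pooled)  # first copy of this content: store the data

    backup_path.parent.mkdir(parents=True, exist_ok=True)
    os.link(pooled, backup_path)  # hard link: no additional data blocks used
    return is_new
```

With this layout, Bob's and Joan's identical PDF would end up as one pool file with two extra directory entries pointing at it. A real implementation would also have to handle hash collisions, link-count limits, and pruning pool entries that no backup references any more.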