From: Craig B. <cba...@us...> - 2007-09-11 17:39:39
Rich writes:

> If a file has been renamed or moved it looks like rsyncd/BackupPC jobs
> will redownload the file even though the file is already in the pool

As Les said: yes, it will download files again if there isn't a matching
path name in a previous backup. As it is downloaded it will be matched
against the pool, so it won't be written to disk. So while there is a
transfer overhead in this case, there is no storage (or even disk write)
penalty.

> I'm currently perusing the sources to see if there is a way to work
> around that. Has anyone already investigated this? Am I headed down a
> dead-end?

Yes you are.

When rsync transfers the file list it doesn't contain file checksums
(unless you specify --checksum, which File::RsyncP doesn't support).
Also, the rsync whole-file checksum is different from the BackupPC
pool checksum, so it isn't useful for trying to find files in the
pool.

Craig
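Craig's point, that a renamed file costs a transfer but no pool write, can be illustrated with a toy content-addressed store. This is only a sketch: the `Pool` class, the `backup` helper, the file names, and the use of SHA-256 are all assumptions made for illustration, not BackupPC's actual pool layout, digest, or code.

```python
import hashlib
import os
import tempfile


class Pool:
    """Toy content-addressed store (hypothetical; not BackupPC's pool format)."""

    def __init__(self, root):
        self.root = root
        self.writes = 0  # number of files actually written to disk

    def store(self, data):
        # The digest of the content, not the path, decides where it lives.
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):
            with open(path, "wb") as fh:
                fh.write(data)
            self.writes += 1
        return digest


def backup(pool, received):
    """received maps path -> bytes, standing in for data sent over the wire."""
    view = {}
    transferred = 0
    for path, data in received.items():
        transferred += len(data)        # every file here was transferred
        view[path] = pool.store(data)   # but only unseen content is written
    return view, transferred


root = tempfile.mkdtemp()
pool = Pool(root)
content = b"quarterly numbers"
first, sent_first = backup(pool, {"docs/report.txt": content})
# Same content under a new path: transferred again, matched against the pool.
second, sent_second = backup(pool, {"archive/report-2007.txt": content})
print(sent_first, sent_second, pool.writes)
```

The second run moves the same number of bytes as the first, yet the write counter does not change, which is the transfer-versus-storage distinction described above.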
From: Rich R. <ri...@sh...> - 2007-09-11 19:04:32
Craig Barratt wrote:
>> I'm currently perusing the sources to see if there is a way to work
>> around that. Has anyone already investigated this? Am I headed down a
>> dead-end?
>
> Yes you are.
>
> When rsync transfers the file list it doesn't contain file checksums
> (unless you specify --checksum, which File::RsyncP doesn't support).
> Also, the rsync whole-file checksum is different to the BackupPC
> pool checksum, so it isn't useful for trying to find files in the
> pool.

Yeah, I thought about that after I sent the email. The checksum/hash
values don't necessarily use the same algorithm. It would therefore
require a lookup table/cache mapping rsync whole-file checksum => pool
checksum, which I've seen references to:

http://backuppc.sourceforge.net/faq/BackupPC.html#rsync_checksum_caching

But is that caching only for the life of that backup job?

One could store the rsync checksum in the pool file, but you would still
need to generate a quick lookup table. Or an alternate file hierarchy
using rsync checksums (yuck!). Maybe a Berkeley DB using tied hashes?
But you'd want a way to remove entries for trashed pool items, and you'd
need to handle write contention, because I'd assume you'd want a shared
table.

Even if all that is possible, is there a way to hook into the RsyncP
module to say, "Oh, hey, I have a file that matches that whole-file
checksum right here"?
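For what it's worth, the lookup table idea might look roughly like the sketch below, with Python's standard `dbm` module standing in for a Berkeley DB tied hash. Everything here is hypothetical: the function names, the index file, and the MD5/SHA-256 stand-ins for the two checksums are assumptions, and none of it is a BackupPC or File::RsyncP interface. It also does no locking, so the write-contention problem is left unsolved.

```python
import dbm
import hashlib
import os
import tempfile


def transfer_checksum(data):
    # Stand-in for the whole-file checksum the transfer side would report.
    return hashlib.md5(data).hexdigest().encode()


def pool_digest(data):
    # Stand-in for the pool's own digest; deliberately a different algorithm.
    return hashlib.sha256(data).hexdigest().encode()


def record(index_path, data):
    """Remember which pool entry corresponds to a given transfer checksum."""
    with dbm.open(index_path, "c") as db:
        db[transfer_checksum(data)] = pool_digest(data)


def lookup(index_path, checksum):
    """Return the pool digest for a transfer checksum, or None if unknown."""
    with dbm.open(index_path, "c") as db:
        return db[checksum] if checksum in db else None


def prune(index_path, pool_dir):
    """Drop index entries whose pool file no longer exists."""
    removed = 0
    with dbm.open(index_path, "c") as db:
        for key in list(db.keys()):
            if not os.path.exists(os.path.join(pool_dir, db[key].decode())):
                del db[key]
                removed += 1
    return removed


tmp = tempfile.mkdtemp()
pool_dir = os.path.join(tmp, "pool")
os.mkdir(pool_dir)
index_path = os.path.join(tmp, "checksum-index")

data = b"same bytes, new path"
pool_file = os.path.join(pool_dir, pool_digest(data).decode())
with open(pool_file, "wb") as fh:
    fh.write(data)
record(index_path, data)

hit = lookup(index_path, transfer_checksum(data))
miss = lookup(index_path, transfer_checksum(b"never seen"))

os.remove(pool_file)                      # simulate a trashed pool item
pruned = prune(index_path, pool_dir)
after_prune = lookup(index_path, transfer_checksum(data))
```

Even with such a table, the open question from the message stands: the sketch only answers "which pool entry has this checksum", not whether the transfer module can be told to skip the download.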