From: John P. <jp...@cl...> - 2007-03-28 20:52:23
Evren Yurtesen wrote:
> BackupPC Manual mentions:
>
> ----------------------------------------------------
> Each file is examined by generating block checksums (default 2K blocks) on the receiving side (that's the BackupPC side), sending those checksums to the client, where the remote rsync matches those checksums with the corresponding file. The matching blocks and new data are sent back, allowing the client file to be reassembled. A checksum for the entire file is sent as an extra check that the reconstructed file is correct.
>
> This results in significant disk IO and computation for BackupPC: every file in a full backup, or any file with non-matching attributes in an incremental backup, needs to be uncompressed, block checksums computed and sent. Then the receiving side reassembles the file and has to verify the whole-file checksum. Even if the file is identical, prior to 2.1.0, BackupPC had to read and uncompress the file twice, once to compute the block checksums and later to verify the whole-file checksum.
> ----------------------------------------------------
>
> Why is it actually necessary to do this checksum checking?
>

If you turn on checksum caching (see the manual), it doesn't read every file on the server every time, just a random sample to ensure that nothing nasty has happened to the pool. It also doesn't read every file on the client side for incremental backups, only for full backups.

> Wouldn't it be enough to find files with non-matching attributes and back them up?
>

That's what it does for incremental backups; the text you quoted says so.

> I think that in most cases, if at least the modification time is different, the file
> should be backed up anyway, no? There could even be situations where the
> last modification time of a file is more important than its contents (I don't
> see how, but it is a possibility).
>

The checksum in rsync is more about reducing data on the wire than it is about deciding what gets copied.
If the attributes have changed, the file will be backed up, but only the data that has changed will actually be sent across the wire.

John
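To illustrate the block-checksum exchange in the quoted manual text, and why rsync mainly saves bytes on the wire rather than deciding what gets backed up, here is a simplified, in-memory sketch of rsync-style delta transfer. It is not BackupPC's or rsync's actual code. The receiver (BackupPC) computes a weak and a strong checksum for each fixed-size block of its existing copy. The sender (client) slides a window over the new file using a rolling weak checksum and emits either a reference to a block the receiver already has or literal bytes. The receiver then rebuilds the file and verifies a whole-file checksum. The 2K block size comes from the manual. Using MD5 for the strong and whole-file checksums, and the helper names `block_signatures`, `compute_delta`, `apply_delta` and `sync`, are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Tuple, Union

BLOCK_SIZE = 2048  # default block size mentioned in the BackupPC manual
MOD = 1 << 16      # modulus for the two halves of the weak checksum

# A delta op is either an index of a block the receiver already has,
# or a run of literal bytes the sender must transmit.
Op = Union[int, bytes]
Sigs = Dict[Tuple[int, int], List[Tuple[str, int]]]


def weak_checksum(data: bytes) -> Tuple[int, int]:
    """rsync-style weak checksum: a = sum of bytes, b = position-weighted sum."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> Tuple[int, int]:
    """Slide a full-size window one byte forward in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def block_signatures(old: bytes, size: int = BLOCK_SIZE) -> Sigs:
    """Receiver side: weak checksum -> list of (strong checksum, block index)."""
    sigs: Sigs = {}
    for index, start in enumerate(range(0, len(old), size)):
        block = old[start:start + size]
        sigs.setdefault(weak_checksum(block), []).append(
            (hashlib.md5(block).hexdigest(), index))
    return sigs


def compute_delta(new: bytes, sigs: Sigs, size: int = BLOCK_SIZE) -> List[Op]:
    """Sender side: match windows of the new file against the receiver's blocks."""
    ops: List[Op] = []
    literal = bytearray()
    pos = 0
    a, b = weak_checksum(new[pos:pos + size])
    while pos < len(new):
        window_len = min(size, len(new) - pos)
        match = None
        candidates = sigs.get((a, b))
        if candidates:  # only hash with MD5 when the cheap checksum hits
            strong = hashlib.md5(new[pos:pos + window_len]).hexdigest()
            match = next((i for s, i in candidates if s == strong), None)
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal.clear()
            ops.append(match)
            pos += window_len
            a, b = weak_checksum(new[pos:pos + size])
        else:
            literal.append(new[pos])
            if pos + size < len(new):
                a, b = roll(a, b, new[pos], new[pos + size], size)
            else:  # window shrinks near end of file, so recompute directly
                a, b = weak_checksum(new[pos + 1:pos + 1 + size])
            pos += 1
    if literal:
        ops.append(bytes(literal))
    return ops


def apply_delta(old: bytes, ops: List[Op], size: int = BLOCK_SIZE) -> bytes:
    """Receiver side: reassemble the file from block references and literals."""
    out = bytearray()
    for op in ops:
        out += old[op * size:(op + 1) * size] if isinstance(op, int) else op
    return bytes(out)


def sync(old: bytes, new: bytes) -> Tuple[bytes, List[Op]]:
    sigs = block_signatures(old)          # computed and sent by the receiver
    ops = compute_delta(new, sigs)        # computed and sent back by the sender
    expected = hashlib.md5(new).digest()  # whole-file checksum from the sender
    rebuilt = apply_delta(old, ops)
    if hashlib.md5(rebuilt).digest() != expected:
        raise ValueError("whole-file checksum mismatch")
    return rebuilt, ops


if __name__ == "__main__":
    old = bytes(range(256)) * 40                  # 10240 bytes = 5 blocks
    new = old[:4096] + b"inserted" + old[4096:]   # 8-byte insertion
    rebuilt, ops = sync(old, new)
    print(ops)
```

In the usage example, the 8-byte insertion is the only literal data in the delta; every other part of the file is sent as a reference to a block the receiver already holds. Note that the receiver still has to read its whole copy to build the block signatures, which is the server-side cost the manual describes.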