From: Craig B. <cba...@us...> - 2005-04-02 18:00:20
Marlin Prowell writes:

> > 1) add the MD5 sum to the XFerLOG line for each file in a
> > numbered backup. Actually, now that I think of it, instead
> > of just the MD5 sum, it seems it should be the "modified" MD5
> > sum, that is the MD5 string plus whatever suffix leads to the
> > correct file in the pool (for the case of multiple files
> > matching the same sum -- that way you know exactly which
> > file in the pool you need to link to).
>
> This is a good idea, but does not quite work. I looked at this code
> briefly, and I recall seeing comments about *renaming* pool files with
> identical MD5 values so that all the MD5 collision names were kept in
> sequential order. If the middle file of 5 collisions is deleted, then
> xxx_4 and xxx_5 are renamed so the files were named _1 through _4.
>
> It means that the pool file names are not permanently assigned. The
> pool file name at dump time may not be the pool file name at backup time.

You know the code well!

The original approach has merit, but all you would store in the log
file is the md5 sum and not the _nnn extension. The mirror/archive
script would still need to do an inode comparison to make sure it has
the right _nnn file, in the rare cases when there are md5 collisions
for that file.

However, the original approach does have some other flaws. There are
cases where files are linked to without knowing their md5 digest
(meaning this information cannot easily be written to the XferLOG).
The two cases that come to mind are:

  - when a file fails to transfer correctly (eg: smb fills a file
    with 0x0 when it can't read it due to a WinXX lock), that file
    is removed and the same file in a previous backup is linked to,
    without knowing the md5 digest.

  - when XferMethod rsync notices a file is identical (based on
    rsync's block/file checksum algorithm), the previous file is
    linked to, again without knowing the md5 digest.

Craig
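
For illustration only, the inode comparison mentioned above might look roughly like the sketch below. It is not taken from the BackupPC code: the function name `find_pool_file`, the flat `pool_dir` layout, and the assumption that collision files are named `<digest>_<n>` next to a base file named `<digest>` are all hypothetical. The idea is simply that a backup file and its pool file are hardlinks to the same inode, so the right candidate can be picked without trusting the suffix.

```python
import os


def find_pool_file(backup_path, pool_dir, digest):
    """Return the pool entry hardlinked to backup_path, or None.

    Assumed (hypothetical) naming: "<digest>" for the first file and
    "<digest>_<n>" for further files with the same md5 sum.
    """
    target = os.stat(backup_path)
    for name in sorted(os.listdir(pool_dir)):
        # Skip anything that is not a candidate for this digest.
        if name != digest and not name.startswith(digest + "_"):
            continue
        candidate = os.path.join(pool_dir, name)
        st = os.stat(candidate)
        # Same device and same inode means the two names are hardlinks
        # to the same file, whatever suffix it carries right now.
        if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
            return candidate
    return None
```

Because the match is done on the inode and not on the name, a lookup like this would keep working even after collision files have been renamed. It does nothing for the two cases listed above, where the digest is never known in the first place.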