From: Jeffrey J. K. <bac...@ko...> - 2009-12-29 08:33:56
Jeffrey J. Kosowsky wrote at about 11:50:04 -0500 on Tuesday, December 22, 2009:
> In my neuroses, I ran a perl script that recursed through the cpool
> and checked whether the md5sum of each stored file corresponded to its
> location in the pool (note when I say md5sum I mean the special notion
> of md5sum defined in BackupPC::Lib.pm)
>
> 1. Out of a total of 855,584 pool entries, I found a total of 35 errors.
>
> 2. Interestingly, all 35 of these errors corresponded to 'attrib' files.
>
> 3. Perhaps even more interestingly, all but two of the attrib files
>    were at the top level -- i.e., $TopDir/pc/<machine>/<nnn>/attrib
>    (this represents 33 out of a total of 87 backups)
>
> 4. None of the attrib files appear corrupted when I examine them using
>    BackupPC_attribPrint
>
> So what could possibly be causing the md5sum to be wrong just on a
> small subset of my pool files?
>
> Why are these errors exclusively limited to attrib files of which
> almost all are top-level attrib files (even though they constitute a
> tiny fraction of total attrib files)?
>
> - Disk corruption or hardware errors seem unlikely due to the specific
>   nature of these errors and the fact that the file data itself seems
>   intact
>
> Of course, I could easily write a routine to "fix" these errors, but I
> just don't understand what is wrong here. I suppose the errors aren't
> particularly dangerous in that the only potential issue they could
> cause would be some missed opportunities for pool de-duplication of
> stored attrib files. But there shouldn't be wrong pool md5sums...
>

OK. I think I found a way to reproduce this.

The md5sum for the root-level attrib (i.e., the attrib file at the
level of pc/machine/n/attrib) is wrong if:
1. There are at least 2 shares
2. The attrib entries for each of the shares have changed since the
   last backup (e.g., if the share directory has its mtime modified)

Try the following on a machine with >=2 shares:
1. Touch one of the share directories (to change the mtime)
2. Run a backup
3. Run another backup immediately afterwards (or, more specifically,
   without changing any of the attrib entries for each of the shares)
4. Look at:
      diff machine/n/attrib machine/n+1/attrib    ==> no diffs
      ls -i machine/n/attrib machine/n+1/attrib   ==> different i-nodes
5. The *2nd* attrib is stored in the correct md5sum cpool entry; the
   first one is not.

To explore this, you can use the perl script I wrote, which is included
at the end of this message. In particular, try something like:
      BackupPC_zfile2md5 -p -k "machine/*/attrib"

(Note that the script is really just a nice wrapper around the routine
zFile2MD5, which is part of my jLib.pm module that can be found on the
wiki.)

----------------------------------------------------------------------------
#!/usr/bin/perl
#============================================================= -*-perl-*-
#
# BackupPC_zfile2md5.pl: calculate and optionally verify the BackupPC-style
#                        partial md5sum of any file compressed by BackupPC
#
# DESCRIPTION
#   This program allows you to calculate the partial md5sum
#   corresponding to the cpool path for any file that uses
#   BackupPC-style compression whether or not that file is actually
#   stored or linked to the cpool. Optionally, if the file is a cpool
#   entry or is linked to the cpool, you can add the '-k' flag to
#   verify whether the corresponding cpool path is consistent with the
#   actual md5sum of the file.
#
#   Multiple files or directories can be given on the command line,
#   allowing you to calculate (and optionally verify) the md5sum for
#   multiple files or multiple trees of files. The script also does
#   path globbing using standard shell globbing conventions.
#
#   Paths are assumed to be either absolute or relative to the current
#   directory unless the options -C, -c, or -p are given, in which
#   case the paths are understood to be a cpool file name (without
#   path), a path relative to the cpool, or a path relative to the
#   pc directory, respectively.
#
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2009  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 1.0, released Dec 2009
#
#========================================================================

use strict;
use warnings;

use Getopt::Std;
use Digest::MD5;    # Made explicit; Digest::MD5->new is called below

use lib "/usr/share/BackupPC/lib";
use BackupPC::FileZIO;
use BackupPC::Lib;
use BackupPC::jLib;
use Cwd 'abs_path';
use File::Find;
use File::Glob ':glob';

die("BackupPC::Lib->new failed\n")
    if ( !(my $bpc = BackupPC::Lib->new("", "", "", 1)) ); #No user check
%Conf = $bpc->Conf(); #Global variable defined in jLib.pm (do not use 'my')

my %opts;
if ( !getopts("Ccpka", \%opts) || @ARGV < 1
     || (defined($opts{C}) + defined($opts{c}) + defined($opts{p}) > 1)) {
    print STDERR <<EOF;
usage: $0 [options] path1 [path2] [path3]....

  Find BackupPC-style md5sum of compressed file

  Options:
    -C   Entry is a cpool file name (no path)
    -c   Consider path relative to cpool directory
    -p   Consider path relative to pc directory
    -k   Compare to md5sum embedded in file name (for cpool entries)
         or to the inode number of the corresponding pool file (otherwise)
    -a   Use size from attrib file if available (for backup files)
EOF
    exit(1);
}

my $useattribsize = $opts{a} ? 0 : -1;
my $TopDir = $Conf{TopDir};
my $compress = $Conf{CompressLevel};
my $pool = $compress > 0 ? "cpool" : "pool";
my $md5 = Digest::MD5->new;

# Expand each argument into a list of paths. An explicit "/" separator is
# added after TopDir in case it has no trailing slash; any doubled slashes
# are collapsed below before the paths are used.
my @zpathlist;
foreach (@ARGV) {
    if($opts{C}) {
        push @zpathlist, bsd_glob($bpc->MD52Path($_, $compress));
    } elsif($opts{c}) {
        push @zpathlist, bsd_glob($bpc->TopDir() . "/cpool/" . $_);
    } elsif($opts{p}) {
        push @zpathlist, bsd_glob($bpc->TopDir() . "/pc/" . $_);
    } else {
        # Glob first, then make absolute: abs_path() can return undef for a
        # pattern whose intermediate components do not exist literally.
        push @zpathlist,
            map { my $abs = abs_path($_); defined($abs) ? $abs : $_ }
            bsd_glob($_);
    }
}
die "No valid paths...\n" unless @zpathlist;

foreach my $zpath (@zpathlist) {
    unless(-e $zpath) {
        warn "'$zpath' is not an existing file or directory path...\n";
        next;
    }
    $zpath =~ s#/+#/#g;       #Remove extra slashes
    $zpath =~ s#/\.(/|$)#/#g; #Remove extra /.
    find(\&check_md5, $zpath);
}

sub check_md5
{
    return unless -f;    #Only regular files
    my $filename = $File::Find::name;
    my $digest = zFile2MD5($bpc, $md5, $File::Find::name, $useattribsize);
    return if ($digest eq "-1");

    #Print the path relative to the directory implied by the option
    #(\Q...\E keeps regex metacharacters in TopDir from being interpreted)
    $filename =~ s#^\Q$TopDir\E/*pc/## if $opts{p};
    $filename =~ s#^\Q$TopDir\E/*\Q$pool\E/## if $opts{c};
    $filename =~ s#.*/## if $opts{C};
    print "$digest $filename";

    if ($opts{k}) {
        my $ok;
        if ($opts{c} || $opts{C}) {
            #Pool file names have the form <md5sum>[_<n>]
            my ($poolmd5) =
                $File::Find::name =~ m#(?:^|/)([[:xdigit:]]+)(?:_\d+)?$#;
            $ok = defined($poolmd5) && $digest eq $poolmd5;
        } else {
            #Linked to the pool if it shares an inode with the pool file
            my $poolpath = $bpc->MD52Path($digest, $compress);
            $ok = (-e $poolpath)
                && (stat($poolpath))[1] == (stat($File::Find::name))[1];
        }
        my $status = $ok ? " MATCH" : " ERROR";
        print $status;
    }
    print "\n";
}
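
The manual check in step 4 of the recipe (same contents, different i-nodes) can also be scripted. What follows is only a minimal shell sketch, assuming a POSIX sh with cmp, ls and awk available. The function names inode_of and compare_pooling and the three status strings are made up for illustration; they are not part of BackupPC or jLib.pm.

```shell
#!/bin/sh
# Hypothetical helper (not part of BackupPC or jLib.pm): report whether two
# files have byte-identical contents and whether they share an inode.

inode_of() {
    # 'ls -i' prints the inode number as the first field.
    ls -id -- "$1" | awk '{print $1}'
}

compare_pooling() {
    a=$1
    b=$2
    if ! cmp -s -- "$a" "$b"; then
        # The stored files differ, so separate pool entries are expected.
        echo "DIFFERENT-CONTENT"
    elif [ "$(inode_of "$a")" = "$(inode_of "$b")" ]; then
        # Hard links to the same file: the two backups were pooled together.
        echo "SHARED-INODE"
    else
        # The symptom from step 4: identical files that were not pooled.
        echo "SAME-CONTENT-DIFFERENT-INODE"
    fi
}

# Run the comparison only when exactly two paths are given.
if [ "$#" -eq 2 ]; then
    compare_pooling "$1" "$2"
fi
```

Saved under a name of your choosing (say compare_pooling.sh) and run from the pc directory as "sh compare_pooling.sh machine/n/attrib machine/n+1/attrib", a result of SAME-CONTENT-DIFFERENT-INODE would correspond to the situation described in steps 4 and 5. It only compares the two files with each other; it does not say which of them sits in the correct cpool entry, which is what the -k option of the perl script checks.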