From: dan <dan...@gm...> - 2008-12-09 15:53:43
Just playing devil's advocate here, as this conversation has already chosen its direction: would it be reasonable to compare two filesystems based on their journal? I am assuming that essentially all BackupPC installations are on a journaled filesystem or could be upgraded to one. Wouldn't replaying that journal to a tar file, sending the tar file to the remote host, and then replaying the journal there be pretty efficient? I have no experience with this; I just wanted to throw another concept out there.

On Mon, Dec 8, 2008 at 10:49 PM, Jeffrey J. Kosowsky <bac...@ko...> wrote:

> Holger Parplies wrote at about 04:10:17 +0100 on Tuesday, December 9, 2008:
> > Hi,
> >
> > Jeffrey J. Kosowsky wrote on 2008-12-08 09:37:16 -0500 [Re: [BackupPC-users] Advice on creating duplicate backup server]:
> > >
> > > It just hit me that, given the known architecture of the pool and cpool directories, shouldn't it be possible to come up with a scheme that works better than either rsync (which can choke on too many hard links) or 'dd' (which has no notion of incrementals and requires you to resize the filesystem, etc.)?
> >
> > yes, that hit someone on the list several years ago (I don't remember the name, sorry). I implemented the idea he sketched (well, more or less; there's some work left to make it really useful).
> >
> > > My thought is as follows:
> > > 1. First, recurse through the pc directory to create a list of files/paths and the corresponding pool links.
> > >    Note that finding the pool links can be done in one of several ways:
> > >    - Method 1: Create a sorted list of pool files (which should be significantly shorter than the list of all files due to the nature of pooling, and therefore require less memory than rsync) and then look up the links.
> >
> > Wrong. You need one entry per inode that points to an arbitrary path (the first one you copy). Every file(*) is in the pool, meaning a list of all pool files is exactly what you need. A different way to look at it: every file with a link count > 1 is a pooled file, and it's these files that cause rsync & co. problems, not single-link files. (Well, yes, rsync pre-3 needed a complete list of all files.)
>
> OK. I had assumed (wrongly) that rsync needed to keep track of each file that is hard-linked, not just one copy.
> Still, there are some savings in knowing that you can find your one copy in the pool and don't have to look through the pc tree at all.
>
> > (*) Files that are not in the pool:
> >     1.) 0-byte files. They take up no file system blocks, so pooling them saves only inodes. Not pooling them makes things simpler.
> >     2.) log files (they get appended to; that would make pooling somewhat difficult; besides, what chance is there of a pool hit?), and the backups files (including backups.old).
> >     attrib files *are* pooled, contrary to popular belief, and that makes sense, because they are often identical with the same attrib file from the previous backup(s).
>
> Yes. I am aware of this from the routines I wrote to check/fix pool consistency and missing links to the pool.
>
> > The algorithm I implemented is somewhat similar:
> > 1.) Walk pool/, cpool/ and pc/, printing information on the files and directories to a file (which will be quite large; by default I put it on the destination pool FS, because there should be large amounts of space there).
> > 2.) Sort the file with the 'sort' command. The lines in the file are designed such that they will be sorted into a meaningful order:
> >     - directories first, so I can create them and subsequently not worry about whether the place I want to copy/link a file to already exists or not
> >     - files next, sorted by inode number, with the (c)pool file preceding its pc/ links
> >     The consequence is that I get all references to one inode on adjacent lines. The first time, I copy the file. For the repetitions, I link to the first copy. All I need to keep in memory is something like one line from the file list, one "previous inode number", and one "file name of previous inode".
> >     'sort' handles huge files quite nicely, but it seems to create large (amounts of) files under /tmp, possibly under $TMPDIR if you set that (not sure). You need to make sure you've got the space, but if you're copying a multi-GB/TB pool, you probably have. My guess is that the necessary amount of space roughly equals the size of the file I'm sorting.
> > 3.) Walk the sorted file, line by line, creating directories, copying files (with File::Copy::cp, but I plan to change that to PoolWrite, so I can add (part of) one pool to an existing second pool, or to something that communicates over TCP/IP, so I can copy to a different machine), and linking files (with the Perl function link()).
> >     In theory, a pool could also be compressed or uncompressed on the fly (uncompressed for copying to zfs, for instance).
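In outline, the three passes might look something like this. This is a minimal sketch of the idea, not Holger's actual code: the list-file format, the hard-coded paths, and the local destination are all invented, and permissions, ownership, special files, and error handling are glossed over. Whichever name of an inode sorts first becomes the link source for the rest.

    #!/usr/bin/perl
    # Sketch of the sort-and-link copy (invented list format; the real
    # tool would stream to a remote machine instead of a local $dst).
    use strict;
    use warnings;
    use File::Copy qw(cp);
    use File::Find;

    my ($src, $dst) = ('/var/lib/backuppc', '/mnt/newpool');  # assumed paths

    # Pass 1: one line per directory entry. 'd' lines sort before 'f'
    # lines, and zero-padded inode numbers group all names of one inode.
    open my $list, '>', "$dst/filelist" or die "filelist: $!";
    find({ no_chdir => 1, wanted => sub {
        my $rel = substr($File::Find::name, length($src) + 1);
        if    (-d $_) { print {$list} "d $rel\n" }
        elsif (-f _)  { printf {$list} "f %020d %s\n", (lstat $_)[1], $rel }
    }}, map { "$src/$_" } qw(pool cpool pc));
    close $list;

    # Pass 2: let the external sort do the heavy lifting.
    $ENV{LC_ALL} = 'C';    # byte-order sort, independent of locale
    system('sort', '-o', "$dst/filelist.sorted", "$dst/filelist") == 0
        or die "sort failed";

    # Pass 3: copy the first name seen for each inode, hard-link the
    # rest. Only one (inode, name) pair is ever held in memory.
    my ($prev_ino, $first_name) = (-1, '');
    open my $in, '<', "$dst/filelist.sorted" or die "sorted list: $!";
    while (<$in>) {
        chomp;
        if (/^d (.*)/) {
            mkdir "$dst/$1";
        } elsif (/^f (\d+) (.*)/) {
            my ($ino, $name) = ($1, $2);
            if ($ino == $prev_ino) {
                link "$dst/$first_name", "$dst/$name" or warn "link $name: $!";
            } else {
                cp("$src/$name", "$dst/$name") or warn "cp $name: $!";
                ($prev_ino, $first_name) = ($ino, $name);
            }
        }
    }
    close $in;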
> Yes... I was thinking very similarly, though you have clearly fleshed it out and coded it in more detail. I would love to see the code when it's stable, and I can offer feedback or testing if that is helpful.
>
> > Once again, because people seem to be determined to miss the point: it's *not* processing by sorted inode numbers in order to save disk seeks that is the point, it's the fact that the 'link' system call takes two paths
> >
> >     link $source_path, $dest_path;    # to use Perl notation
> >
> > while the 'stat' system call gives you only an inode number. To link a filename to a previously copied inode, you need to know the name you copied it to. A general-purpose tool can't know when it will need the information, so it needs to keep information on all inodes with link count > 1 it has encountered. You can keep a mapping of inode_number -> file_name in memory for a few thousand files, but not for hundreds of millions. By sorting the list by inode number, I can be sure that I'll never need the info for one inode again once I've reached the next inode, so I only have to keep info for one file in memory, regardless of how many I'm copying. The difficult part is now the 'sort', but, again, the 'sort' command is good at handling huge files - probably without limit to the file size.
>
> Exactly.
>
> > So, what's the problem? Well, I used it once, because I needed to copy a pool. It seemed to Work For Me (tm), but I'm unsure how to verify the result, aside from randomly looking at a few files and hoping the 99.99999% I didn't look at are ok too.
>
> Why not use rsync to test it (with the -n flag for a dry run, -v to see the changes it would have made, and --delete to check both directions)?
> And if your pool is too big, just test it on a smaller example. Unlike rsync, the method you and I are discussing should be pretty scalable, so if it works on a reasonably complex 10GB pool, it should also work on a TB one... (hopefully)
>
> > It's far from complete. Its usefulness is limited as long as I can't copy to a remote machine. It only handles the pool/, cpool/ and pc/ directories; the rest needs to be copied by hand. There is debug output for cases I hope not to encounter but which may be present in other people's pools. I think I'm still missing a chown() or two.
>
> What more is there beyond pool/cpool/pc, other than Trash, which you presumably don't need to sync, and .ssh, which is small and should rarely change?
>
> > Let's see if I can find the figures. My pool was 103 GB, 10 million directory entries pointing to 4 million inodes. Copy from local disk to iSCSI target over a shared 100 Mbit network in 10:45 hours. Of this time, 15 minutes were spent examining cpool (pool was empty), 73 minutes examining pc/, 165 seconds sorting the file list (1.1 GB), and 9:14 hours copying/linking.
> > rsync might have worked for this pool, but I didn't test. I would be very curious how this scales to a 3TB pool though ;-).
> >
> > Question (to a tar expert named Craig or otherwise):
> > Is it possible to create a tar stream with this structure (i.e. lots of directories, then file 1/2/3/123whatever with content, then several links in different directories under pc/ to this file, then the next pool file and so on), or does a tar need to be sorted by directories?
> >
> > If it *is* possible, creating a tar stream instead of copying/linking would not be difficult, and then you could run
> >
> >     BackupPC_copyPool ... | ssh hostname tar xpf -
> >
> > (or via netcat, or even store the result on tape). Even merging into an existing pool could be split into a BackupPC_mergePool script which takes a tar stream and does whatever is necessary.
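For what it's worth, the tar format does seem to allow exactly that structure: a hard-link entry (typeflag '1') simply names the path of an earlier entry, and GNU tar creates missing parent directories on extraction. A toy illustration with Perl's Archive::Tar, using invented pool and pc/ paths:

    #!/usr/bin/perl
    # Toy answer to the tar-structure question: one pool file with
    # content, followed by zero-length hard-link entries under pc/.
    use strict;
    use warnings;
    use Archive::Tar;
    use Archive::Tar::Constant qw(FILE HARDLINK);

    my $pool_name = 'cpool/1/2/3/123456789abcdef0123456789abcdef0';
    my $tar = Archive::Tar->new;

    # Store the real content exactly once ...
    $tar->add_data($pool_name, "compressed file contents", { type => FILE });

    # ... then emit one hard-link entry per pc/ reference to it.
    for my $ref ('pc/hostA/0/f%2f/fetc/fmotd', 'pc/hostB/3/f%2f/fetc/fmotd') {
        $tar->add_data($ref, '', { type => HARDLINK, linkname => $pool_name });
    }

    $tar->write('pool-slice.tar');
    # 'tar xpf pool-slice.tar' re-creates the hard links, as long as
    # each link entry appears after the entry it names, as it does here.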
> > > - Method 2: Calculate the md5sum file path of the file to determine where it is in the pool. Where necessary, determine among chain duplicates.
> >
> > That's basically what BackupPC_tarPCCopy does.
>
> So why not use this? Or is it too slow, since you need to create the path md5sum?
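The path part is cheap: a digest maps to its pool location by using its first three hex digits as directory levels, which is what MD52Path in BackupPC's Lib.pm does. A one-function illustration:

    #!/usr/bin/perl
    # The digest-to-path mapping of the 3.x pool layout: the first three
    # hex digits of the digest become directory levels (re-implemented
    # here for illustration; the real code is MD52Path in Lib.pm).
    use strict;
    use warnings;

    sub digest2path {
        my ($digest, $pool_dir) = @_;      # $digest: 32 hex characters
        my ($d1, $d2, $d3) = split //, $digest;
        return "$pool_dir/$d1/$d2/$d3/$digest";
    }

    print digest2path('123456789abcdef0123456789abcdef0', 'cpool'), "\n";
    # prints: cpool/1/2/3/123456789abcdef0123456789abcdef0

The slow part is producing the digest in the first place: it is computed over the uncompressed contents (the file's length plus the leading portion of its data; see File2MD5 in Lib.pm), so each compressed pool file has to be uncompressed before it can be hashed.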
> > > - Method 3: Not possible yet, but would be possible if the md5sum file paths were appended to compressed backups. This would add very little to the storage, but it would allow you to very easily determine the right link. If so, then you could just read the link path from the file.
> >
> > I believe this would speed up BackupPC_tarPCCopy by many orders of magnitude.
> >
> > > 2. Then rsync *just* the pool -- this should be no problem since, by definition, there are no hard links within the pool itself.
> > >
> > > 3. Finally, run through the list generated in #1 to create the new pc directory by creating the necessary links (and for files with no hard links, just copy/rsync them).
> >
> > See BackupPC_tarPCCopy.
> >
> > > The above could also easily be adapted to allow for "incremental" syncing. Specifically, in #1, you would use rsync to just generate a list of *changed* files in the pc directory. In #2, you would continue to use rsync to just sync *changed* pool entries. In #3, you would only act on the shortened incremental sync list generated in #1.
> >
> > While you can limit your pc/ directory traversal to only a subset of all backups of a host (or all hosts, if you give a start date, for example), I don't quite see how syncing the pool should work. Remember that pool files with hash collisions may be renumbered. Is this supposed to be limited to re-creating an identical pool? Even then, renaming a pool file does not affect the pc/ links to it. Overwriting it with different content does. You would need to re-establish the correct links for existing backups too, or figure out how the source pool was changed and replicate the changes to the destination pool (rm foo_2 foo_3; mv foo_4 foo_2). This can be done, but not with rsync, as far as I can tell.
>
> Ahhhh... I see your point. Rsync won't realize that links have been renumbered, but will instead treat them as new files.
>
> Maybe do the following:
> - Use rsync with the dry-run flag to identify pool files that have changed (based on size, name, time), then sort the list by md5sum path.
> Then work through the list as follows:
> - If it is a new md5sum path, then just copy it over to the backup pool.
> - If it is an existing md5sum path, then we need to deal with the following subcases:
>   - a new hash collision
>   - renumbering
>   - a new/changed rsync checksum or compression flag (but the same file content)
>   - a deleted pool entry
> To separate the cases, I would calculate the full md5sum of the content. Then renumber, then copy over new additions and delete deletions. Then check for changes to compression flags & rsync checksums.
>
> The key is that this can all be decoupled from the pc tree, since the link itself is unchanged for any entry in the pc tree that stays.
>
> After this incremental sync of the pool, you then proceed to incrementally back up the pc tree. In general, this is nicely structured, since a backup is either added in total or deleted in total (though this gets more complicated, and would require a file-by-file examination of changes (or rsync), if you had used a routine like my BackupPC_deleteFile to modify or delete individual files from past backups).
> - If a backup has been deleted, then simply 'rm -rf' it (along with its logs) from the pc tree.
> - If a backup has been added, then use the methods described above to link the new files into the already-synchronized pool.
>
> Note that a potentially better idea would be to restructure the pool naming convention to get rid of the need to renumber chains. Instead of giving hash collisions an ordinal subscript, just give each pool entry a name composed of the concatenation of the path md5sum plus another random 32-digit number. Then you would never have to renumber, since there would be essentially a zero percent chance of collision. In that case you would be able to do a simple rsync on the pool.
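A sketch of that naming idea (hypothetical; this is not how BackupPC names pool files today):

    # Hypothetical collision-free pool name: the existing partial-content
    # hash plus a random suffix, so chains never need renumbering and
    # names stay stable under rsync.
    use strict;
    use warnings;

    sub new_pool_name {
        my ($md5_path_digest) = @_;        # the existing 32-hex-digit hash
        # 32 random hex digits; rand() is fine for illustration, though a
        # real implementation would want a stronger source of randomness.
        my $suffix = join '', map { sprintf '%04x', int rand 0x10000 } 1 .. 8;
        return $md5_path_digest . '_' . $suffix;
    }

Since a name, once assigned, would never change, two files with colliding partial hashes simply coexist under different suffixes, and a plain rsync of the pool never sees a rename.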
> > > The more I think about it, the more I LIKE the idea of appending the md5sum file paths to compressed pool files (Method #3).
> >
> > Yes, but the question is how often this information is needed. We're going to a lot of trouble to *eliminate* redundancy. Adding redundancy for a case every 100th user is going to encounter once in his life may not be warranted. Then again, it's a fixed amount per file and probably not enough to worry about ...
>
> It could be made optional whether or not to append this additional information (similar to how rsync checksums are optional).
>
> More generally, since the latest rsync protocol uses md5sums anyway (instead of md4), why not just bite the bullet and start using the full md5sum as the pool file name, since we will need to calculate it anyway for protocol-30 rsync? This would get rid of hash collisions (for all intents and purposes) and also give us a built-in checksum to use against file corruption (or changes). If the md5sum is then stored similarly to how the rsync file digests are now stored, we won't need to add any storage, and you could then use my Method #3.
>
> Hash collisions are really a PITA, since they introduce a bit of awkwardness. Getting rid of them would simplify the BackupPC_link algorithm and would eliminate the need for pool renumbering.
>
> Along these lines, I really wish that BackupPC didn't wait for the second backup of a file to store the rsync digest information, since it would be nice to have the checksums always there in case you ever need to verify file integrity.
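For illustration, if a full-content md5sum were available for every pool file, a verify pass would reduce to something like this. pool_file_md5() is a made-up helper name; BackupPC::FileZIO is the module BackupPC itself uses to read compressed pool files:

    #!/usr/bin/perl
    # Hypothetical integrity check: recompute the full-content md5sum of
    # a (possibly compressed) pool file and compare it to a stored digest.
    use strict;
    use warnings;
    use Digest::MD5;
    use BackupPC::FileZIO;

    sub pool_file_md5 {
        my ($path, $compress) = @_;
        my $f = BackupPC::FileZIO->open($path, 0, $compress)
            or die "can't open $path";
        my ($buf, $md5) = ('', Digest::MD5->new);
        while ($f->read(\$buf, 65536) > 0) {
            $md5->add($buf);
        }
        $f->close();
        return $md5->hexdigest;   # compare against the stored digest
    }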