From: John G. <jgo...@co...> - 2011-02-25 03:35:09
As you may have seen, I have recently observed that rsync performance in BackupPC is surprisingly bad in two cases: one involving very large files, and the other involving incrementals of level > 1.

I have been avoiding tar for general-purpose use due to the warning at [1], which comments that files extracted from archives won't be backed up, since they will have timestamps in the past. From reading the GNU tar manual, however, it appears that --newer, which is used by BackupPC, inspects both mtime and ctime and shouldn't have that problem. (It may still have the problem of not noticing deletions or renames until the next full backup, but this isn't a data loss scenario, and it is fairly common among backup systems, so I am used to dealing with it.)

I therefore did some benchmarking of BackupPC with rsync and tar, and also did some testing to check whether the listed limitations were valid. I'll summarize what I found here.

TEST SETUP
----------

The tests were run with compression level 3 on a Core 2 Duo 6420 running 64-bit Debian. I backed up /usr and /etc/backuppc. /usr contained 21,356 directories and 205,342 files representing 7.1GB of data. The disk was a generic SATA workstation disk with ext4. The benchmarks were entirely self-contained within the machine to eliminate any impact of ssh encryption or network traffic. No disk-based encryption was used. Read and write caches were flushed between each run.

No content changed on /usr between these tests. A config.pl file or two changed in /etc/backuppc, but that was it. If you see increasing times for incremental backups, it's due to BackupPC, not to changing source data. I observed that first full backups can take different amounts of time than subsequent ones, so I made a point of running multiple successive backups. /var/lib/backuppc was completely wiped between the rsync and the tar tests.
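[Editor's note: the --newer claim above can be sanity-checked outside of BackupPC. This is a minimal sketch, assuming GNU tar on Linux; the file name is made up. A file "extracted from an archive" keeps an old mtime but gets a current ctime, and --newer compares the date against both, so the file is still picked up.]

```shell
cd "$(mktemp -d)"
# Simulate a file unpacked from an old archive: 2008 mtime, current ctime
touch -m -t 200801011200 extracted-file
# GNU tar's --newer tests both mtime and ctime against the given date
# (--newer-mtime would test mtime only and skip this file)
tar -cf backup.tar --newer="1 day ago" extracted-file 2>/dev/null
tar -tf backup.tar    # the file is listed despite its 2008 mtime
```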
RSYNC RESULTS
-------------

Initial full backup: 20.2 minutes
Next full backup   : 24.0 minutes
Incremental level 1:  2.9 minutes
... (ran several level 1s to test)
Incremental level 1:  4.5 minutes
Incremental level 1:  4.8 minutes

At this point, I enabled rsync checksum caching and ran some more backups.

Full backup        : 32.5 minutes
Full backup        : 22.7 minutes
Full backup        : 22.6 minutes
Incremental level 1:  4.5 minutes
...
Incremental level 4:  6.6 minutes
Incremental level 5:  9.4 minutes

TAR RESULTS
-----------

Initial full backup: 16.6 minutes
Incremental level 1:  2.2 minutes
...
Incremental level 4:  2.4 minutes
Incremental level 5:  2.2 minutes
Full backup        : 25.9 minutes

TAR LIMITATION TESTING
----------------------

After performing the benchmarks, I created a directory /usr/local/test. In that directory, I created root and jgoerzen directories. Into each of those, I untarred and unzipped example archive files containing files added in 2008 or before. I did this unpacking once as root and again as my usual user account.

I then ran an incremental backup with tar. According to the limitations page, the unpacked files should not have been backed up. However, they were properly detected and backed up, which is good.

Next, I used mv to rename a file to a different name in the same directory and then ran another incremental. In this case, BackupPC noticed the file with the new name and backed it up. It did not notice that the old file had gone away, which is somewhat as expected. ls -lc confirmed that mv changed the ctime on the file.

ANALYSIS
--------

The problem that prompted this was incrementals taking very long on slow disks with rsync. My data here shows that a level 5 incremental takes more than twice as long as a level 1 with rsync. Although the difference here was measured in minutes, if the level 1 is measured in hours, then the difference is also measured in hours.
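[Editor's note: the mv/ctime observation in the limitation testing above is easy to reproduce. A minimal sketch, with hypothetical file names; stat -c %Z is GNU stat's ctime in epoch seconds.]

```shell
cd "$(mktemp -d)"
touch file-a
before=$(stat -c %Z file-a)   # ctime as epoch seconds (GNU stat)
sleep 2
mv file-a file-b              # rename within the same directory
after=$(stat -c %Z file-b)
# On Linux, rename updates the ctime of the renamed inode, which is
# why a ctime-aware incremental still notices the renamed file
[ "$after" -gt "$before" ] && echo "mv updated the ctime"
```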
Somewhat surprising was that rsync checksum caching provided only a marginal benefit (a reduction from 24.0 to 22.6 minutes for a full backup). It is possible that the data set in question here (vast numbers of small files) is not well suited to showing the benefit of checksum caching.

The initial full backup with tar was 18% faster than with rsync, but after checksum caching was enabled, subsequent rsync fulls were 14% slower than tar fulls -- the only big surprise in this to me. More to the point, incrementals showed little variation between runs with tar, while they grew longer with each subsequent run with rsync.

The files in this test case did not exercise the other pathological problem with BackupPC's rsync algorithm: that of taking 10+ hours to back up a changed 25GB file. Had such a file been involved, the tar backup would certainly have been orders of magnitude faster than rsync.

RECOMMENDATIONS
---------------

For backups across a LAN, it looks like:

* tar permits an overall lower execution time, since there is no performance penalty for an incremental list such as [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. With rsync, this becomes unbearably slow, and more frequent full backups are required.

* A downside of using tar is that deletions will not be detected until the next full backup run.

* rsync fulls with checksum caching enabled may sometimes be faster than tar fulls, but rsync fulls and incrementals will still likely be very much slower if very large changed files are involved.

* The rsync backend for BackupPC is probably not useful unless Internet backups or small backup sets to fast disks are involved.

* The "limitations" of the tar backend have been exaggerated, at least for backing up Linux systems using POSIX-compliant filesystems with GNU tar. (vfat under Linux may still exhibit the documented limitations, for instance.)
One other point: a long-standing bug in the CGI does not permit restoring from a host backed up with tar to one backed up with rsync, which I did observe in testing. [2]

What do you all think? Does this all make sense? Does it point to any issues in BackupPC that are easily fixable?

-- John

[1] http://backuppc.sourceforge.net/faq/limitations.html#incremental_backups_might_not_be_accurate
[2] http://www.adsm.org/lists/html/BackupPC-users/2010-06/msg00070.html
From: Les M. <les...@gm...> - 2011-02-25 04:27:58
On 2/24/11 9:34 PM, John Goerzen wrote:

> Next, I used mv to rename a file to a different name in the same
> directory and then ran another incremental.
>
> In this case, BackupPC noticed the file with the new name and backed it
> up. It did not notice that the old file had gone away, which is
> somewhat as expected. ls -lc confirmed that mv changed the ctime on the
> file.

If you rename a directory, you won't get the old contents underneath it at their new locations either. You'll still have the backup under the old path, but you might not know where to find it.

> Somewhat surprising was that rsync checksum caching provided only a
> marginal benefit (a reduction from 24.0 to 22.6 minutes for a full
> backup). It is possible that the data set in question here (vast
> numbers of small files) is not well suited to showing the benefit of
> checksum caching.

The client side still has to do a full read - probably mostly constrained by directory accesses and seeks on small files.

> The files in this test case did not exercise the other pathological
> problem with BackupPC's rsync algorithm: that of taking 10+ hours to
> back up a changed 25GB file. Had such a file been involved, the tar
> backup would certainly have been orders of magnitude faster than rsync.

You might do better with the --whole-file option on rsync if your server is slow doing the delta/merge operations.

> * The rsync backend for BackupPC is probably not useful unless
>   Internet backups or small backup sets to fast disks are involved.

That's perhaps an overstatement, but you do need relatively fast hardware on the server side.

> * The "limitations" of the tar backend have been exaggerated, at least
>   for backing up Linux systems using POSIX-compliant filesystems with
>   GNU tar. (vfat under Linux may still exhibit the documented
>   limitations, for instance.)
I think the docs combine the smb and tar descriptions here, and you would see the effect as described when using smb, or when using tar on a client filesystem that doesn't maintain distinct mtime and ctime values.

> Does it point to any issues in BackupPC that are easily fixable?

Not unless someone wants to tackle what would have to change to stop adding the --ignore-times option on all full rsync runs.

--
Les Mikesell
les...@gm...
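[Editor's note: Les's directory-rename caveat can be reproduced with GNU find's ctime test, standing in for tar's --newer comparison. A minimal sketch with made-up names; on Linux, renaming a directory refreshes the ctime of the directory inode itself, but not of the files beneath it, so a ctime/mtime-based incremental will not see those files at their new path.]

```shell
cd "$(mktemp -d)"
mkdir olddir && touch olddir/file
sleep 2
stamp="@$(date +%s)"      # reference point, as an epoch timestamp
sleep 2
mv olddir newdir          # rename only the directory
# ./newdir appears (fresh ctime); ./newdir/file does not
find . -newerct "$stamp"
```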