From: Mark C. <ma...@tu...> - 2012-09-17 07:36:54

Hi,

backuppc 3.1.0-9.1
rsync 3.0.7-2

I have a fairly decent-spec backup server with two gigabit e1000 NICs
bonded together running in bond mode 0, all working 100%. If I run plain
rsync between the backup server and a backup client, both connected on
gigabit LAN, I get sync speeds of +/- 300Mbit/s; but using BackupPC with
rsync, the maximum speed I get is 20Mbit/s and the backup takes forever.
Currently I have a full backup that has been running for 3461:23 minutes,
whereas a plain rsync would have taken a few hours to complete.

The data is users' maildirs, about 2.6TB. I am not using rsync over ssh;
I have the rsync daemon running on the client and have set up the .pl as
follows:

#
$Conf{ClientTimeout} = 28800;

# Minimum period in days between full and incremental backups:
$Conf{FullPeriod} = 6.97;
$Conf{IncrPeriod} = 0.97;

# Number of full and incremental backups to keep:
$Conf{FullKeepCnt} = 2;
$Conf{IncrKeepCnt} = 10;
# Note that additional fulls will be kept for as long as is necessary
# to support remaining incrementals.

#$Conf{DumpPreUserCmd} = 'sudo /bin/mount -t nfs ns1:/var/mail /var/mail';
#$Conf{DumpPostUserCmd} = 'sudo /bin/umount /mnt/mail';

# What transport to use to back up the client [smb|rsync|rsyncd|tar|archive]:
$Conf{XferMethod} = 'rsyncd';

# The file system path or the name of the rsyncd module to backup when
# using rsync/rsyncd:
$Conf{RsyncShareName} = ['backuppc'];

$Conf{RsyncdAuthRequired} = 0;

$Conf{RsyncdUserName} = 'xxxxxxxx';
$Conf{RsyncdPasswd} = 'xxxxxxxx';

# If this is defined only these files/paths will be included in the backup:
$Conf{BackupFilesOnly} = undef;

# These files/paths will be excluded from the backup:
$Conf{BackupFilesExclude} = [
    '/DONOTDELETE',
    '/lost+found'
];

# Level of verbosity in Xfer log files:
$Conf{XferLogLevel} = 1;

# Commands to run for client backups:
# Note the use of SSH's -C attribute. This enables compression in SSH.
$Conf{RsyncClientCmd} = '$rsyncPath $argList+';

# Commands to run for client direct restores:
# Note the use of SSH's -C attribute. This enables compression in SSH.
$Conf{RsyncClientRestoreCmd} = '$rsyncPath $argList+';

# Compression level to use on files. 0 means no compression. See notes
# in main config file before changing after backups have already been done.
$Conf{CompressLevel} = 3;
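For reference, the plain-rsync comparison I am timing against the same
daemon module is something like this (the destination path is just an
example):

  # Pull the module straight to local disk and print transfer stats:
  time rsync -a --stats rsync://client/backuppc/ /data/test-copy/

--
Thank you,

Mark Adrian Coetser
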
From: Tim F. <ti...@ni...> - 2012-09-17 12:51:34

You are being hit by disk I/O speeds; check that you don't have atime
turned on on the filesystem. It is also worth considering tar instead of
rsync for this sort of workload.
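A quick check on the client is to look for atime/relatime/noatime in the
mount options of the filesystem being backed up (the path here is an
example, based on the maildirs under /var/mail):

  mount | grep /var/mail

--
Sent from a mobile device

Tim Fletcher

On 17 Sep 2012, at 10:08, Mark Coetser <ma...@tu...> wrote:

> If I run plain rsync between the backup server and a backup client,
> both connected on gigabit LAN, I get sync speeds of +/- 300Mbit/s; but
> using BackupPC with rsync, the maximum speed I get is 20Mbit/s and the
> backup takes forever. [...]
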
From: Mark C. <ma...@tu...> - 2012-09-17 12:59:55

On 17/09/2012 14:50, Tim Fletcher wrote:
> You are being hit by disk I/O speeds; check that you don't have atime
> turned on on the filesystem. It is also worth considering tar instead
> of rsync for this sort of workload.

Hi

Surely disk I/O would affect plain rsync as well? Plain rsync and even
NFS get normal transfer speeds; it's only rsync within BackupPC that is
slow.

--
Thank you,

Mark Adrian Coetser

From: Les M. <les...@gm...> - 2012-09-17 15:01:36

On Mon, Sep 17, 2012 at 7:59 AM, Mark Coetser <ma...@tu...> wrote:
> Surely disk I/O would affect plain rsync as well? Plain rsync and even
> NFS get normal transfer speeds; it's only rsync within BackupPC that is
> slow.

BackupPC uses its own rsync implementation, in Perl, on the server side,
so it will probably not match the native version's speed. Is this the
first or 2nd full run? On the first it will have to compress the files
and create the pool hash file links. On the 2nd it will read/uncompress
everything for block-checksum verification. If you have enabled checksum
caching, fulls after the 2nd will not have to read/uncompress unchanged
files on the server side.
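If I remember right, enabling checksum caching is just a matter of
appending the magic checksum-seed option to the rsync argument lists in
config.pl, something like this (untested; BackupPC 3.x):

  # Cache block/file checksums in the pool files so later fulls can
  # skip re-reading and uncompressing unchanged files:
  $Conf{RsyncArgs}        = [ @{$Conf{RsyncArgs}}, '--checksum-seed=32761' ];
  $Conf{RsyncRestoreArgs} = [ @{$Conf{RsyncRestoreArgs}}, '--checksum-seed=32761' ];

The caching only starts to pay off on the 3rd full, since the checksums
get written during the 2nd.

--
Les Mikesell
les...@gm...
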
From: Mark C. <ma...@tu...> - 2012-09-17 15:16:50

On 17/09/2012 17:01, Les Mikesell wrote:
> BackupPC uses its own rsync implementation, in Perl, on the server
> side, so it will probably not match the native version's speed. Is
> this the first or 2nd full run? [...]

It's the first full run, but it's taking forever to complete; it has been
running for nearly 3 days!

--
Thank you,

Mark Adrian Coetser

From: Timothy J M. <tm...@ob...> - 2012-09-17 17:04:56

Mark Coetser <ma...@tu...> wrote on 09/17/2012 11:16:29 AM:

> It's the first full run, but it's taking forever to complete; it has
> been running for nearly 3 days!

*IF* the backup is bandwidth-limited, the first run will take longer than
subsequent runs; how much longer depends on how bandwidth-limited you
are. When I back up clients over the Internet, the initial backups can
take a *very* long time (more than a week), while subsequent full backups
take maybe 3-4 hours. However, for hosts across a fast LAN, this will not
be the most significant part of your slowdown. Given your network specs,
I doubt that this is it.

Timothy J. Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tm...@ob...

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796

From: Mark C. <ma...@tu...> - 2012-09-18 14:21:59

On 17/09/2012 17:16, Mark Coetser wrote:
> On 17/09/2012 17:01, Les Mikesell wrote:
>> BackupPC uses its own rsync implementation, in Perl, on the server
>> side, so it will probably not match the native version's speed. [...]
>> If you have enabled checksum caching, fulls after the 2nd will not
>> have to read/uncompress unchanged files on the server side.

I am busy running a full clean rsync to time exactly how long it takes,
and I will post the results compared to a clean full backup with
BackupPC. I can tell you that the network interface on the backup server
is currently running at a 200Mbit/s transfer rate.

--
Thank you,

Mark Adrian Coetser

From: Timothy J M. <tm...@ob...> - 2012-09-18 15:38:15

Mark Coetser <ma...@tu...> wrote on 09/18/2012 10:21:42 AM:

> I am busy running a full clean rsync to time exactly how long it takes,
> and I will post the results compared to a clean full backup with
> BackupPC.

Once it is complete, wait a day or so and then re-run the full backup
*and* the native rsync over the contents of the first rsync. That will be
a pretty good comparison of not-first backups.

Frankly, you don't even have to wait. Once each of them is complete,
immediately re-run them and compare the results. There may be a small
number of new files, but that isn't going to change the order of
magnitude of the results.

That will tell you what I think I already know: the Perl-based rsync is
*terrible*. But it makes the magic of BackupPC work--if you feed it
enough resources, it seems.

Timothy J. Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tm...@ob...

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796

From: Mark C. <ma...@tu...> - 2012-09-19 09:21:23

On 18/09/2012 17:34, Timothy J Massey wrote:
> Once it is complete, wait a day or so and then re-run the full backup
> *and* the native rsync over the contents of the first rsync. That will
> be a pretty good comparison of not-first backups. [...]

Here are the results of the plain rsync run:

sent 150123974 bytes  received 1745633674066 bytes  13566176.70 bytes/sec
total size is 2663529076313  speedup is 1.53

real    2144m46.748s
user    144m11.809s
sys     668m4.861s
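In other words (my arithmetic): 13566176.70 bytes/sec is about 13.6MB/s,
or roughly 108Mbit/s, and 2144 minutes is about 35.7 hours for the 2.6TB
tree.

--
Thank you,

Mark Adrian Coetser
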
From: Mark C. <ma...@tu...> - 2012-09-19 10:00:04

On 19/09/2012 11:21, Mark Coetser wrote:
> sent 150123974 bytes  received 1745633674066 bytes  13566176.70 bytes/sec
> total size is 2663529076313  speedup is 1.53
>
> real    2144m46.748s
> user    144m11.809s
> sys     668m4.861s

It looks like the above ran while the md device was resyncing in the
background, which will skew the results. I will wait for the resync to
complete and then rerun...
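For anyone following along, I am watching the resync progress with the
usual:

  cat /proc/mdstat

and, if I remember right, it can be throttled while other I/O runs by
lowering the md speed limit (value in KB/s; the number is just an
example):

  sysctl -w dev.raid.speed_limit_max=10000

--
Thank you,

Mark Adrian Coetser
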
From: Mark C. <ma...@tu...> - 2012-09-25 11:48:56

On 19/09/2012 11:59, Mark Coetser wrote:
> It looks like the above ran while the md device was resyncing in the
> background, which will skew the results. I will wait for the resync to
> complete and then rerun...

Some more feedback: it looks like the rsync itself isn't what is causing
my slow backups; it seems the BackupPC_link process is what is taking
forever to finish.

--
Thank you,

Mark Adrian Coetser

From: Les M. <les...@gm...> - 2012-09-18 16:45:49

On Tue, Sep 18, 2012 at 9:21 AM, Mark Coetser <ma...@tu...> wrote:
> I am busy running a full clean rsync to time exactly how long it takes,
> and I will post the results compared to a clean full backup with
> BackupPC. I can tell you that the network interface on the backup
> server is currently running at a 200Mbit/s transfer rate.

What's the target here? It takes a pretty good disk system at both ends
to sustain rates like that, especially if you have a tree of small files.

--
Les Mikesell
les...@gm...

From: Tim F. <ti...@ni...> - 2012-09-17 20:57:16

No, it won't be hit in the same way: with BackupPC you are basically
asking rsync to walk a large and complex file tree, checking the date of
every file, whereas with a first full rsync all you are asking for is
"next file, next file, next file".

--
Sent from a mobile device

On 17 Sep 2012, at 15:59, Mark Coetser <ma...@tu...> wrote:

> Surely disk I/O would affect plain rsync as well? Plain rsync and even
> NFS get normal transfer speeds; it's only rsync within BackupPC that is
> slow.

From: Les M. <les...@gm...> - 2012-09-17 15:51:21

On Mon, Sep 17, 2012 at 10:16 AM, Mark Coetser <ma...@tu...> wrote:
> It's the first full run, but it's taking forever to complete; it has
> been running for nearly 3 days!

As long as it makes it through, don't make any judgements until after the
3rd full, and be sure you have set up checksum caching before doing the
2nd. Incrementals should be reasonably fast if you don't have too much
file churn, but you still need to run fulls to rebase the comparison
tree.

--
Les Mikesell
les...@gm...

From: Les M. <les...@gm...> - 2012-09-17 17:34:44

On Mon, Sep 17, 2012 at 11:05 AM, Timothy J Massey <tm...@ob...> wrote:
> I know exactly what you mean by waiting until after the first full.
> Often the second full will be faster -- but only *IF* you are
> bandwidth-limited will you see an improvement. In this case, neither he
> nor I am bandwidth-limited. I don't see an improvement.

The 2nd might even be slower, since the server side has to decompress and
recompute the checksums.

> I am routinely limited to no more than 30MB to 60MB per *minute* as the
> maximum performance for my rsync-based backups. This is *really* pretty
> terrible. I also see that the system is at 100% CPU usage when doing a
> backup. So, my guess is that the Perl-based rsync used by BackupPC is
> to blame.

I'd blame the CPU first. It's easier to replace with something faster...

> So, I have two CPU-bound tasks and they're both fighting over the same
> core.
>
> Is there anything that can be done about this?

Not sure about that - I always expected the kernel scheduler to do
something sensible, but maybe not.

> A quick aside about checksum caching: I very much *want* the ability to
> check whether my backup data is corrupted *before* there is an issue,
> so I do not use checksum caching. [...] But the client can do it
> without monopolizing 100% of the CPU; the BackupPC side should be able
> to, too...

BackupPC is decompressing, and doing it all in Perl, so I'd expect that
to be less efficient. However, there is a setting to control how much of
the data (a random percentage) is checksum-checked even with caching
enabled, so you can tune the timing vs. risk to some extent. There's
little risk of file-level corruption that would still let the checksums
cached at the end of the file match, unless you have bad RAM (which would
likely cause crashes) or physical disk block corruption, which you can
check relatively quickly with a 'cat /dev/sd? >/dev/null' or a smartctl
test run followed by checking the status.
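Something along these lines (device names are examples):

  # Read the whole disk; any unreadable sector will show up as an
  # I/O error here and in the kernel log:
  cat /dev/sda >/dev/null

  # Or run a SMART self-test and check the health/results afterwards:
  smartctl -t long /dev/sda
  smartctl -H -l selftest /dev/sda

--
Les Mikesell
les...@gm...
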
From: Timothy J M. <tm...@ob...> - 2012-09-18 14:11:46

Les Mikesell <les...@gm...> wrote on 09/17/2012 01:34:33 PM:

> The 2nd might even be slower, since the server side has to decompress
> and recompute the checksums.

Interesting possibility. However, the "big" server is new enough that I
can still compare against the first backup. The first one took 9987
minutes, the second took 5558, and the third took 5502. So there *was* a
significant "speedup" between the first and second.

But I'm still only getting 442MB/minute, or about 7MB/s. That server
really should be able to do five times that without breaking a sweat.
1100 minutes would still be a long time, but manageable. 5500 minutes is
nearly four *days*... :(

> There's little risk of file-level corruption that would still let the
> checksums cached at the end of the file match, unless you have bad RAM
> (which would likely cause crashes) or physical disk block corruption,
> which you can check relatively quickly with a 'cat /dev/sd? >/dev/null'
> or a smartctl test run followed by checking the status.

It's disk block corruption caused by (silently) failing drives that I'm
most worried about. It wasn't until a data-loss event a few months ago
that I found that smartd was not set up properly--and I'm not certain
that SMART would have actually helped in this case. (Fortunately, the
loss of data was on the backup server alone, no one needed that data at
that moment, and my off-site archives were unaffected, but I still didn't
like it.)

I *do* scrub the array (weekly, the default, I believe), and I now have
both SMART and md configured to e-mail alerts. But I still like the extra
protection of *directly* comparing the files. Just not enough to take 4
days to do a backup! :)

I'm working on investigating each of these possibilities to improve
performance. I will let everyone know what I find.
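For anyone checking their own setup, the pieces I had to fix were roughly
these (device, schedule and address are examples, not a drop-in config):

  # /etc/smartd.conf: monitor all attributes, run a long self-test
  # Sundays at 03:00, and mail warnings to root:
  /dev/sda -a -s L/../../7/03 -m root

  # /etc/mdadm.conf: where mdadm --monitor sends failure mail:
  MAILADDR root

Timothy J. Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tm...@ob...

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796
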
From: John R. <rou...@re...> - 2012-09-17 17:55:41

On Mon, Sep 17, 2012 at 12:58:28PM -0400, Timothy J Massey wrote:
> Les Mikesell <les...@gm...> wrote on 09/17/2012 11:01:25 AM:
> > On the 2nd it will read/uncompress everything for block-checksum
> > verification. If you have enabled checksum caching, fulls after the
> > 2nd will not have to read/uncompress unchanged files on the server
> > side.
>
> I'm going to have to test this... but I really don't like the fact
> that with checksum caching a file corrupted on the backup server will
> remain undetected--until the user tells me so when I restore it... :(

AFAIK this is not correct. If checksum caching is enabled, BackupPC will
check the cached checksums against the actual file contents based on the
setting of the variable:

  $Conf{RsyncCsumCacheVerifyProb} = 0.10;

So in my case it would take about 10 days to verify the pool. Granted,
this is a probabilistic ratio, and I haven't looked at the code to make
sure whether it means that:

  - every file in the pool is checked every 10 days,
  - only files that exist on the end clients are checked every 10 days, or
  - some files may be checked every day within the 10 days and some files
    won't be checked at all,

but it does address the issue you brought up.

Also, even with checksum caching turned off, you could have an older copy
of a file in the pool go bad and you wouldn't know it until you tried to
restore it, so having caching off doesn't protect you from bad data in
the pool except for the most recently used files. (We also won't discuss
corruption that occurs while putting the current bits on disk, which
trashes the current backup copy; frankly, short of a zfs/btrfs-type
filesystem there is very little that can provide some measure of
protection/detection there.)

--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)  603-643-9300 x 111

From: Timothy J M. <tm...@ob...> - 2012-09-18 14:17:58

John Rouillard <rou...@re...> wrote on 09/17/2012 01:30:36 PM:

> AFAIK this is not correct. If checksum caching is enabled, BackupPC
> will check the cached checksums against the actual file contents based
> on the setting of the variable:
>
>   $Conf{RsyncCsumCacheVerifyProb} = 0.10;

Yeah, I probably should have mentioned that. I knew it was there, but I'd
rather have that set to 1.00 than 0.10... :)

> Also, even with checksum caching turned off, you could have an older
> copy of a file in the pool go bad and you wouldn't know it until you
> tried to restore it, so having caching off doesn't protect you from
> bad data in the pool except for the most recently used files.

This is perfectly valid. However, I'm less worried about data that has
multiple copies (older != newer) than I am about important data (or data
that *becomes* important) that hasn't been modified in so long that there
is only a single copy of it stored: the same copy in each of, say, six
months of backups. I've had that happen in *more* than one case. (Another
related case is a "vitally important" file that disappeared at least six
months ago but *must* be brought back...)

Both of these can be dealt with by duplicate, externally managed
archives, etc., but that is exponentially more annoying to deal with than
BackupPC. If I can solve a problem there, I would prefer to. Monolithic
500GB tar files (aka BackupPC archives) are not user-friendly.

> (We also won't discuss corruption that occurs while putting the
> current bits on disk, which trashes the current backup copy; frankly,
> short of a zfs/btrfs-type filesystem there is very little that can
> provide some measure of protection/detection there.)

Or, like I said, multiple externally managed archives using separate and
redundant media. I have that, too. It's just such a big pain to use that
I would rather do nearly anything else than depend on it! :)

Unfortunately, none of this gets us closer to the source of the terrible
performance we're seeing... :)

Timothy J. Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tm...@ob...

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796

From: Frédéric M. <fre...@ju...> - 2012-09-17 13:47:22

On 17/09/2012 14:50, Tim Fletcher wrote:
> You are being hit by disk I/O speeds; check that you don't have atime
> turned on on the filesystem. It is also worth considering tar instead
> of rsync for this sort of workload.

Hi,

Is the relatime option an acceptable replacement for atime?

Regards.
--
==============================================
|   FRÉDÉRIC MASSOT                          |
|   http://www.juliana-multimedia.com        |
|   mailto:fre...@ju...                       |
| +33.(0)2.97.54.77.94 +33.(0)6.67.19.95.69  |
===========================Debian=GNU/Linux===

From: Tyler J. W. <ty...@to...> - 2012-09-17 15:01:43

On 2012-09-17 14:18, Frédéric Massot wrote:
> Is the relatime option an acceptable replacement for atime?

Unless you are using mutt on the BackupPC server, use noatime. There is
no longer any common use for file access times.
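For example, assuming the pool lives on /data, something like this in
/etc/fstab:

  # Mount the BackupPC pool filesystem with noatime:
  UUID=xxxx-xxxx  /data  ext4  defaults,noatime  1 2

and it can be applied to a mounted filesystem without a reboot:

  mount -o remount,noatime /data

Regards,
Tyler

--
"We should forget about small efficiencies, say about 97% of the time;
premature optimization is the root of all evil." -- Donald Knuth
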
From: Timothy J M. <tm...@ob...> - 2012-09-17 16:09:20

Les Mikesell <les...@gm...> wrote on 09/17/2012 11:51:09 AM:

> As long as it makes it through, don't make any judgements until after
> the 3rd full, and be sure you have set up checksum caching before
> doing the 2nd. Incrementals should be reasonably fast if you don't
> have too much file churn, but you still need to run fulls to rebase
> the comparison tree.

I'm writing a longer reply, but here's a quick in-thread reply:

I know exactly what you mean by waiting until after the first full. Often
the second full will be faster -- but only *IF* you are bandwidth-limited
will you see an improvement. In this case, neither he nor I am
bandwidth-limited. I don't see an improvement.

I am routinely limited to no more than 30MB to 60MB per *minute* as the
maximum performance for my rsync-based backups. This is *really* pretty
terrible. I also see that the system is at 100% CPU usage when doing a
backup. So, my guess is that the Perl-based rsync used by BackupPC is to
blame.

The other annoying part of this is that top shows 50% idle CPU. That's
because I have two cores: one of them is sitting there doing *nothing*
while the other is at 100%. The icing on the cake is that there are *two*
BackupPC_dump processes, each trying to consume as much CPU as they
can--but they're both on the same core! A typical top:

top - 13:07:44 up 36 min,  1 user,  load average: 1.97, 1.89, 1.52
Tasks: 167 total,   3 running, 164 sleeping,   0 stopped,   0 zombie
Cpu(s): 46.1%us,  2.4%sy,  0.0%ni, 49.4%id,  2.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   3924444k total,  3809232k used,   115212k free,    11008k buffers
Swap:        0k total,        0k used,        0k free,  3280072k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU  %MEM    TIME+  COMMAND
 1731 backuppc  20   0  357m 209m 1344 R 100.0  5.5  24:14.07 BackupPC_dump
 1679 backuppc  20   0  353m 205m 2208 R  92.5  5.4  21:54.89 BackupPC_dump

So, I have two CPU-bound tasks and they're both fighting over the same
core. Is there anything that can be done about this?

A quick aside about checksum caching: I very much *want* the ability to
check whether my backup data is corrupted *before* there is an issue, so
I do not use checksum caching. So, yes, this puts much greater stress on
disk I/O: both sides have to recalculate the checksums for each and every
file. But the client can do it without monopolizing 100% of the CPU; the
BackupPC side should be able to, too...
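(One thing I am going to try, in case it helps, is pinning the two dumps
to different cores by hand with taskset, using the PIDs from the top
output above; no idea yet whether it makes a difference:)

  # Pin each running BackupPC_dump to its own core:
  taskset -cp 0 1679
  taskset -cp 1 1731

Tim Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tm...@ob...

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796
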
From: Rodrigo S. <ro...@fa...> - 2012-09-17 16:20:58

On Mon, Sep 17, 2012 at 12:16 PM, Mark Coetser <ma...@tu...> wrote:
> It's the first full run, but it's taking forever to complete; it has
> been running for nearly 3 days!

I'm seeing similar issues here. Is there any recommended troubleshooting
for this kind of problem?

Rodrigo

From: Timothy J M. <tm...@ob...> - 2012-09-17 17:07:10

Rodrigo Severo <ro...@fa...> wrote on 09/17/2012 11:22:23 AM:

> I'm seeing similar issues here. Is there any recommended
> troubleshooting for this kind of problem?

For the first run, pay attention to network utilization. There are no
existing files for BackupPC to do anything with: it's basically absorbing
a bunch of new files. So you are most likely going to be limited by the
speed of the connection--even if it's a Gigabit connection.

For subsequent runs, see my other (very long) e-mail. Examine the CPU
usage (and I/O usage!) of your BackupPC server and see what is limiting
you.
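The kind of thing I use for that (dstat is in most distribution
repositories; this samples every 5 seconds):

  # Timestamp, CPU, disk and network in one view:
  dstat -tcdn 5

plus plain top, watching whether a single BackupPC_dump pegs one core.

Timothy J. Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tm...@ob...

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796
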
From: John R. <rou...@re...> - 2012-09-19 18:10:45

On Mon, Sep 17, 2012 at 12:22:23PM -0300, Rodrigo Severo wrote:
> I'm seeing similar issues here. Is there any recommended
> troubleshooting for this kind of problem?

I usually run lsof to see what files rsync has open. Note that there are
usually two rsync processes; IIRC one is the child of the other. You want
to run lsof on the child to see what file it has open and is backing up.
Use Process Monitor from Sysinternals if you are on Windows.

Then use strace/truss to see what system calls are going on. That can
tell you whether it's waiting/deadlocked or actually doing something
useful (you should see reads/writes if it's working).

If looking at the client side doesn't help, try looking at the Perl
processes on the server side. Occasionally the server-side and
client-side processes get into what looks like a deadlock: both of them
just sit there polling for the other side. I am not sure what causes that
(one theory is bad info in the attributes file for the reference dump),
but usually forcing a new full backup gets me past it.
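Concretely, something along these lines on the client (the PID is an
example):

  # Find the rsync processes; the child is usually the later/higher PID:
  pgrep -l rsync

  # See which files the child currently has open:
  lsof -p 12345

  # Watch its system calls; a stream of reads/writes means it is
  # actually making progress:
  strace -f -p 12345 -e trace=read,write,select

--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)  603-643-9300 x 111
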
From: Timothy J M. <tm...@ob...> - 2012-09-17 16:58:43

Mark Coetser <ma...@tu...> wrote on 09/17/2012 03:08:49 AM:

> I have a fairly decent-spec backup server with two gigabit e1000 NICs
> bonded together running in bond mode 0, all working 100%. If I run
> plain rsync between the backup server and a backup client, both
> connected on gigabit LAN, I get sync speeds of +/- 300Mbit/s; but
> using BackupPC with rsync, the maximum speed I get is 20Mbit/s and the
> backup takes forever. [...]

I have several very similar configurations. Here's an example:

  Atom D510 (1.66GHz x 2 cores)
  4GB RAM
  CentOS 6 64-bit
  4 x 2TB Seagate SATA drives in RAID-6 (I get almost 200MB/s transfer
  rate from this array)
  2 x Intel e1000 NICs in bonded mode

In the past, the biggest server I backed up was around 1TB. Personally, I
prefer to keep each server image under 1TB if I can help it. Everything
is easier that way: not just file-level backups with BackupPC but
image-level as well, and there's less downtime (or less time with
noticeable slowdown if it stays up) when having to take such images.

With servers under 1TB, rsync-based BackupPC full backups are slow, but
get done in a reasonable amount of time: 8-12 hours, and I can live with
that. It is even somewhat beneficial: if I start a backup in the middle
of the day it does not noticeably hammer the client I'm backing up.
(Lemons, lemonade... :) )

However, I have recently inherited a server that is over 3TB, and 97%
full, too! Backups of that system take 3.5 *days* to complete. I *can't*
live with that; I need better performance. I was going to write a very
similar e-mail to yours, so maybe we can work on this together.

All of your configuration looks pretty straightforward to me (except the
mounts: I'm not sure why you have them if you're using rsyncd). Mine is
quite similar.

No matter the size of the system, I seem to top out at about 50GB/hour
for full backups. Here is a perfectly typical example:

  Full backup: 769.3 minutes for 675677.3MB of data.

That works out to be 878MB/min, or about 15MB/s, on a system with an
array that can move 200MB/s and a network path that can move at least
70MB/s. Now, let's look at the "big" server:

  Full backup: 5502.8 minutes for 2434613.6MB of data.

That's even worse: 442MB/min. And 5502.8 minutes is three and a half
*DAYS*.

First, a quick look at the client will show that we can eliminate it
completely. I have checked the performance of several clients while a
backup is running; they are not CPU, I/O or memory bound whatsoever. Here
is a typical example, a Windows Server 2008 machine: Task Manager shows
minimal everything: between 0% and 20% CPU usage (with most time below
5%) and more than 1GB of 2GB RAM free (with 1300MB of cached memory).
Network utilization is absolutely flatlined! A quick sanity check of the
server's physical drive lights shows that drive activity comes in brief
fits and starts. This system is *clearly* not being taxed.

By the way, this contrasts with the beginning of the backup, when rsync
is building the file list: the rsync daemon's CPU usage bounces around
with peaks over 70%, and the drives blink constantly during this
process--so the server is perfectly capable of doing something when it's
asked to!

The server side, though, shows something completely different. Here are a
few lines from dstat:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 33   2  64   1   0   0|  22M   47k|   0     0 |   0     0 |1711   402
 43   3  49   6   0   0|  40M  188k|  35k 1504B|   0     0 |2253   632
 45   4  49   1   0   1|  50M   36k|  38k 1056B|   0     0 |2660   909
 46   4  50   0   0   0|  46M    0 |  55k 1754B|   0     0 |2540   622
 45   4  50   1   0   0|  45M   12k| 120B  314B|   0     0 |2494   708
 43   3  50   3   0   0|  42M    0 |  77k 1584B|   0     0 |2613   958
 41   4  47   8   0   0|  50M  268k| 449B  356B|   0     0 |2333   704
 46   3  50   1   0   0|  42M   36k|  26k 1122B|   0     0 |2583   771
 45   4  50   1   0   0|  40M    0 |  30k  726B|   0     0 |2499   681

It looks like everything is under-utilized. For example, I'm getting a
measly 40-50MB/s of read performance from my array of four drives, and
*nothing* is going out over the network. My physical drive and network
lights echo this: they are *not* busy. My interrupts are certainly
manageable and context switches are very low. Even my CPU numbers look
tremendous: nearly no time in wait, and about 50% CPU idle!

Ah, but there's a problem with that. This is a dual-core system, and any
time you see a dual-core system stuck at 50% CPU utilization, you can bet
big that you have a single process using 100% of one core while the other
core sits idle. That's exactly what's happening here. Notice what top
shows us:

top - 13:21:27 up 49 min,  1 user,  load average: 2.07, 1.85, 1.67
Tasks: 167 total,   2 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s): 43.7%us,  3.6%sy,  0.0%ni, 50.5%id,  2.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3924444k total,  3774644k used,   149800k free,     9640k buffers
Swap:        0k total,        0k used,        0k free,  3239600k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1731 backuppc  20   0  357m 209m 1192 R 95.1  5.5  35:58.08 BackupPC_dump
 1679 backuppc  20   0  360m 211m 1596 D 92.1  5.5  32:54.18 BackupPC_dump

My load average is 2, and you can see those two processes: two instances
of BackupPC_dump. *Each* of them is using all of the CPU given to it, but
they're both using the *same* CPU core, which is why I'm 50% idle!

Mark Coetser, can you see what top shows for CPU utilization on your
system while a backup is running? Don't just look at the aggregate "idle"
or "user" numbers: look at each BackupPC process as well, and let us know
what they are--and how many physical (and hyper-threaded) cores you have.
Additional info can be found in /proc/cpuinfo if you don't know the
answers.

To everyone: is there a way to get Perl to run each of these dumps on a
*different* core? From my quick Google search it seems that the processes
must be forked using Perl modules designed for this purpose. At the
moment, this is beyond my capability. Am I missing an easier way to do
this?

And one more request: for those of you using rsync, can you give me some
examples where you are getting faster numbers? Let's say, full backups of
100GB hosts in roughly 30-35 minutes, or 500GB hosts in two or three
hours? That's about four times faster than what I'm seeing, and would
work out to 50-60MB/s, which seems like a much more realistic speed. If
you are seeing such speeds, can you give us an idea of your hardware
configuration, as well as the CPU utilization you see during backups?
Also, are you using compression or checksum caching? If you need help
collecting this info, I'd be happy to help you.

To cover a couple of other frequently suggested items, here's what I've
examined to improve this:

Yes, I have noatime. From fstab:

  UUID=<snipped> /data ext4 defaults,noatime 1 2

Noatime only makes a difference when you are I/O bound--which ideally a
BackupPC server would be. In my case it made very little difference: I'm
not I/O bound.

I am using EXT4. I have gotten very similar performance with EXT3. I have
not tried XFS or JFS, but would *really* prefer to keep my backups on the
extremely well-known and supported EXT series.

I am using compression on this BackupPC server, which obviously may
contribute to the CPU consumption. My old servers did not have
compression, but had terrible VIA C3 single-core processors, and their
backup performance was quite similar. I figured that with the Atom D510
I'd be OK with compression. But maybe not. I'll see if I can do some
testing with some smaller hosts without compression and see what happens.
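(If anyone wants to repeat the no-compression test, my understanding is
that it is just a per-host override; note that the test host's new
backups will then go into the uncompressed pool, so expect extra disk
usage:)

  # Per-host config.pl for the test host: turn compression off.
  $Conf{CompressLevel} = 0;

As for checksum caching: as I mentioned, I think the value of leaving it
off is considerable. But I look forward to seeing the performance others
are getting, to see at what cost this protection comes.

Thank you very much for your help!

Timothy J. Massey

Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!
http://www.OutOfTheBoxSolutions.com
tm...@ob...

22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796
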