From: Tarjei H. <ta...@be...> - 2005-11-21 07:33:21
Hi, thank you for your suggestions so far. I have now moved the files to
new filesystems running reiser instead of XFS and tried tuning the ssh
connection by adding the -C flag.

> > > Let me guess: ext3 filesystem? If yes, you've just been hit by the "ext3
> > > doesn't scale for large number of files" problem. Use reiser or xfs.
> > Wrong guess :-)
> > The box runs XFS and reiserfs. It's a Suse 8.2 box that has been upgraded to 9.2.
> Hm. Strange. The slowdown didn't appear just after you upgraded the box
> to 9.2 but later? Just to be sure: your /var/lib/backuppc isn't on a
> broken raid volume? Or dma has been disabled on your disks?
> What hardware are you running on the backuppc server?

It's an AMD 1.4 GHz box with 256 MB RAM and a standard IDE hard drive.

I grepped the transfer log to see what is happening there apart from the
normal messages, and got a few interesting results. I'm running the backup
with the --one-file-system switch, so the log contains many shutdowns and
startups of the rsync process. It seems that not all went well:

Xfer PIDs are now 21852
Got remote protocol 28
Checksum caching enabled (checksumSeed = 32761)
Xfer PIDs are now 21852,21853
.. lots of files transferred
finish: removing in-process file <filename removed>
Child is aborting
Done: 22623 files, 2548985137 bytes
Got fatal error during xfer (aborted by user (signal=INT))
Backup aborted by user signal

What I'm wondering here is: who is the user? Is that backuppc? If so, is it
possible for backuppc to provide more info on why it ended the transfer?
Will this file contain the logs of more than one transfer if a backup has
been stopped and then later started?

Also, it seems that there have been some linking problems. I got a few
(20+ out of 500 000 files) of these:

Unable to link /backup/pc/<hostname>/635/f%2fboot/fgrub/fstage2 to /backup/pc/<hostname>/new//f%2fboot/fgrub/fstage2

Any tips on why? All were related to one small directory, the /boot
partition.
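For anyone who wants to repeat the grep step, here is a minimal sketch that counts how often each message type shows up. It assumes the transfer log has already been dumped to plain text; the function name `tally` and the pattern list are my own, with the substrings taken from the excerpts above.

```python
from collections import Counter

# Substrings copied from the log excerpts above; extend the tuple as needed.
PATTERNS = (
    "Got fatal error during xfer",
    "Unable to link",
)


def tally(lines, patterns=PATTERNS):
    """Count how many lines contain each pattern (plain substring match)."""
    counts = Counter()
    for line in lines:
        for pattern in patterns:
            if pattern in line:
                counts[pattern] += 1
    return counts


# Small demonstration on lines shaped like the ones quoted above.
sample = [
    "Xfer PIDs are now 21852,21853",
    "Unable to link /backup/pc/<hostname>/635/... to /backup/pc/<hostname>/new/...",
    "Got fatal error during xfer (aborted by user (signal=INT))",
]
for pattern, count in tally(sample).most_common():
    print(count, pattern)
```

To run it on a real log, pass an open text file instead of `sample`, since `tally` accepts any iterable of lines.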
THE MAIN PROBLEM:

I've been writing this email for a day now while the backup goes on and
off. Now I think I've found the major problem. In the logs I find a huge
number of errors like these:

pool 764 1007/513 47727360 <Filename>
Unable to open /backup/pc/<host>/new/f%2fdata%<path to fil> for writing
Botch, no matches on /backup/pc/mail2.bergfald.no/<same file> (ed89e29bec3a01b565ad5cbbb89a3c9a)

It continues like that for quite some time before exiting.

So, does anyone have a suggestion for this one?

Kind regards,
Tarjei

> I would run BackupPC_dump for a known slow host manually with the -v
> flag, so you can see what's going on. If that doesn't provide enough
> info, try to strace -p the process and see what's going on.
> I can think of 2 problems which can cause what you're seeing:
> 1. you run out of memory with the rsync filelist, which makes the server
> start to swap and after that everything goes horribly slow.
> 2. something is wrong with the filesystem.
>
> > Kind regards
> > Tarjei
>
> Hth,
> --