From: Craig B. <cr...@at...> - 2002-10-31 06:23:47

> 4) Started BackupPC and ran BackupPC_nightly manually - Machine crashed and
> burned about 30 minutes into whatever it was doing!

A user-level process should never cause the machine to hang or reboot.
It sounds like your file system is corrupted.

> 5) Checked logs - logs show BackupPC_nightly always starts but never reports
> ending (should it?)

Yes, it will report finishing if it gets that far.

> 6) 1 machine log reports a link was deleted because there is no older file
> (words to that effect - if I remember correctly) - may have nothing to do
> with anything.

This is most likely unrelated.

> What does BackupPC_nightly do? (I'll now RTFM 'cause the answer is probably
> there to be had).

It spends most of its time cleaning the pool. This involves scanning
the entire pool and cpool, removing files with only one link. What
happens when you run a command like:

    find /data/BackupPC/cpool -type f -links 1 -print

Does it complete, or does your system crash? I'd recommend running
fsck on the file system.

> Is there a log of progress for BackupPC_nightly?

No, it doesn't say much about what it is doing.

> Can I stop BackupPC_nightly temporarily while maintaining backup integrity
> until a problem resolution can be had?

You could disable BackupPC_nightly, but then the nightly emails will
stop, per-PC log files won't get aged (monthly) and the pool will never
get smaller. For a few days this is probably ok.

> Is the integrity of my backup OK? How can I tell?

Your backups should be ok, but I would recommend you find out and fix
whatever is going wrong with your system.

Craig
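
For readers unfamiliar with why a link count of 1 marks a pool file as
unused: each backup that contains a file holds a hard link to the pooled
copy, so a pool file whose only remaining link is the pool entry itself
is referenced by no backup. The sketch below mirrors the find command
above in Python on a throwaway directory. The names single_link_files
and demo, and the tiny "cpool"/"pc" layout, are made up for illustration
and are not BackupPC code.

```python
import os
import stat
import tempfile


def single_link_files(root):
    """Yield regular files under root whose hard-link count is exactly 1."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                yield path


def demo():
    """Build a tiny throwaway pool and return the names of unreferenced files."""
    with tempfile.TemporaryDirectory() as tmp:
        pool = os.path.join(tmp, "cpool")   # hypothetical pool directory
        backups = os.path.join(tmp, "pc")   # hypothetical per-host backup tree
        os.makedirs(pool)
        os.makedirs(backups)
        for name in ("aaa", "bbb"):
            with open(os.path.join(pool, name), "w") as fh:
                fh.write(name)
        # "aaa" is still referenced by a backup; "bbb" is not.
        os.link(os.path.join(pool, "aaa"), os.path.join(backups, "file1"))
        return sorted(os.path.basename(p) for p in single_link_files(pool))


if __name__ == "__main__":
    print(demo())
```

Like the find command, this only reads metadata, so on a healthy file
system it should be safe to run against a live pool.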
From: Phill B. <Phill@WebWombat.com> - 2002-10-31 10:02:11

Hi Chris,

There are even more coincidences (is that really the plural for this
word?). The machine I'm using for BackupPC also has the Promise RAID IDE
controller. The MB supports 8 ATA drives. Unfortunately the Promise
people don't support this hardware config with their ROM code, but
luckily the IDE driver was hacked up and simply makes the drives
available. I'm using software RAID.

All this to say that the hardware may not be blameless in this scenario.
The PromiseRAID controller could well be implicated here.

Anyway this is all speculation at the moment. I've yet to carry out the
test as specified by Craig. I started it this afternoon and it began
blurting out lots of file names under the cpool directory that looked
like MD5s. I stopped it since I wanted to ride home (bicycle ride that
is). I'll start it again now (I'm here now).

Phill.

-----Original Message-----
From: Chris Snyder
To: Chris Snyder
Cc: Phill Bertolus; backuppc-users
Sent: 10/31/2002 6:05 PM
Subject: Re: [BackupPC-users] Caught ya: BackupPC_nightly is the culprit!

Found this interesting thread on a mailing list:

http://www.geocrawler.com/archives/3/3455/2002/1/0/7690030/

Apparently there's a limit as to how many hardlinks ReiserFS can handle.
I'm currently backing up around ten machines, totalling >120 gig of
data. Could I be hitting this limit?
From: Phill B. <Phill@WebWombat.com> - 2002-10-31 10:11:19

Hi Chris,

Just read the post. I'll preface my observations by saying I'm not a
unix file system expert.

My question is: Does this post state that the maximum hard links to a
given file is 64535 or that the total number of all hard links on a file
system is 64535?

If it is the former (the first one) then it's not an issue is it?
However, if it's the latter case, then it would seem the whole file
system could be somewhat impaired in this application.

What do you think?

Phill.
From: Phill B. <Phill@WebWombat.com> - 2002-10-31 12:05:46

OK here's some output from this command:

    find /disk7/cpool -type f -links 1 -print

    /disk7/cpool/7/1/c/71c6b69005d159b8bffb9fdf7408f9a4
    /disk7/cpool/7/1/c/71c8e892c3948bf6faa21104d7dfd1ce
    /disk7/cpool/7/1/d/71d4bbd0666acec2d04de2287eb69718
    /disk7/cpool/7/1/d/71dc954d06f27900067bf03cbcaf9aa5
    /disk7/cpool/7/1/e/71eff07b6a710542eb87dee4bfb42518
    /disk7/cpool/7/1/e/71e8762bf186ff2fcdbb5d6c8a8d0e9a
    /disk7/cpool/7/1/e/71e0540f5a8743d7f14c83a735ae7b84
    /disk7/cpool/7/1/e/71ed6386fbc57678d495231a1e95f545
    /disk7/cpool/7/1/e/71ecac52e6c5e2876758c217adb203dd
    /disk7/cpool/7/1/f/71fb2ff8be0458120a7121aa9b2b95a0
    /disk7/cpool/7/1/f/71f8897673228fba7c3ec7ac47762a1d
    /disk7/cpool/7/1/f/71f8887e4a11339ad339e72c08917443
    /disk7/cpool/7/2/0/720e9b592c8a884e920a6959aa12379d
    /disk7/cpool/7/2/0/720d08900eb3cbbb3b4bd3ed1bc7ea54
    /disk7/cpool/7/2/0/7204af8bcd48cfde20d8a3a437d4f4cb
    /disk7/cpool/7/2/1/721422eba727f05f7597cd738405471a
    /disk7/cpool/7/2/1/721685a104e6d0ed700999fafb1811c1
    /disk7/cpool/7/2/1/721d7ea31586dbc067ec90063c04439f
    /disk7/cpool/7/2/1/721b604b40c283b4334dcd7e972a0a77
    /disk7/cpool/7/2/1/72134fd94c7e1f360dedc14396b41224
    /disk7/cpool/7/2/1/721c1020248216fef74fd7845ce3b413
    /disk7/cpool/7/2/1/7212df5e375c04374ab87470ae362cf0
    /disk7/cpool/7/2/1/721e2d42546a759488af669cda3b5d9f
    /disk7/cpool/7/2/2/72213455b3b2600c661d636bc6b4b177
    /disk7/cpool/7/2/2/72246cf454e23d7d60b29d8f49aee236
    /disk7/cpool/7/2/2/722516b897f40c3d60e556c548019885
    /disk7/cpool/7/2/2/722f1411ef51de224df2597a4d31aa35
    /disk7/cpool/7/2/2/72209241cdcc591590be89f24170f00b
    /disk7/cpool/7/2/2/722aaeb3c7b6ab6aeafa81874de71595
    /disk7/cpool/7/2/2/722177d19b1ad671d9b2ea0f08b4c281
    /disk7/cpool/7/2/3/72323111869966d37d908412a40fcc84
    /disk7/cpool/7/2/3/72353832217c2ac6a69d952e9260a25b
    /disk7/cpool/7/2/3/7236472fcf47121c8b85d3a0aa7e7c0b
    /disk7/cpool/7/2/3/7234bdd8530b2ce81feb2dff268fbca4
    /disk7/cpool/7/2/3/723b14d69303fae3dce6b276d6798d45
    /disk7/cpool/7/2/3/7239098c7e583fc127754b34de6c2d65
    [backuppc@backup backuppc]$

It's interesting because there are heaps of 1s up to about half way.
Then there are none? I randomly checked various directories and sure
enough all the 1s are in the low numbers. Seems to be thousands of them.

The reiserfsck says everything is OK.

Regs
Phill.
From: Craig B. <cr...@at...> - 2002-10-31 18:13:25

> OK here's some output from this command:
>
> find /disk7/cpool -type f -links 1 -print
> [snip]
> /disk7/cpool/7/2/3/7239098c7e583fc127754b34de6c2d65
> [backuppc@backup backuppc]$
>
> It's interesting because there are heaps of 1s up to about half way. Then
> there are none? I randomly checked various directories and sure enough all
> the 1s are in the low numbers. Seems to be thousands of them.

This makes sense. BackupPC_nightly removes files with only 1 link
(they are no longer used). For some reason your system crashes part
way through, so a big part of the pool never gets cleaned.

I still suspect file system corruption, probably triggered by trying
to remove one of these files.

Next up, you should add a print to BackupPC_nightly to print each
directory and file remove attempt as it traverses the pool. I've
attached a diff (just add the 3 lines with a "+") and run it manually.

Craig

--- BackupPC_nightly 2002-08-03 10:26:43.000000000 -0700
+++ BackupPC_nightly_debug 2002-10-31 10:11:41.000000000 -0800
@@ -124,6 +124,7 @@
 # contiguous)
 my %FixList; # list of paths that need to be renamed to avoid
 # new holes
+$| = 1;
 for my $pool ( qw(pool cpool) ) {
 $fileCnt = 0;
 $dirCnt = 0;
@@ -192,6 +193,7 @@
 return if ( !-d && !-f );
 $dirCnt += -d;
+ print("Doing directory $name\n") if ( -d );
 $name = $1 if ( $name =~ /(.*)/ );
 @s = stat($name);
 if ( $name =~ /(.*)_(\d+)$/ ) {
@@ -204,6 +206,7 @@
 if ( -f && $s[3] == 1 ) {
 $blkCntRm += $s[12];
 $fileCntRm++;
+ print("About to remove $name\n");
 unlink($name);
 #
 # We must keep repeated files numbered sequential (ie: files
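
A note on the "$| = 1" line in the diff: it turns off Perl's output
buffering, so each "About to remove" message is written out before the
unlink that may take the machine down. Without it, the last few
messages could still be sitting in a buffer when the system dies. A
rough Python analogue of the same idea, assuming a Unix-like system, is
sketched below. The function name log_then_unlink is made up for
illustration, and the extra fsync goes beyond what the Perl patch does.

```python
import os
import sys
import tempfile


def log_then_unlink(path, log=sys.stdout):
    """Announce the removal, force the message out, then unlink the file."""
    log.write("About to remove %s\n" % path)
    log.flush()                      # analogue of Perl's $| = 1
    try:
        os.fsync(log.fileno())       # also ask the OS to commit the log to disk
    except (OSError, ValueError, AttributeError):
        pass                         # terminals, pipes, in-memory streams: no fsync
    os.unlink(path)


def demo():
    """Remove one throwaway file and return (log text, file still exists?)."""
    with tempfile.TemporaryDirectory() as tmp:
        victim = os.path.join(tmp, "victim")
        open(victim, "w").close()
        log_path = os.path.join(tmp, "nightly.log")
        with open(log_path, "w") as log:
            log_then_unlink(victim, log)
        with open(log_path) as fh:
            return fh.read(), os.path.exists(victim)


if __name__ == "__main__":
    print(demo())
```

If the log lives on the same sick file system, even a flushed and
synced message may not survive a crash, so writing to a terminal or to
a different disk is probably the safer choice.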
From: Phill B. <Phill@WebWombat.com> - 2002-10-31 22:00:25

I think the file system is in all sort of trouble. Could be time to
abandon Reiser. I only went for it because I have some 10G files,
however these aren't backed up through BackupPC anyway.

    Doing directory /disk7/cpool/7/2/b
    Doing directory /disk7/cpool/7/2/a
    Doing directory /disk7/cpool/7/2/9
    Doing directory /disk7/cpool/7/2/8
    Doing directory /disk7/cpool/7/2/7
    Doing directory /disk7/cpool/7/2/6
    Doing directory /disk7/cpool/7/2/5
    Doing directory /disk7/cpool/7/2/4
    Doing directory /disk7/cpool/7/2/3
    About to remove /disk7/cpool/7/2/3/72323111869966d37d908412a40fcc84
    About to remove /disk7/cpool/7/2/3/72353832217c2ac6a69d952e9260a25b
    About to remove /disk7/cpool/7/2/3/7236472fcf47121c8b85d3a0aa7e7c0b
    About to remove /disk7/cpool/7/2/3/7234bdd8530b2ce81feb2dff268fbca4
    About to remove /disk7/cpool/7/2/3/723b14d69303fae3dce6b276d6798d45
    About to remove /disk7/cpool/7/2/3/7239098c7e583fc127754b34de6c2d65

    [Dead]

Phill.
From: Craig B. <cr...@at...> - 2002-10-31 23:16:27

> My question is: Does this post state that the maximum hard links to a given
> file is 64535 or that the total number of all hard links on a file system is
> 64535?

It states the max number of hardlinks per file is 64535. On most
Linux systems the limit is 32000.

It is very unlikely that BackupPC is running into this limit. This
would happen only if the same non-empty file appears in more than
32000 (or 64535) places among all the hosts and backups.

In the next version I will have BackupPC report the maximum number
of links, and also have it gracefully go over the limit by simply
creating a new file to link against.

Craig
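
To check how close a pool actually is to the per-file limit, one can
compare the limit the OS reports with the largest link count in use.
This is a minimal sketch assuming a Unix-like system where Python's
os.pathconf knows the name "PC_LINK_MAX". The function names are made
up for illustration, and this is not the reporting Craig describes for
the next BackupPC version.

```python
import os
import tempfile


def link_limit(path):
    """Return the per-file hard-link limit the OS reports for path, or None."""
    try:
        return os.pathconf(path, "PC_LINK_MAX")
    except (OSError, ValueError):
        return None  # name unknown on this platform, or no limit reported


def max_links_in_use(root):
    """Return the largest hard-link count of any file under root (0 if empty)."""
    highest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            st = os.lstat(os.path.join(dirpath, fname))
            highest = max(highest, st.st_nlink)
    return highest


def demo():
    """Create one file with three links in a throwaway directory and measure it."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "pooled")
        with open(original, "w") as fh:
            fh.write("data")
        os.link(original, os.path.join(tmp, "copy1"))
        os.link(original, os.path.join(tmp, "copy2"))
        return max_links_in_use(tmp)


if __name__ == "__main__":
    print(link_limit("."), demo())
```

On a real pool one would call max_links_in_use("/disk7/cpool") and
compare the result with link_limit("/disk7"); a value far below the
limit would support Craig's view that the limit is not the problem.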
From: Craig B. <cr...@at...> - 2002-10-31 23:49:41

> I think the file system is in all sort of trouble. Could be time to abandon
> Reiser. I only went for it because I have some 10G files, however these
> aren't backed up through BackupPC anyway.

Yes, your file system is definitely sick.

> Doing directory /disk7/cpool/7/2/b
> Doing directory /disk7/cpool/7/2/a
> Doing directory /disk7/cpool/7/2/9
> Doing directory /disk7/cpool/7/2/8
> Doing directory /disk7/cpool/7/2/7
> Doing directory /disk7/cpool/7/2/6
> Doing directory /disk7/cpool/7/2/5
> Doing directory /disk7/cpool/7/2/4
> Doing directory /disk7/cpool/7/2/3
> About to remove /disk7/cpool/7/2/3/72323111869966d37d908412a40fcc84
> About to remove /disk7/cpool/7/2/3/72353832217c2ac6a69d952e9260a25b
> About to remove /disk7/cpool/7/2/3/7236472fcf47121c8b85d3a0aa7e7c0b
> About to remove /disk7/cpool/7/2/3/7234bdd8530b2ce81feb2dff268fbca4
> About to remove /disk7/cpool/7/2/3/723b14d69303fae3dce6b276d6798d45
> About to remove /disk7/cpool/7/2/3/7239098c7e583fc127754b34de6c2d65
>
> [Dead]

When it comes back up you could try this:

    find /disk7/cpool/7 -type f -links 1 -print

Then try manually removing some of the files it prints. I would guess
the system will shortly crash.

Craig
From: Phill B. <Phill@WebWombat.com> - 2002-11-01 05:57:36

Your guess is correct. Conclusion is ReiserFS disk is fried.

I've now abandoned ReiserFS. The issue may yet be the PromiseRAID
controller though. However this next step may shed more light on the
issue, depending on whether it proves reliable.

What I've done:

1) I've taken one of the RAID0 disks and sent it off site as is. I'm
   assuming my backups could be OK on it.
2) Swapped in another offsite drive in its place.
3) kill [BackupPC]
4) umount /dev/md0
5) mkfs -t ext3 /dev/md0
6) mount /dev/md0 /disk7
7) Copy all the configs back into /disk7/conf and /disk7/pc/.../conf.pl
8) ./run_BackupPC (my shell script to kick things off with all the right
   RSA keys etc.)
9) raidhotadd /dev/md0 /dev/hdg1 (the replacement disk).
10) Via web interface Full Backup times 12. 8 running and 4 queued

Things are pretty busy now (lots of 0.0% idle time - load average of
10.4)... all working OK and /proc/mdstat says it's getting through the
resync gradually. The system is actually still quite responsive too.
Amazing these new multiGigaHertz CPUs, and they're so cheap too.

Thanks for your help Craig, most appreciated.

Phill.
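
For anyone following the same rebuild steps, the resync progress that
/proc/mdstat reports can be pulled out with a short script rather than
read by eye. The sketch below parses an illustrative excerpt. The sample
text, the device name hde1 and the function name rebuild_progress are
assumptions for the example, and the exact mdstat layout varies between
kernel versions.

```python
import re

# Illustrative /proc/mdstat excerpt, not output captured from this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 hdg1[2] hde1[0]
      78148096 blocks [2/1] [U_]
      [=>...................]  recovery =  8.5% (6690432/78148096) finish=52.1min speed=22834K/sec
unused devices: <none>
"""

# Matches either "resync = NN.N%" or "recovery = NN.N%".
PROGRESS_RE = re.compile(r"(resync|recovery)\s*=\s*([0-9.]+)%")


def rebuild_progress(mdstat_text):
    """Return (kind, percent) for the first rebuild line found, else None."""
    match = PROGRESS_RE.search(mdstat_text)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


if __name__ == "__main__":
    print(rebuild_progress(SAMPLE_MDSTAT))
```

On a live system the text would come from reading /proc/mdstat itself;
a result of None would suggest that no rebuild is in progress.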
From: Phill B. <Phill@WebWombat.com> - 2002-11-05 07:55:43

FYI

3 days have gone past without a hiccup on ext3 and RAID1, RH73 (not
RAID0 as noted in my previous message). It's done a couple of
incrementals too.

Could be ReiserFS, doesn't seem to be PromiseRAID or the hardware.

Regs
Phill.

_______________________________________________
BackupPC-users mailing list
Bac...@li...
https://lists.sourceforge.net/lists/listinfo/backuppc-users