From: Steven W. O. <st...@sy...> - 2006-11-22 14:53:05

I'm running 1.2.1, and occasionally I end up with a gzip process that does not get reaped. It's not a zombie process; it's just hanging, waiting for its parent to reap it. Here's what ps shows:

    root 23277 0.0 0.0 1784 592 ? S 03:31 0:03 gzip -9

Do we need to modify flexbackup to set SIG_IGN for SIGCHLD?

Anybody home?

TIA

--
Time flies like the wind. Fruit flies like a banana. Stranger things have happened but none stranger than this. Does your driver's license say Organ Donor? Black holes are where God divided by zero. Listen to me! We are all individuals! What if this weren't a hypothetical question?
steveo at syslang.net

From: Charlie B. <cha...@e-...> - 2006-11-22 15:01:46

On Wed, 22 Nov 2006, Steven W. Orr wrote:

> I'm running 1.2.1 and on an occasional basis I end up with a gzip process
> that does not get reaped. It's not a zombie process, it's just hanging
> waiting for it's parent to reap it.

Your statement doesn't make sense.

A zombie is a process which has exited. The process's remains are kept by the kernel waiting for the parent to reap. Running processes (which have not exited) do not wait for their parents to reap them.

A process is either still running, or it is a process which is a zombie and has not been reaped. There's no such thing as a dead process which has not been reaped which is not a zombie.

> Here's what ps shows:
>
> root 23277 0.0 0.0 1784 592 ? S 03:31 0:03 gzip -9

Whatever is feeding the standard input of that process has not terminated. What does "ps fax" tell you?

> Do we need to modify flexbackup to set SIG_IGN for SIGCHLD?

I don't know why you are suggesting that.

> Anybody home?

Lights are on.

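The distinction drawn above (an exited but unreaped child is a zombie; a live child is simply running or sleeping) can be illustrated with a minimal Python sketch. It is not taken from flexbackup and assumes a Linux/Unix system where `os.fork` and `os.waitid` are available; the function name `zombie_lifecycle` and the exit status 7 are made up for the example.

```python
import os


def zombie_lifecycle():
    """Fork a child that exits at once and observe it before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with exit status 7

    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so it stays in the process table as a zombie.
    info = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    exited = info.si_pid == pid

    # Signal 0 only checks that the PID exists; it succeeds for a zombie.
    try:
        os.kill(pid, 0)
        listed_before_reap = True
    except ProcessLookupError:
        listed_before_reap = False

    # Reaping: the parent collects the exit status and the entry goes away.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)

    try:
        os.kill(pid, 0)
        listed_after_reap = True
    except ProcessLookupError:
        listed_after_reap = False

    return exited, listed_before_reap, exit_code, listed_after_reap
```

Between the `waitid` call and the `waitpid` call the child is in the state that ps reports as `Z`. A process shown as `S`, like the gzip in the report, has not exited and so has nothing for a parent to reap yet.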
From: Steven W. O. <st...@sy...> - 2006-11-22 15:34:42

On Wednesday, Nov 22nd 2006 at 10:01 -0500, quoth Charlie Brady:

=>Whatever is feeding the standard input of that process has not terminated.
=>What does "ps fax" tell you?
=>
=>> Do we need to modify flexbackup to set SIG_IGN for SIGCHLD?
=>
=>I don't know why you are suggesting that.

Right. It's not a zombie, as I said above, and since it's not, you're correct that the issue of SIG_IGN for SIGCHLD would be a red herring. From the ps output above, it's in a sleep state. Your question about who the parent is is good. I don't remember, because I killed the process right after I sent that message, but I believe (from previous incidents) it is the child of flexbackup. So the tree should be

    cron
     \_bash
        \_flexbackup
           \_gzip

What I think is happening is that flexbackup is waiting for gzip to complete before it exits. But gzip doesn't exit because it's waiting for more input, not knowing that more isn't coming.

Sometimes I can go a month without a hangup, and sometimes it hangs multiple times per week. Do we need to wait for a recurrence or is this enough to be able to work with?

From: Charlie B. <cha...@e-...> - 2006-11-22 15:51:52

On Wed, 22 Nov 2006, Steven W. Orr wrote:

> =>Whatever is feeding the standard input of that process has not terminated.
> =>What does "ps fax" tell you?
> =>
> =>> Do we need to modify flexbackup to set SIG_IGN for SIGCHLD?
> =>
> =>I don't know why you are suggesting that.
>
> Right. It's not a zombie like I said above, but since it's not, you're
> correct that the issue of SIG_IGN for SIGCHLD would be a red herring. From
> the ps output above, it's in a sleep state. Your question about who the
> parent is is good. I don't remember because I just killed the process
> after I sent this message but I believe (from previous incidents) it is
> the child of flexbackup. So the tree should be
>
> cron
>  \_bash
>     \_flexbackup
>        \_gzip

No, the tree should never be just that. Something should be feeding gzip, and gzip should be feeding something. Both "somethings" should be children of flexbackup. The exact identity of the "somethings" will depend on your configuration.

> What I think is happening is that flexbackup is waiting for gzip to
> complete before it exits. But gzip doesn't exit because it's waiting for
> more input, not knowing that more isn't coming.

Yes, and you need to determine why no more input is coming, and yet the program providing such input to gzip has not exited.

> Sometimes I can go a month without a hangup, and sometimes it hangs
> multiple times per week. Do we need to wait for a reoccurance or is this
> enough to be able to work with?

It's not enough because you haven't given us the full information. Since you've killed the gzip process, we can't determine what was feeding it input and why it was blocked. If you can show the actual process tree rather than what you think "should" be there, then we can provide more debugging instructions.

Perhaps if you describe your configuration someone can speculate about what process was blocked and why.

--
Charlie

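One general way a pipe reader such as gzip can wait forever for "more input" is that end-of-file is delivered only once every copy of the pipe's write end has been closed, in every process that holds one. The following Python sketch shows that behaviour in a single process. It is an illustration of the mechanism under that assumption, not a diagnosis of this particular hang; the name `eof_needs_all_writers_closed` and the `os.dup` stand-in for a descriptor inherited by another process are invented for the example.

```python
import os
import select


def eof_needs_all_writers_closed():
    """Show that a pipe reader sees EOF only after every write end is closed."""
    r, w = os.pipe()
    w_leaked = os.dup(w)  # stands in for a stray copy held by another process

    os.write(w, b"payload")
    os.close(w)  # the "real" writer finishes and closes its end

    data = os.read(r, 1024)  # the buffered payload is still delivered

    # With the duplicate still open there is no EOF: a read would block.
    readable, _, _ = select.select([r], [], [], 0.2)
    eof_while_leaked = bool(readable)

    os.close(w_leaked)  # last write end closed

    # Now the pipe is readable again, and the read returns b"" (EOF).
    select.select([r], [], [], 0.2)
    chunk = os.read(r, 1024)
    os.close(r)
    return data, eof_while_leaked, chunk
```

The same rule applies in the other direction: a parent reading a child's output through a pipe will not see EOF while any surviving descendant still holds the write end.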
From: Steven W. O. <st...@sy...> - 2006-11-25 03:33:34

On Wednesday, Nov 22nd 2006 at 10:51 -0500, quoth Charlie Brady:

=>It's not enough because you haven't given us the full information. Since
=>you've killed the gzip process, we can't determine what was feeding it
=>input and why it was blocked. If you can show the actual process tree
=>rather than what you think "should" be there, then we can provide more
=>debugging instructions.
=>
=>Perhaps if you describe your configuration someone can speculate about
=>what process was blocked and why.

Ok. I got a new one today and I'm leaving it around so we can figure this thing out.

Here's the cron tree:

     3480 ?        Ss     0:01 crond
    16571 ?        S      0:00  \_ crond
    16572 ?        Ss     0:00      \_ /usr/bin/perl -w /usr/bin/flexbackup -set backup -incremental
    16846 ?        Z      0:00      |   \_ [sh] <defunct>
    16573 ?        S      0:00      \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t

And here's the gzip:

    16860 ?        S      0:01 gzip -9

and ps -ef shows

    root     16860     1  0 03:31 ?        00:00:01 gzip -9
    root     16571  3480  0 03:31 ?        00:00:00 crond
    root     16572 16571  0 03:31 ?        00:00:00 /usr/bin/perl -w /usr/bin/flexbackup -set backup -incremental
    smmsp    16573 16571  0 03:31 ?        00:00:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
    root     16846 16572  0 03:31 ?        00:00:00 [sh] <defunct>

which shows that gzip is now the child of init, which means that its parent exited and orphaned it. And 16846 seems not to be getting cleaned up by flexbackup.

Anyone have an idea of what this all means?

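The check made by eye above (a gzip whose PPID column is 1 has been re-parented to init) can be automated. Below is a small Python sketch that scans `ps -ef` text for such processes. It assumes the eight-column procps layout shown in this message (UID PID PPID C STIME TTY TIME CMD) and that orphans are adopted by PID 1; the function name `orphaned_pids` is hypothetical, and `SAMPLE` is the output quoted in this message.

```python
import os

# The `ps -ef` lines quoted in the message above.
SAMPLE = """\
root     16860     1  0 03:31 ?        00:00:01 gzip -9
root     16571  3480  0 03:31 ?        00:00:00 crond
root     16572 16571  0 03:31 ?        00:00:00 /usr/bin/perl -w /usr/bin/flexbackup -set backup -incremental
smmsp    16573 16571  0 03:31 ?        00:00:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
root     16846 16572  0 03:31 ?        00:00:00 [sh] <defunct>
"""


def orphaned_pids(ps_ef_output, command="gzip"):
    """Return PIDs of `command` processes whose parent is PID 1."""
    orphans = []
    for line in ps_ef_output.splitlines():
        # Split into at most 8 fields so the command line stays in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line or something unexpected
        pid, ppid = int(fields[1]), int(fields[2])
        argv = fields[7].split()
        if ppid == 1 and argv and os.path.basename(argv[0]) == command:
            orphans.append(pid)
    return orphans
```

Run from cron shortly after the backup window, something like this could flag a stuck gzip while the rest of the process tree is still available for inspection.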
From: Steven W. O. <st...@sy...> - 2006-11-25 15:29:19

On Friday, Nov 24th 2006 at 22:33 -0500, quoth Steven W. Orr:

=>Anyone have an idea of what this all means?

Next day and we got lucky. It happened again:

    root      3480  0.0  0.0   2668   468 ?        Ss   Aug27   0:01 crond
    root     16571  0.0  0.0   3292   988 ?        S    Nov24   0:00  \_ crond
    root     16572  0.0  0.5   8320  5960 ?        Ss   Nov24   0:00  |   \_ /usr/bin/perl -w /usr/bin/flexbackup -set backup -incremental
    root     16846  0.0  0.0      0     0 ?        Z    Nov24   0:00  |   |   \_ [sh] <defunct>
    smmsp    16573  0.0  0.2   7344  2744 ?        S    Nov24   0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
    root     21193  0.0  0.0   3292   988 ?        S    03:31   0:00  \_ crond
    root     21194  0.0  0.5   8320  5952 ?        Ss   03:31   0:00      \_ /usr/bin/perl -w /usr/bin/flexbackup -set backup -differential
    root     21377  0.0  0.0      0     0 ?        Z    03:31   0:00      |   \_ [sh] <defunct>
    smmsp    21195  0.0  0.2   7344  2728 ?        S    03:31   0:00      \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t

and we now have two gzips owned by init.

    526 > ps -ef | grep gzip
    root     16860     1  0 Nov24 ?        00:00:01 gzip -9
    root     21419     1  0 03:31 ?        00:00:03 gzip -9
    steveo    5813  9890  0 10:28 pts/3    00:00:00 grep gzip
    527 >

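With the orphaned gzip processes still alive, one way to look for "whatever is feeding the standard input" is to ask the kernel which processes hold the same pipe. The Python sketch below does that through Linux's `/proc/<pid>/fd` symlinks. It assumes a Linux system with `/proc` mounted and enough privilege (typically root) to read other users' descriptor tables; the function name `pipe_holders` is hypothetical, and the sketch does not distinguish the read end from the write end.

```python
import os


def pipe_holders(pid, fd=0):
    """Return {pid: [fd, ...]} for every process holding the pipe on pid's fd.

    The result includes `pid` itself. If the descriptor is not a pipe,
    an empty dict is returned.
    """
    target = os.readlink(f"/proc/{pid}/fd/{fd}")  # e.g. 'pipe:[123456]'
    if not target.startswith("pipe:"):
        return {}

    holders = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = f"/proc/{entry}/fd"
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        for name in names:
            try:
                if os.readlink(f"{fd_dir}/{name}") == target:
                    holders.setdefault(int(entry), []).append(int(name))
            except OSError:
                continue  # descriptor closed while we were looking
    return holders
```

For the processes above, `pipe_holders(16860)` run as root would list every process sharing gzip's standard input, and `pipe_holders(16860, fd=1)` would do the same for its output. Any PID other than gzip's own in the result is a candidate for the process that never closed its end.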
From: Steven W. O. <st...@sy...> - 2006-11-27 15:21:13

On Saturday, Nov 25th 2006 at 10:29 -0500, quoth Steven W. Orr:

=>and we now have two gzips owned by init.

Ok. I promise I won't post any more examples. I just had one more from last night.

     3480 ?        Ss     0:01 crond
    16571 ?        S      0:00  \_ crond
    16572 ?        Ss     0:00  |   \_ /usr/bin/perl -w /usr/bin/flexbackup -set backup -incremental
    16846 ?        Z      0:00  |   |   \_ [sh] <defunct>
    16573 ?        S      0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
    21193 ?        S      0:00  \_ crond
    21194 ?        Ss     0:00  |   \_ /usr/bin/perl -w /usr/bin/flexbackup -set backup -differential
    21377 ?        Z      0:00  |   |   \_ [sh] <defunct>
    21195 ?        S      0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
    17332 ?        S      0:00  \_ crond
    17333 ?        Ss     0:00      \_ /usr/bin/perl -w /usr/bin/flexbackup -set backup -incremental
    17705 ?        Z      0:00      |   \_ [sh] <defunct>
    17334 ?        S      0:00      \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t

    524 > ps -ef | grep gzip
    root     16860     1  0 Nov24 ?        00:00:01 gzip -9
    root     21419     1  0 Nov25 ?        00:00:03 gzip -9
    root     17719     1  0 03:32 ?        00:00:01 gzip -9
    steveo   12378  3292  0 10:17 pts/5    00:00:00 grep gzip
    525 >

Also, I included my flexbackup.conf, if that helps. Thanks.

    $type = 'afio';
    $set{'backup'} = "/e/web /usr/share/emacs/site-lisp /usr/local /etc /boot /root /var/spool/mail /var/log";
    $prune{'/e/web'} = "steveo/mpg";
    $compress = 'gzip'; # one of false/gzip/bzip2/lzop/zip/compress/hardware
    $compr_level = '9'; # compression level (1-9) (for gzip/bzip2/lzop/zip)
    $buffer = 'buffer'; # one of false/buffer/mbuffer
    $buffer_megs = '10'; # buffer memory size (in megabytes)
    $buffer_fill_pct = '75'; # start writing when buffer this percent full
    $buffer_pause_usec = '100'; # pause after write (tape devices only)
    $device = '/d2/backup';
    $blksize = '10';
    $mt_blksize = "0";
    $pad_blocks = 'true';
    $remoteshell = 'ssh'; # command for remote shell (rsh/ssh/ssh2)
    $remoteuser = ''; # if non-null, secondary username for remote shells
    $label = 'true'; # somehow store identifying label in archive?
    $verbose = 'true'; # echo each file?
    $sparse = 'true'; # handle sparse files?
    $indexes = 'true'; # false to turn off all table-of-contents support
    $staticfiles = 'false';
    $atime_preserve = 'false';
    $traverse_fs = 'false';
    $exclude_expr[0] = '.*/[Cc]ache/.*';
    $exclude_expr[1] = '.*~$';
    $erase_tape_set_level_zero = 'true';
    $erase_rewind_only = 'false';
    $logdir = '/var/log/flexbackup'; # directory for log files
    $comp_log = 'bzip2'; # compress log? false/gzip/bzip2/lzop/compress/zip
    $staticlogs = 'false'; # static log filenames w/ no date stamp
    $prefix = ''; # log files will start with this prefix
    $tmpdir = '/tmp'; # used for temporary refdate files, etc
    $stampdir = '/var/lib/flexbackup'; # directory for backup timestamps
    $index = '/var/lib/flexbackup/index'; # DB filename for tape indexes
    $keyfile = '00-index-key'; # filename for keyfile if archiving to dir
    $sprefix = ''; # stamp files will start with this prefix
    $afio_nocompress_types = 'mp3 MP3 Z z gz gif GIF zip ZIP lha jpeg jpg JPG taz tgz deb rpm bz2 lzo png';
    $afio_echo_block = 'false';
    $afio_compress_threshold = '3';
    $afio_compress_cache_size = '2';
    $tar_echo_record_num = 'false';
    $cpio_format = 'newc';
    $dump_length = '0';
    $dump_use_dumpdates = 'false';
    $star_fifo = 'true';
    $star_acl = 'true';
    $star_format = 'exustar';
    $star_echo_block_num = 'false';
    $pax_format = 'ustar';
    $zip_nocompress_types = 'mp3 MP3 Z z gz gif zip ZIP lha jpeg jpg JPG taz tgz deb rpm bz2 lzo';
    $pkgdelta_archive_list = 'rootonly';
    $pkgdelta_archive_unowned = 'true';
    $pkgdelta_archive_changed = 'true';
    1;

From: Steven W. O. <st...@sy...> - 2006-12-31 22:16:27

On Monday, Nov 27th 2006 at 10:20 -0500, quoth Steven W. Orr:

=>Ok. I promise I won't post any more examples. I just had one more from
=>last night.
=>[...]
=>Also, I included my flexbackup.conf, if that helps. Thanks.

Hi. Sorry to bother people. This problem has continued to happen. I had trouble getting it fully and properly described on the first couple of tries, but now that it is, nobody has responded.

There's an article in this month's Linux Journal discussing a backup setup using Duplicity. I'd rather not switch, so if anyone can advise on how to fix this problem I'd appreciate it. I'm thinking of a possible race condition in child reaping. Can this be fixed?

TIA
