Re: [Queue-developers] queued and defunct processes
From: Mike C. <da...@ix...> - 2001-05-13 07:29:34
On Sat, May 12, 2001 at 11:24:32PM -0700, Mike Castle wrote:
> I think I may have narrowed it down.
>
> It *appears* that if queued receives a SIGCHLD while inside netfread,
> things aren't properly processed.  The fact that a SIGCHLD was received is
> either somehow lost, OR things hang out in netfread for a long time.

Well, I'm still not sure about the above; however, I *did* observe the
following: Queue_nonblocking_rw(), as used by the getloadavg routine, really
mucks things up as well.

As the following extract from the log shows, the handling of a SIGCHLD can be
delayed by at least a minute.  (I added a few extra timestamps just to be sure.)

thune.mrc-home.org[8602]: timestamp: "Sun May 13 00:05:01"
thune.mrc-home.org[8602]: handle.c chldsigh(): pid 8604: signal 17.
thune.mrc-home.org[8602]: exit at Sun May 13 00:05:01 2001 from handle.c:1997   <<< my extra stuff
thune.mrc-home.org[7817]: timestamp: "Sun May 13 00:05:01"
thune.mrc-home.org[7817]: queued queued.c sigchld(): SIGCHLD.
qlib.c Queue_count_the_signal(): signal 17: 14.
qlib.c Queue_nonblocking_rw(): select()ing on fd 8 for 61 seconds...
thune.mrc-home.org[7817]: timestamp: "Sun May 13 00:06:02"
qlib.c Queue_nonblocking_rw(): timed out read()ing 4 bytes from fd 8.
qlib.c Queue_nonblocking_rw(): failed to read() 4 bytes (done 0) on fd 8, giving up.
qlib.c Queue_net_rw(): failed to get 1 4-byte items on fd 8; got 0 bytes.
wakeup.c getrldavg(): failed to fread() from fd 8.
wakeup.c getrldavg(): close(8).
wakeup.c getrldavg(): ### failed to get load from mars.mrc-home.org ### returning 1.00e+08 as rejection designator.
thune.mrc-home.org[7817]: queued queued.c runqueue_b(): queue is running max number of jobs = 2
thune.mrc-home.org[7817]: queued queued.c main() child: SIGCHLD flag set; running waitforchild()...
thune.mrc-home.org[7817]: queued queued.c waitforchild(): wait exit pid=8627, stat=00, cpu=1.6
thune.mrc-home.org[7817]: queued queued.c waitforchild(): Sun May 13 00:06:02 2001

Note that PID 8602 exited at 00:05:01, and the SIGCHLD was received by 7817.
However, it could not be processed for another 61 seconds, according to both
the message from Queue_nonblocking_rw() and the timestamp from when
waitforchild() was finally entered.

Still trying to work out WHY mars is in such a state....

mrc
-- 
Mike Castle               Life is like a clock: You can work constantly
da...@ix...               and be right all the time, or not work at all
www.netcom.com/~dalgoda/  and be right at least twice a day.  -- mrc
We are all of us living in the shadow of Manhattan.  -- Watchmen