From: G. F. <ga...@fr...> - 2010-02-26 16:27:23
Hi all,

Unfortunately, I had to add the following daily cron job to keep devmon stable.
Otherwise it goes purple once every 2 to 3 days on average.

-- /etc/cron.daily/devmon ---
/etc/init.d/devmon stop
sleep 10
/etc/init.d/devmon start

I tried to troubleshoot the problem (see
http://sourceforge.net/mailarchive/message.php?msg_id=20080630100248.6cdde211.gaetan%40frenoy.net)
but finally gave up due to a lack of resources for a test environment.
Chances are high that the problem is still present.

Have a good day.
Gaëtan

On 2/26/2010 4:26 PM, Root, Paul wrote:
> I'm running on Solaris. It is very reliable.
>
>
> Paul Root
> Lead Internet Systems Eng
> Network Services
>
>
> -----Original Message-----
> From: Stewart, Tom L. [mailto:Tom...@la...]
> Sent: Friday, February 26, 2010 9:08 AM
> To: dev...@li...
> Subject: Re: [Devmon] devmon keeps going purple - long post
>
> I used to see the problem all the time on a Solaris system until I moved
> devmon to a RedHat Linux system. Once I moved it, it stayed up for
> months and months.
>
> Tom
>
> -----Original Message-----
> From: Colin Coe [mailto:col...@gm...]
> Sent: Thursday, February 25, 2010 6:21 PM
> To: dev...@li...
> Subject: Re: [Devmon] devmon keeps going purple - long post
>
> Hi all
>
> I've been seeing this for a while also.
>
> It happened again today so rather than just restart I'm going to do
> some testing.
>
> In /var/log/devmon/devmon.log I see
> ---
> [10-02-26@05:10:59] Starting snmp queries
> [10-02-26@05:10:59] Getting device status from hobbit at 127.0.0.1:1984
> [10-02-26@05:11:00] Performing test logic
> [10-02-26@05:11:01] Done with test logic
> [10-02-26@05:11:01] Sending messages to display server
> [10-02-26@05:11:01] Done sending messages
> [10-02-26@05:11:01] Sleeping for 58 seconds.
> [10-02-26@05:11:59] Starting snmp queries
> [10-02-26@05:11:59] Getting device status from hobbit at 127.0.0.1:1984
> [10-02-26@05:12:01] Performing test logic
> [10-02-26@05:12:01] Done with test logic
> [10-02-26@05:12:01] Sending messages to display server
> [10-02-26@05:12:01] Done sending messages
> [10-02-26@05:12:01] Sleeping for 58 seconds.
> [10-02-26@05:13:00] Starting snmp queries
> [10-02-26@05:13:00] Getting device status from hobbit at 127.0.0.1:1984
> ---
>
> 5:13 AM is when devmon last reported in to xymon; it is currently 7:52 AM.
>
> Using this scriptlet, I've straced the devmon processes.
> ---
> for I in `ps -ef | awk '/devmon/&& !/awk/ {print $2}'`; do
>   echo "About to 'strace' PID $I"
>   echo "-----------"
>   strace -tfp $I
>   echo "---------"
> done
> About to 'strace' PID 24357
> -----------
> Process 24357 attached - interrupt to quit
> 07:56:02 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> 07:56:02 read(11, "", 4096) = 0
> 07:56:02 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> 07:56:02 read(11, "", 4096) = 0
> -- snip --
> 07:56:05 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> 07:56:05 read(11, "", 4096) = 0
> 07:56:05 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
> 07:56:05 read(11, "", 4096) = 0
> 07:56:05 select(0, NULL, NULL, NULL, {0, 1000}<unfinished ...>
> Process 24357 detached
> ---------
> About to 'strace' PID 24359
> -----------
> Process 24359 attached - interrupt to quit
> 07:56:05 read(7, 0xbfe63f0, 4096) = ? ERESTARTSYS (To be restarted)
> 07:56:15 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:56:15 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:56:15 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:56:15 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:56:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:15 kill(24357, SIG_0) = 0
> 07:56:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:15 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:15 alarm(15) = 0
> 07:56:15 read(7, 0xbfe63f0, 4096) = ? ERESTARTSYS (To be restarted)
> 07:56:30 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:56:30 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:56:30 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:56:30 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:56:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:30 kill(24357, SIG_0) = 0
> 07:56:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:30 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:30 alarm(15) = 0
> 07:56:30 read(7,<unfinished ...>
> Process 24359 detached
> ---------
> About to 'strace' PID 24360
> -----------
> Process 24360 attached - interrupt to quit
> 07:56:33 read(8, 0xbfe6920, 4096) = ? ERESTARTSYS (To be restarted)
> 07:56:45 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:56:45 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:56:45 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:56:45 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:56:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:45 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:45 kill(24357, SIG_0) = 0
> 07:56:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:45 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:56:45 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:56:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:56:45 alarm(15) = 0
> 07:56:45 read(8, 0xbfe6920, 4096) = ? ERESTARTSYS (To be restarted)
> 07:57:00 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:57:00 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:57:00 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:57:00 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:57:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:00 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:00 kill(24357, SIG_0) = 0
> 07:57:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:00 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:00 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:00 alarm(15) = 0
> 07:57:00 read(8,<unfinished ...>
> Process 24360 detached
> ---------
> About to 'strace' PID 24361
> -----------
> Process 24361 attached - interrupt to quit
> 07:57:12 read(9, 0xbfe6e30, 4096) = ? ERESTARTSYS (To be restarted)
> 07:57:15 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:57:15 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:57:15 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:57:15 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:57:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:15 kill(24357, SIG_0) = 0
> 07:57:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:15 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:15 alarm(15) = 0
> 07:57:15 read(9, 0xbfe6e30, 4096) = ? ERESTARTSYS (To be restarted)
> 07:57:30 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:57:30 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:57:30 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:57:30 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:57:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:30 kill(24357, SIG_0) = 0
> 07:57:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:30 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:30 alarm(15) = 0
> 07:57:30 read(9,<unfinished ...>
> Process 24361 detached
> ---------
> About to 'strace' PID 24362
> -----------
> Process 24362 attached - interrupt to quit
> 07:57:39 read(10, 0xbfe7340, 4096) = ? ERESTARTSYS (To be restarted)
> 07:57:45 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:57:45 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:57:45 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:57:45 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:57:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:45 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:45 kill(24357, SIG_0) = 0
> 07:57:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:45 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:57:45 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:57:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:57:45 alarm(15) = 0
> 07:57:45 read(10, 0xbfe7340, 4096) = ? ERESTARTSYS (To be restarted)
> 07:58:00 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:58:00 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:58:00 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:58:00 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:58:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:00 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:00 kill(24357, SIG_0) = 0
> 07:58:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:00 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:00 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:00 alarm(15) = 0
> 07:58:00 read(10,<unfinished ...>
> Process 24362 detached
> ---------
> About to 'strace' PID 24363
> -----------
> Process 24363 attached - interrupt to quit
> 07:58:03 read(11, 0xbfe7850, 4096) = ? ERESTARTSYS (To be restarted)
> 07:58:15 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:58:15 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:58:15 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:58:15 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:58:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:15 kill(24357, SIG_0) = 0
> 07:58:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:15 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:15 alarm(15) = 0
> 07:58:15 read(11, 0xbfe7850, 4096) = ? ERESTARTSYS (To be restarted)
> 07:58:30 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:58:30 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:58:30 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:58:30 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:58:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:30 kill(24357, SIG_0) = 0
> 07:58:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:30 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:30 alarm(15) = 0
> 07:58:30 read(11,<unfinished ...>
> Process 24363 detached
> ---------
> About to 'strace' PID 24365
> -----------
> Process 24365 attached - interrupt to quit
> 07:58:34 read(13, 0xbfe8270, 4096) = ? ERESTARTSYS (To be restarted)
> 07:58:45 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:58:45 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:58:45 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:58:45 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:58:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:45 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:45 kill(24357, SIG_0) = 0
> 07:58:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:45 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:45 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:58:45 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:58:45 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:58:45 alarm(15) = 0
> 07:58:45 read(13, 0xbfe8270, 4096) = ? ERESTARTSYS (To be restarted)
> 07:59:00 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:59:00 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:59:00 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:59:00 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:59:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:00 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:00 kill(24357, SIG_0) = 0
> 07:59:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:00 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:00 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:00 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:00 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:00 alarm(15) = 0
> 07:59:00 read(13,<unfinished ...>
> Process 24365 detached
> ---------
> About to 'strace' PID 24366
> -----------
> Process 24366 attached - interrupt to quit
> 07:59:04 read(14, 0xbfe87c0, 4096) = ? ERESTARTSYS (To be restarted)
> 07:59:15 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:59:15 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:59:15 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:59:15 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:59:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:15 kill(24357, SIG_0) = 0
> 07:59:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:15 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:15 alarm(15) = 0
> 07:59:15 read(14, 0xbfe87c0, 4096) = ? ERESTARTSYS (To be restarted)
> 07:59:30 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:59:30 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:59:30 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:59:30 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:59:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:30 kill(24357, SIG_0) = 0
> 07:59:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:30 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:30 alarm(15) = 0
> 07:59:30 read(14,<unfinished ...>
> Process 24366 detached
> ---------
> About to 'strace' PID 24367
> -----------
> Process 24367 attached - interrupt to quit
> 07:59:31 read(15, 0xbfe8cd0, 4096) = ? ERESTARTSYS (To be restarted)
> 07:59:46 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 07:59:46 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 07:59:46 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 07:59:46 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 07:59:46 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:46 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:46 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:46 kill(24357, SIG_0) = 0
> 07:59:46 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:46 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:46 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:46 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 07:59:46 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 07:59:46 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 07:59:46 alarm(15) = 0
> 07:59:46 read(15, 0xbfe8cd0, 4096) = ? ERESTARTSYS (To be restarted)
> 08:00:01 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:00:01 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 08:00:01 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 08:00:01 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 08:00:01 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:01 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:01 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:01 kill(24357, SIG_0) = 0
> 08:00:01 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:01 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:01 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:01 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:01 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:01 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:01 alarm(15) = 0
> 08:00:01 read(15,<unfinished ...>
> Process 24367 detached
> ---------
> About to 'strace' PID 24368
> -----------
> Process 24368 attached - interrupt to quit
> 08:00:05 read(16, 0xbfe91e0, 4096) = ? ERESTARTSYS (To be restarted)
> 08:00:15 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:00:15 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 08:00:15 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 08:00:15 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 08:00:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:15 kill(24357, SIG_0) = 0
> 08:00:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:15 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:15 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:15 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:15 alarm(15) = 0
> 08:00:15 read(16, 0xbfe91e0, 4096) = ? ERESTARTSYS (To be restarted)
> 08:00:30 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:00:30 rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> 08:00:30 rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
> 08:00:30 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
> 08:00:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {0x363207de40, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:30 kill(24357, SIG_0) = 0
> 08:00:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:30 rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:30 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
> 08:00:30 rt_sigaction(SIGALRM, {0x363207de40, [], SA_RESTORER, 0x363080e930}, {SIG_DFL, [], SA_RESTORER, 0x363080e930}, 8) = 0
> 08:00:30 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 08:00:30 alarm(15) = 0
> 08:00:30 read(16,<unfinished ...>
> Process 24368 detached
> ---------
>
> ---
> # ps -ef | grep devmon
> root      8973 16821  0 08:14 pts/1    00:00:00 grep devmon
> devmon   24357     1  1 Feb22 ?        01:14:11 devmon[master]
> devmon   24359 24357  0 Feb22 ?        00:04:02 devmon
> devmon   24360 24357  0 Feb22 ?        00:04:01 devmon
> devmon   24361 24357  0 Feb22 ?        00:02:57 devmon
> devmon   24362 24357  0 Feb22 ?        00:04:01 devmon
> devmon   24363 24357  0 Feb22 ?        00:02:53 devmon
> devmon   24365 24357  0 Feb22 ?        00:02:55 devmon
> devmon   24366 24357  0 Feb22 ?        00:04:13 devmon
> devmon   24367 24357  0 Feb22 ?        00:02:26 devmon
> devmon   24368 24357  0 Feb22 ?        00:02:52 devmon
>
> 08:14:10 HOST=sw02.hpdms USER=root
> # ls -l /proc/24357/fd
> total 0
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 0 -> /dev/null
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 1 -> /dev/null
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 10 -> socket:[5679517]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 11 -> socket:[5679519]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 12 -> socket:[5679521]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 13 -> socket:[5679523]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 14 -> socket:[5679525]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 15 -> socket:[5679527]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 2 -> /dev/null
> l-wx------ 1 devmon devmon 64 Feb 26 08:14 3 -> /var/log/devmon/devmon.log.1
> lr-x------ 1 devmon devmon 64 Feb 26 08:14 4 -> /usr/share/devmon/templates
> lr-x------ 1 devmon devmon 64 Feb 26 08:14 5 -> /usr/share/devmon/templates/ironport-asyncos
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 6 -> socket:[5679509]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 7 -> socket:[5679511]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 8 -> socket:[5679513]
> lrwx------ 1 devmon devmon 64 Feb 26 08:14 9 -> socket:[5679515]
> ---
>
>
> From the logs, strace output and viewing the source, I believe that
> the master process is stuck on the child with file descriptor 11.
> Interestingly, fd 12 is shown in 'ls' above but not in the preceding
> 'ps'. Maybe I'm missing something...
>
> I'm going to leave devmon in this state for a while to do further
> testing but if anyone has any ideas they want me to try, I'll happily
> oblige.
>
> Thanks
>
> CC
>
> On Sat, Feb 20, 2010 at 12:58 AM, Young, Tom <tom...@tw...> wrote:
>
>> Hi,
>>
>> I have one of three devmon pollers that keeps going purple every few
>> hours or so. Running wireshark shows it completely stops communicating
>> with the xymon server. Is there a fix to this other than restarting it
>> every time it goes purple, or restarting it every X hours?
>>
>> Thanks,
>>
>> Tom
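
A possible refinement of the daily restart cron job at the top of this message is to restart devmon only when its log has stopped advancing, which is the symptom Colin's log excerpt shows (last entry at 05:13, nothing afterwards). The following is only a sketch under assumptions, not something tested against devmon: the function names, the `--run` switch and the 300-second threshold are made up for illustration, the log path and init script path are the ones quoted in this thread, and `stat -c %Y` and `date +%s` assume GNU userland (so Linux rather than Solaris).

```shell
#!/bin/sh
# Hypothetical watchdog sketch; names and threshold are illustrative
# assumptions, not part of devmon itself.

LOG_FILE="${LOG_FILE:-/var/log/devmon/devmon.log}"  # path quoted in the thread
MAX_AGE="${MAX_AGE:-300}"                           # seconds; assumed threshold

# Exit status 0 ("stale") if the file is missing or was last modified more
# than max_age seconds ago. Relies on GNU stat (-c %Y) and date (+%s).
log_is_stale() {
    file=$1
    max_age=$2
    [ -e "$file" ] || return 0
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 0
    [ $((now - mtime)) -gt "$max_age" ]
}

# Same stop / sleep / start sequence as the daily cron job above.
restart_devmon() {
    /etc/init.d/devmon stop
    sleep 10
    /etc/init.d/devmon start
}

# Act only when called with --run, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ] && log_is_stale "$LOG_FILE" "$MAX_AGE"; then
    restart_devmon
fi
```

Run from cron every few minutes with `--run`, this would restart devmon only after it has gone quiet. Note that the fd listing above shows the master writing to devmon.log.1, so after a log rotation the path checked here may need adjusting.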