From: Buchan M. <bg...@st...> - 2009-01-09 11:17:43
On Friday 09 January 2009 10:00:56 Thomas Kähn wrote:
> Hi Buchan,
>
> On Thu, Jan 08, 2009 at 07:05:19PM +0200, Buchan Milne wrote:
> > Running this for a few days turned this up, one second (but about 100
> > lines) before devmon stopped doing anything:
> >
> > [09-01-07@22:45:12] DEBUG SNMP: Pooling 1 oids
> > Can't use an undefined value as a HASH reference
> > at /usr/share/devmon/modules/dm_snmp.pm line 399, <$__ANONIO__> line
> > 59375.
> > [09-01-07@22:45:12] DEBUG SNMP: Failed queries 0
> >
> > I have changed some of the logging inside the affected subroutine (to
> > print the fork number ($form_num) for every "DEBUG SNMP" line) to try
> > and confirm what I think is happening. I will also try and see if I can
> > test a fix (without devmon itself).
> >
> > But, again, confirmation from others would help ...
>
> one of our devmon servers turned purple this morning. It stopped here:
>
> [09-01-09@07:24:36] DEBUG TEMPLATES: running post_template_load()
> [09-01-09@07:24:36] DEBUG CFG: running read_hosts
> [09-01-09@07:24:36] DEBUG SNMP: running poll_devices()
> [09-01-09@07:24:36] Starting snmp queries
> [09-01-09@07:24:36] Getting device status from hobbit at xx.xx.xx.xx:1984
>
> It is possible, that this system couldn't reach the hobbit server
> at that time. However it didn't recover from this status.

Hmm, try as I might I can't get hobbitd to die between
"my $sock = IO::Socket::INET->new" (or "if(defined $sock)") and
while(<$sock>). However, simply adding a timeout to the socket may be
enough to fix this, can you try with this patch?

===================================================================
--- modules/dm_snmp.pm	(revision 110)
+++ modules/dm_snmp.pm	(working copy)
@@ -66,7 +66,8 @@
     my $sock = IO::Socket::INET->new (
         PeerAddr => $g{'dispserv'},
         PeerPort => $g{'dispport'},
-        Proto => 'tcp'
+        Proto => 'tcp',
+        Timeout => 10,
     );
     if(defined $sock) {

> The debug log shows some messages like 'Use of uninitialized value in
> addition (+) at (eval 94189069) line 1, <$__ANONIO__> line 1450529.', too.
> But they are shown in every cycle and not just before devmon stopped.

Yes, I think these may depend on the templates, and are non-fatal, but I
may consider trying to clean them up (to reduce noise in the log or avoid
eating up space in the scrollback buffer in screen) for situations like
this.

Regards,
Buchan