|
From: Buchan M. <bg...@st...> - 2011-02-02 12:24:41
|
On Monday, 29 November 2010 11:55:19 Buchan Milne wrote:
> On Friday, 26 November 2010 15:14:50 Taylor Lewick wrote:
> > We upgraded xymon to run 4.3.0-0.beta2 from hobbit 4.2.3
> > In so doing, we also upgraded devmon to devmon 0.3.1-beta1. So far
> > everything with xymon works great, and devmon works as expected, except
> > it goes purple quite often.
>
> I can't reproduce this reliably myself.
Well, I'm back in an environment I was in before, and one of the devmon
installations (on RHEL5) still gives the purple problem, even after upgrading
to svn (but, for another installation, on RHEL4, the upgrade seems to have
solved all problems).
In the case of the problematic installation, it reports to a remote hobbitd,
and with -vvv --debug, it logs this:
[11-02-02@09:56:56] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:56] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:56] DEBUG: Printing single combo message size 26428
[11-02-02@09:56:56] DEBUG: Finished printing single combo message
[11-02-02@09:56:56] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:56] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:56] DEBUG: Printing single combo message size 26412
[11-02-02@09:56:56] DEBUG: Finished printing single combo message
[11-02-02@09:56:56] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:56] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:56] DEBUG: Printing single combo message size 41414
[11-02-02@09:56:56] DEBUG: Finished printing single combo message
[11-02-02@09:56:56] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:56] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:56] DEBUG: Printing combo message with multiple messages
[11-02-02@09:56:56] DEBUG: Finished printing combo message with multiple
messages
[11-02-02@09:56:56] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:56] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:56] DEBUG: Printing single combo message size 8338
[11-02-02@09:56:56] DEBUG: Finished printing single combo message
[11-02-02@09:56:56] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:56] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:56] DEBUG: Printing combo message with multiple messages
[...]
[11-02-02@09:56:57] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:57] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:57] DEBUG: Printing combo message with multiple messages
[11-02-02@09:56:57] DEBUG: Finished printing combo message with multiple
messages
[11-02-02@09:56:57] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:57] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:57] DEBUG: Printing combo message with multiple messages
[11-02-02@09:56:57] DEBUG: Finished printing combo message with multiple
messages
[11-02-02@09:56:57] DEBUG: Printing combo message with multiple messages
[11-02-02@09:56:57] DEBUG: Finished printing combo message with multiple
messages
[11-02-02@09:56:57] DEBUG: Printing combo message with multiple messages
[11-02-02@09:56:57] DEBUG: Finished printing combo message with multiple
messages
[11-02-02@09:56:57] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:56:57] DEBUG: Looping through messages to build a combo
[11-02-02@09:56:57] DEBUG: Printing combo message with multiple messages
[11-02-02@09:56:57] DEBUG: Finished printing combo message with multiple
messages
[11-02-02@09:56:57] DEBUG: Opening socket to xxx.xxx.xxx.xxx:1984
[11-02-02@09:57:00] DEBUG: Looping through messages to build a combo
[11-02-02@09:57:00] DEBUG: Printing single combo message size 102907
[11-02-02@10:49:11] Forking to background process 11517
[11-02-02@10:49:11] Re-opened log file /var/log/devmon/devmon.log
[11-02-02@10:49:11] Nodename autodetected as yyy.yyy.yyy
[11-02-02@10:49:11] Option 'bbdateformat' defaulting to: %a %b %d %H:%M:%S %Z
%Y.
[11-02-02@10:49:11] Option 'snmptimeout' defaulting to: 2.
[11-02-02@10:49:11] ---Initilizing devmon...
[11-02-02@10:49:11] Verbosity level: 3
[11-02-02@10:49:11] Logging to /var/log/devmon/devmon.log
[...]
I have seen it fail a number of times, and every time it fails, it fails on:
[11-02-02@09:57:00] DEBUG: Printing single combo message size 102907
But, we see above that one the same polling cycle, it managed to do this a
number of times without a problem. I don't think it's specifically the size,
as I have seen it fail with "message size" of less (around 30 000).
When it fails like this, the devmon[master] process is missing, and the rest
of the devmon forks are eating CPU (which I will look at fixing).
I have moved some logging around locally, the code running currently looks
like this:
# Make sure the message itself isnt too big
if(length $msg > $g{'msgsize'}) {
# Nuts, this is a huge message, bigger than our msg size. Well want
# to send it by itself to minimize how much it gets truncated
if($msg_size == 0) {
my $thismsgsize = length $msg;
# Okay, we are clear, send the message
eval {
local $SIG{ALRM} = sub {do_log("Dying in alarm",0); die
"Printing message timed out\n" };
alarm 10;
do_log("DEBUG: Printing single combo message size
$thismsgsize",3) if $g{'debug'};
print SOCK "$msg\n";
do_log("DEBUG: Finished printing single combo message",3) if
$g{'debug'};
alarm 0;
};
if ($@) {
do_log("Timed out printing to display server: $!",0);
close SOCK;
return;
}
}
# Not an empty combo msg, wait till our new socket is open
else {
unshift @{$g{'test_results'}}, $msg;
}
# Either way, open a new socket
close SOCK;
next SOCKLOOP;
}
(very similar to what is in svn)
Now, I don't see how the current behaviour is possible. After printing "DEBIG:
Printing single combo message ...", it should print to the socket. If the
print call takes more than 10s, we should at *least* see "Dying in alarm", or
"Timed out printing to display server: Printing message timed out" in the log.
If it succeeds, we should of course see "Finished printing single combo
message" as we do in the other examples.
So, I wonder if anyone else can shed some light on this?
I was really hoping to try and solve all the purple issues I could reproduce
soon, and release 0.3.1 ...
Regards,
Buchan
|