Thread: [Mon-devel] upalertafter broken
From: Wolfram S. <li...@wo...> - 2007-10-09 14:24:29
|
Hi,

with the current mon-1.2.0, the "upalertafter" functionality is simply
broken.

I tried to find a quick fix by digging into the code, but I failed to
find a sane modification to process_event() and/or do_alert().

When will this be fixed?

Best regards,
Wolfram Schlich
From: David N. <vit...@cm...> - 2007-10-09 14:33:43
|
On 10/9/07, Wolfram Schlich <li...@wo...> wrote:
> Hi,
>
> with the current mon-1.2.0, the "upalertafter" functionality is simply
> broken.
>
> I tried to find a quick fix by digging into the code, but I failed to
> find a sane modification to process_event() and/or do_alert().
>
> When will this be fixed?

Can you provide us with a better problem description than "simply broken"?

Basically what we need to know is:
- how did you have the period configured? (a copy of all period config
  statements, without any local details like email addresses, etc.)
- how did you test it? presumably by generating a failure, receiving
  an alert and then generating a success. details please on timing,
  alerts, etc. i.e. what did you expect to happen, and what actually
  happened?

Once we have a detailed bug report to investigate we can try to track this down.

-David
From: Wolfram S. <li...@wo...> - 2007-10-09 14:55:21
|
* David Nolan <vit...@cm...> [2007-10-09 16:34]:
> On 10/9/07, Wolfram Schlich <li...@wo...> wrote:
> > Hi,
> >
> > with the current mon-1.2.0, the "upalertafter" functionality is simply
> > broken.
> >
> > I tried to find a quick fix by digging into the code, but I failed to
> > find a sane modification to process_event() and/or do_alert().
> >
> > When will this be fixed?
>
> Can you provide us with a better problem description than "simply broken"?
>
> Basically what we need to know is:
> - how did you have the period configured? (a copy of all period config
>   statements, without any local details like email addresses, etc.)
> - how did you test it? presumably by generating a failure, receiving
>   an alert and then generating a success. details please on timing,
>   alerts, etc. i.e. what did you expect to happen, and what actually
>   happened?
>
> Once we have a detailed bug report to investigate we can try to track this down.

Sorry -- when I looked at the code it was so obvious that it's broken,
so I thought you (developers) all knew about it and just hadn't fixed
it for whatever reason :-)

Ok, let's proceed...

'upalertafter' is only supported for period definitions, not for
service definitions themselves. Despite that fact, process_event()
(mon line 3365) looks for $sref->{"upalertafter"}, which obviously
doesn't exist.

A place where the code loops through the periods and where one
could check it is within the do_alert() function.
Unfortunately, when you place the upalertafter check in there,
it will only be run once, because process_event() has already reset
the status from FAIL to OK. (I actually tried to put the check there,
and it did get executed, but only once, which makes the 'upalertafter'
check pointless.) As a result, do_alert() is never executed more than
once for an upalert (which is correct for other reasons).

A correct check for upalertafter that evaluates to true or false
looks like this:

Decision to suppress the upalert:
  ($tmnow - $sref->{'_last_failure'} < $pref->{'upalertafter'})

Decision to run the upalert:
  ($tmnow - $sref->{'_last_failure'} >= $pref->{'upalertafter'})

...but clearly not like the one present in line 3366:
  ($tmnow - $sref->{"_first_failure"}) >= $sref->{"upalertafter"})

Maybe it would be best to add some period looping code to
process_event() and check upalertafter there.

I am really surprised that such an essential feature is
"unknown broken" :-/

Best regards,
Wolfram
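The comparison proposed in the message above can be sketched as a small per-period helper. This is only an illustration, not code from mon: the name upalert_due is hypothetical, $sref and $pref are plain hashes standing in for mon's service and period structures, and 'upalertafter' is assumed to have been converted to seconds already. Whether '_last_failure' or '_first_failure' is the right reference point depends on the intended semantics; the sketch simply follows the message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mon): decide for ONE period whether an
# upalert should run, reading upalertafter from the period hash ($pref)
# instead of the service hash ($sref).
sub upalert_due {
    my ($sref, $pref, $tmnow) = @_;

    # Period has no upalertafter: nothing holds the upalert back.
    return 1 unless defined $pref->{'upalertafter'};

    # Assumption: with no recorded failure there is nothing to measure.
    return 0 unless defined $sref->{'_last_failure'};

    return ($tmnow - $sref->{'_last_failure'} >= $pref->{'upalertafter'}) ? 1 : 0;
}

# Made-up sample data: one service, two periods.
my $sref    = { '_last_failure' => 1_000 };
my %periods = (
    wd  => { 'upalertafter' => 300 },   # 300 seconds
    all => { },                         # no upalertafter configured
);

# Loop over the periods, as the message suggests doing in process_event().
for my $name (sort keys %periods) {
    my $verdict = upalert_due($sref, $periods{$name}, 1_200) ? 'run' : 'suppress';
    print "$name: $verdict\n";
}
```

With the sample values, only 200 seconds have passed since the last failure, so the "wd" period suppresses the upalert while the "all" period, which has no upalertafter, lets it run.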
From: David N. <vit...@cm...> - 2007-10-09 15:53:35
|
On 10/9/07, Wolfram Schlich <li...@wo...> wrote:
> I am really surprised that such an essential feature is
> "unknown broken" :-/

Ahh, now that you've actually described the problem my first thought
was "I thought we fixed that!"

But nope, it's still there...

I'll try to dig into it and write a fix sometime soon.

-David
From: Wolfram S. <li...@wo...> - 2007-11-15 14:57:43
|
* David Nolan <vit...@cm...> [2007-10-09 17:54]:
> On 10/9/07, Wolfram Schlich <li...@wo...> wrote:
>
> > I am really surprised that such an essential feature is
> > "unknown broken" :-/
>
> Ahh, now that you've actually described the problem my first thought
> was "I thought we fixed that!"
>
> But nope, it's still there...
>
> I'll try to dig into it and write a fix sometime soon.

So, any news on this issue?
This is really preventing me from productively using mon
for monitoring an active/passive cluster setup... :-(
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Jim T. <tr...@ar...> - 2007-11-15 18:28:47
|
On Tue, 9 Oct 2007, Wolfram Schlich wrote:

> 'upalertafter' is only supported for period definitions, not for
> service definitions itself. Despite that fact, process_event()
> (mon line 3365) looks for $sref->{"upalertafter"}, which obviously
> doesn't exist.

correct. this is part of the bug.

> A place where the code loops through the periods and where one
> could check it is within the do_alert() function.

correct again. the upalertafter processing is being handled in the wrong
place. a while back i had cleaned up the code to make the trap processing
use the same squelch logic as the other processing by putting that in
process_event. this fixed some trap bugs, and i had intended to do some
more cleanup related to that. so it does appear that the way to fix this
is to rip out the decisions to call do_alert from process_event and stick
them into do_alert.

> Unfortunately, when you place the upalertafter check in there,
> it will only be run once, because process_event() already resets

sure, just some minor details :)

david, have you had a look at this yet, and have you formulated an opinion
on this? i'll move on this, but just let us know if you have some ideas.

regarding the syslog bug, it's wrapped up in an eval to handle exception
processing from deeper levels in Sys::Syslog, and the other gunk in there
(the map) is a workaround for a bug in an older version of Sys::Syslog
(0.07). the better way to fix this is to have it bail out on startup if
the old buggy version is found, and tell people to get a newer version.
fwiw, the perl that ships with sles10, fc6, rhel5 all include the newer
version. sles8 and sles9 have the buggy version.

from the manual:

  Note "Sys::Syslog" version v0.07 and older passed the $message as
  the formatting string to "sprintf()" even when no formatting
  arguments were provided. If the code calling "syslog()" might
  execute with older versions of this module, make sure to call the
  function as "syslog($priority, "%s", $message)" instead of
  "syslog($priority, $message)". This protects against hostile
  formatting sequences that might show up if $message contains
  tainted data.
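The two ideas in the message above (refusing to start with the old module, and calling syslog() in the form the manual recommends) could look roughly like the following. This is a hedged sketch, not mon's code: the names syslog_is_recent, safe_syslog_args and log_message are hypothetical, and the minimum version 0.08 is an assumption standing for "anything newer than 0.07".

```perl
use strict;
use warnings;
use Sys::Syslog ();

# Assumption: any release newer than 0.07 is fine, written here as a
# minimum of 0.08. The VERSION method dies if the loaded module is older.
sub syslog_is_recent {
    return eval { Sys::Syslog->VERSION('0.08'); 1 } ? 1 : 0;
}

# Build the argument list in the form quoted from the manual, so that
# $message is always treated as data and never as the format string.
sub safe_syslog_args {
    my ($priority, $message) = @_;
    return ($priority, '%s', $message);
}

# Hypothetical wrapper: exceptions raised inside Sys::Syslog are swallowed
# by the eval, as the message describes for the existing wrapper.
sub log_message {
    my ($priority, $message) = @_;
    eval {
        local $SIG{'__DIE__'} = sub { };
        Sys::Syslog::syslog(safe_syslog_args($priority, $message));
        1;
    };
    return;
}

# Bail out on startup if the old module is found.
die "Sys::Syslog 0.07 or older found, please install a newer version\n"
    unless syslog_is_recent();
```

Because the message travels as an argument to "%s", percent signs in it do not need to be stripped first.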
From: Augie S. <aug...@gm...> - 2007-11-15 19:22:00
|
On 11/15/07, Jim Trocki <tr...@ar...> wrote:
> regarding the syslog bug, it's wrapped up in an eval to handle exception
> processing from deeper levels in Sys::Syslog, and the other gunk in there (the
> map) is a workaround for a bug in an older version of Sys::Syslog (0.07). the
> better way to fix this is to have it bail out on startup if the old buggy
> version is found, and tell people to get a newer version.

I recently ran into this, and as far as I could tell the 'map' wasn't
needed; it was even causing the interpreter to fail, because it tries
to alter @_, which can hold read-only values, and later Perl revisions
fail on that. My quick fix was the following:

# diff -u mon.orig /usr/sbin/mon
--- mon.orig    2007-11-07 15:16:35.000000000 -0800
+++ /usr/sbin/mon       2007-11-07 16:04:09.000000000 -0800
@@ -5385,8 +5385,9 @@
 sub syslog {
     eval {
        local $SIG{"__DIE__"}= sub { };
-       my @log = map { s/\%//mg; } @_;
-       Sys::Syslog::syslog(@log);
+#      my @log = map { s/\%//mg; } @_;
+#      Sys::Syslog::syslog(@log);
+       Sys::Syslog::syslog(@_);
     }
 }
 use warnings;

--
Augie Schwer - Augie@Schwer.us - http://schwer.us
Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072
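To illustrate what goes wrong with the map construct discussed above, here is a small standalone sketch. The function names strip_in_place and strip_percent are hypothetical and exist only for this comparison. The first mirrors the pattern from the diff: s/// inside map edits the caller's values through the @_ aliases and yields the substitution result rather than the edited string. The second works on copies.

```perl
use strict;
use warnings;

# Mirrors the pattern from the diff: edits the caller's arguments in place
# and returns what s/// returns (a count or an empty string), not the text.
sub strip_in_place {
    my @log = map { s/%//g } @_;
    return @log;
}

# Non-mutating alternative: copy each argument, edit the copy, return it.
sub strip_percent {
    return map { (my $copy = $_) =~ s/%//g; $copy } @_;
}

my ($priority, $message) = ('info', 'disk 95% full');

my @clean = strip_percent($priority, $message);
print join('|', @clean), "\n";    # the stripped strings
print "$message\n";               # caller's variable is untouched

my @odd = strip_in_place($priority, $message);
print scalar(@odd), " values, second is '$odd[1]'\n";   # a count, not text
print "$message\n";               # caller's variable has been modified
```

Passing a string literal instead of a variable to strip_in_place can additionally die with a "Modification of a read-only value attempted" error, which matches the failure described in the message.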
From: Wolfram S. <li...@wo...> - 2007-11-30 10:01:03
|
* Jim Trocki <tr...@ar...> [2007-11-15 19:36]:
> On Tue, 9 Oct 2007, Wolfram Schlich wrote:
>
>> 'upalertafter' is only supported for period definitions, not for
>> service definitions itself. Despite that fact, process_event()
>> (mon line 3365) looks for $sref->{"upalertafter"}, which obviously
>> doesn't exist.
>
> correct. this is part of the bug.
>
>> A place where the code loops through the periods and where one
>> could check it is within the do_alert() function.
>
> correct again. the upalertafter processing is being handled in the wrong
> place. a while back i had cleaned up the code to make the trap processing
> use the same squelch logic as the other processing by putting that in
> process_event. this fixed some trap bugs, and i had intended to do some
> more cleanup related to that. so it does appear that the way to fix this
> is to rip out the decisions to call do_alert from process_event and stick
> them into do_alert.
>
>> Unfortunately, when you place the upalertafter check in there,
>> it will only be run once, because process_event() already resets
>
> sure, just some minor details :)
>
> david, have you had a look at this yet, and have you formulated an opinion
> on this? i'll move on this, but just let us know if you have some ideas.

Hi,

I really don't want to bother you, but: any news on this upalertafter bug? :o)
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Jim T. <tr...@ar...> - 2007-12-03 17:49:15
|
On Fri, 30 Nov 2007, Wolfram Schlich wrote:

> I really don't want to bother you, but: any news on this upalertafter bug? :o)

it'll be fixed this week.
From: Wolfram S. <li...@wo...> - 2007-12-03 18:05:27
|
* Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> On Fri, 30 Nov 2007, Wolfram Schlich wrote:
>
>> I really don't want to bother you, but: any news on this upalertafter bug?
>> :o)
>
> it'll be fixed this week.

Thanks, that's very nice to hear^Wread :)
Is it possible to make some donation?! :)
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Wolfram S. <li...@wo...> - 2008-01-07 07:32:21
|
* Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> On Fri, 30 Nov 2007, Wolfram Schlich wrote:
>
>> I really don't want to bother you, but: any news on this upalertafter bug?
>> :o)
>
> it'll be fixed this week.

Happy new year! Any news on this one? :)

Best regards,
Wolfram
From: Wolfram S. <li...@wo...> - 2008-01-29 21:50:20
|
* Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> On Fri, 30 Nov 2007, Wolfram Schlich wrote:
>
>> I really don't want to bother you, but: any news on this upalertafter bug?
>> :o)
>
> it'll be fixed this week.

Ping :)
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Wolfram S. <li...@wo...> - 2008-02-06 17:53:06
|
* Wolfram Schlich <li...@wo...> [2008-01-29 22:51]:
> * Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> > On Fri, 30 Nov 2007, Wolfram Schlich wrote:
> >
> >> I really don't want to bother you, but: any news on this upalertafter bug?
> >> :o)
> >
> > it'll be fixed this week.
>
> Ping :)

Ping :)
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Ed R. <er...@pa...> - 2008-02-06 17:57:27
|
On Wed, Feb 06, 2008 at 06:52:46PM +0100, Wolfram Schlich wrote:
> * Wolfram Schlich <li...@wo...> [2008-01-29 22:51]:
> > * Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> > >> I really don't want to bother you, but: any news on this upalertafter bug?
> > >> :o)
> > >
> > > it'll be fixed this week.
> >
> > Ping :)
>
> Ping :)

ICMP response from mon...@li...: developer unreachable
From: Wolfram S. <li...@wo...> - 2008-02-06 18:00:41
|
* Ed Ravin <er...@pa...> [2008-02-06 18:57]:
> On Wed, Feb 06, 2008 at 06:52:46PM +0100, Wolfram Schlich wrote:
> > * Wolfram Schlich <li...@wo...> [2008-01-29 22:51]:
> > > * Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> > > >> I really don't want to bother you, but: any news on this upalertafter bug?
> > > >> :o)
> > > >
> > > > it'll be fixed this week.
> > >
> > > Ping :)
> >
> > Ping :)
>
> ICMP response from mon...@li...: developer unreachable

I was really just awaiting something like this :)))
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Wolfram S. <li...@wo...> - 2008-03-12 14:03:47
|
* Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> On Fri, 30 Nov 2007, Wolfram Schlich wrote:
>> I really don't want to bother you, but: any news on this upalertafter bug?
>> :o)
>
> it'll be fixed this week.

Jim, sorry for asking once again, but... when will upalertafter be fixed?
TIA.
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Wolfram S. <li...@wo...> - 2008-06-07 14:25:51
|
* Wolfram Schlich <li...@wo...> [2008-03-12 15:04]:
> * Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> > On Fri, 30 Nov 2007, Wolfram Schlich wrote:
> >> I really don't want to bother you, but: any news on this upalertafter bug?
> >> :o)
> >
> > it'll be fixed this week.
>
> Jim, sorry for asking once again, but... when will upalertafter be fixed?

So, will it be fixed *at all*?!
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
From: Wolfram S. <li...@wo...> - 2008-06-17 10:45:53
|
* Wolfram Schlich <li...@wo...> [2008-06-07 16:26]:
> * Wolfram Schlich <li...@wo...> [2008-03-12 15:04]:
> > * Jim Trocki <tr...@ar...> [2007-12-03 18:56]:
> > > On Fri, 30 Nov 2007, Wolfram Schlich wrote:
> > >> I really don't want to bother you, but: any news on this upalertafter bug?
> > >> :o)
> > >
> > > it'll be fixed this week.
> >
> > Jim, sorry for asking once again, but... when will upalertafter be fixed?
>
> So, will it be fixed *at all*?!

Ok, to sum it up:
- mon has not seen a release for 1 year
- I reported the severe upalertafter bug 8 months ago
- no developer response to my inquiries for 6 months

Now that's just enough to finally get rid of mon.
FYI, I am now unsubscribing from the mon lists.

So long and thanks for all the fish.
--
Regards,
Wolfram Schlich <wsc...@ge...>
Gentoo Linux * http://dev.gentoo.org/~wschlich/