mon-devel Mailing List for mon (Page 6)
Brought to you by:
trockij
From: Konstantin 'K. S. <ka...@ep...> - 2005-07-22 02:50:57
|
I had to make the following changes to make snpp.alert work:

--- cvsroot/mon/alert.d/snpp.alert 2004-06-08 22:18:07.000000000 -0700
+++ cvsroot-mine/mon/alert.d/snpp.alert 2005-07-21 19:15:31.000000000 -0700
@@ -25,7 +25,7 @@
 # $Id: snpp.alert,v 1.1.1.1 2004/06/09 05:18:07 trockij Exp $
 #
 use strict;
-use vars qw /$opt_g $opt_q $opt_s $opt_t/;
+use vars qw /$opt_g $opt_q $opt_s $opt_t $opt_h $opt_l $opt_u/;
 use Getopt::Std;
 use Net::SNPP;
@@ -52,7 +52,7 @@
 my $snpp = Net::SNPP->new ($opt_q) or die;
-$ALERT = $opt_u ? "UPALERT" : "ALERT";
+my $ALERT = $opt_u ? "UPALERT" : "ALERT";
 $snpp->send (
     Pager => [ @ARGV ],
     Message => "$ALERT $opt_g/$opt_s: $summary ($wday $mon $day $tm)"
 );

-- 
Konstantin 'Kastus' Shchuka
Unix System Administrator
Epocrates Inc.
tel 650.227.1786 fax 650.592.6995
From: Konstantin 'K. S. <ka...@ep...> - 2005-07-22 01:45:12
|
While comparing the CVS version of qpage.alert to the one I'm using, I noticed the following difference:

--- cvsroot-mine/mon/alert.d/qpage.alert 2005-06-01 17:30:57.000000000 -0700
+++ /usr/lib/mon/alert.d/qpage.alert 2001-06-26 08:45:27.000000000 -0700
@@ -11,7 +11,7 @@
 # -l service level
 # -q SNPP server, translates to "qpage -s"
 #
-# Jim Trocki, tr...@ar...
+# Jim Trocki, tr...@tr...
 #
 # Copyright (C) 1998, Jim Trocki
 #
@@ -67,7 +67,7 @@
 else
 {
-    if (system ("qpage -p $pagedest " .
+    if (!system ("qpage -p $pagedest " .
         "'$ALERT $opt_g/$opt_s: $summary ($wday $mon $day $tm)'" .
         "2>/dev/null")) {

The version I'm using has "!system", which looks correct to me. Any idea why the negation was lost in CVS?

Thanks,
-- 
Konstantin 'Kastus' Shchuka
Unix System Administrator
Epocrates Inc.
tel 650.227.1786 fax 650.592.6995
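For readers less familiar with Perl's system(): it returns the child's wait status, which is 0 when the command succeeded, so "!system(...)" is true on success and plain "system(...)" is true on failure. The sketch below only illustrates that convention; ran_ok is a hypothetical helper, and $^X (the running perl) stands in for qpage so the example runs anywhere. Which sense the surrounding qpage.alert code actually needs depends on the body of the if, which is not shown in the diff.

```perl
use strict;
use warnings;

# system() returns the wait status of the child: 0 means the command
# exited successfully, anything else means it failed.
# ran_ok is a hypothetical helper, not part of mon.
sub ran_ok {
    my (@cmd) = @_;
    my $status = system(@cmd);
    return $status == 0 ? 1 : 0;
}

# $^X is the perl binary running this script; it is used here only as a
# portable stand-in for an external command such as qpage.
my $ok   = ran_ok($^X, '-e', 'exit 0');
my $fail = ran_ok($^X, '-e', 'exit 3');

print "exit 0 case: ", ($ok   ? "succeeded" : "failed"), "\n";
print "exit 3 case: ", ($fail ? "succeeded" : "failed"), "\n";
```

Comparing the status to 0 explicitly tends to read more clearly than a bare "!system", and makes an accidental loss of the negation easier to spot in a diff.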
From: Ed R. <er...@pa...> - 2005-07-06 23:35:22
|
This is for process.monitor so you can put the SNMP community into mon.m4 (via the environment) rather than on the command line where people (or Mon clients using mon.cgi) might see it.

-- Ed

===================================================================
RCS file: RCS/process.monitor,v
retrieving revision 1.1
diff -u -r1.1 process.monitor
--- process.monitor 2005/07/06 23:32:40 1.1
+++ process.monitor 2005/07/06 23:33:06
@@ -42,7 +42,7 @@
 $ENV{'MIBS'} = "UCD-SNMP-MIB";
 getopts("c:");
-$community = $opt_c || 'public';
+$community = $opt_c || $ENV{'COMMUNITY'} || 'public';
 $RETVAL = 0;
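The fallback order in that one-line patch can be sketched as a small function: option first, then the COMMUNITY environment variable, then the default. pick_community is a hypothetical name used only for illustration, and the environment is passed in as a hash reference so the behaviour is easy to check.

```perl
use strict;
use warnings;

# Choose the SNMP community string: the -c option wins, then the
# COMMUNITY environment variable, then the default 'public'.
# pick_community is a hypothetical helper, not part of process.monitor.
sub pick_community {
    my ($opt_c, $env) = @_;
    return $opt_c || $env->{'COMMUNITY'} || 'public';
}

# With neither the option nor the variable set, the default is used.
print pick_community(undef, {}), "\n";
# With only the environment variable set, it takes effect.
print pick_community(undef, { COMMUNITY => 'fromenv' }), "\n";
```

Note that "||" treats an empty string (and "0") as unset, so an empty COMMUNITY variable falls through to 'public'.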
From: Ed R. <er...@pa...> - 2005-05-09 22:42:11
|
On Sat, May 07, 2005 at 08:44:59AM -0400, David Nolan wrote:
> Maybe he means suppress the upalert that occurs after an ackalert or
> disablealert.

I think he does mean that!

> I think it might be useful to do that, but it needs to be
> controllable. Certainly suppressing an upalert that's generated on the next
> status update after a host is disabled would probably make sense. (I've
> gotten that request, but was deferring it until the often thought about
> per-host status tracking code actually gets written.)

How about new keywords:

    ack_suppresses_upalert
    enable_suppresses_upalert

And when enabled, they zero out _alert_sent after sending the alert?
From: David N. <vit...@cm...> - 2005-05-07 12:45:05
|
--On Friday, May 06, 2005 5:33 PM -0700 Jim Trocki <tr...@ar...> wrote:

>> Also, it would be nice if we could suppress subsequent alerts once
>> something is acked - if it's acked, someone has (hopefully) taken
>> responsibility for it and there's no need to send more alerts.
>>
>
> hm, i thought that was working. the code to do that is in do_alert:

Maybe he means suppress the upalert that occurs after an ackalert or disablealert.

I think it might be useful to do that, but it needs to be controllable. Certainly suppressing an upalert that's generated on the next status update after a host is disabled would probably make sense. (I've gotten that request, but was deferring it until the often thought about per-host status tracking code actually gets written.)

-David

David Nolan <*> vit...@cm...
curses: May you be forced to grep the termcap of an unclean yacc while
a herd of rogue emacs fsck your troff and vgrind your pathalias!
From: Jim T. <tr...@ar...> - 2005-05-07 00:33:12
|
On Fri, 6 May 2005, Ed Ravin wrote:

> Also, it would be nice if we could suppress subsequent alerts once
> something is acked - if it's acked, someone has (hopefully) taken
> responsibility for it and there's no need to send more alerts.
>

hm, i thought that was working. the code to do that is in do_alert:

    #
    # no alerts for ack'd failures, except for upalerts or summary
    # changes
    # when observe_summary is set
    #
    if ($sref->{"_ack"} != 0 && !($flags & ($FL_UPALERT|$FL_ACKALERT|$FL_DISABLEALERT))) {
        syslog ("debug", "no alert for $group.$service" .
            " because of ack'd failure");
        return;
    }
From: Ed R. <er...@pa...> - 2005-05-07 00:17:46
|
On Fri, May 06, 2005 at 07:49:41PM -0400, David Nolan wrote: > > --On Friday, May 06, 2005 7:27 PM -0400 Ed Ravin <er...@pa...> wrote: > >I love the ackalerts, but they're going out even if someone acknowledges > >a problem before the down alert has been issued. This patch fixes that: > > Oh cool, they work? I wrote that code a long time ago but never got around > to doing anything with it in our environment. I'd never actually done much > testing of it. :) I was wondering why the new alerts weren't documented. :-) Yes, ackalerts work fine, except for that one problem - when we ack an alarm to avoid sending mail and other notices, the ackalert goes out, and since alerts_sent is now non-zero, an upalert will go out too. Also, it would be nice if we could suppress subsequent alerts once something is acked - if it's acked, someone has (hopefully) taken responsibility for it and there's no need to send more alerts. -- Ed |
From: David N. <vit...@cm...> - 2005-05-06 23:49:48
|
--On Friday, May 06, 2005 7:27 PM -0400 Ed Ravin <er...@pa...> wrote: > I love the ackalerts, but they're going out even if someone acknowledges > a problem before the down alert has been issued. This patch fixes that: Oh cool, they work? I wrote that code a long time ago but never got around to doing anything with it in our environment. I'd never actually done much testing of it. :) -David David Nolan <*> vit...@cm... curses: May you be forced to grep the termcap of an unclean yacc while a herd of rogue emacs fsck your troff and vgrind your pathalias! |
From: Ed R. <er...@pa...> - 2005-05-06 23:27:08
|
I love the ackalerts, but they're going out even if someone acknowledges a problem before the down alert has been issued. This patch fixes that:

@@ -629,9 +629,9 @@
 # skip upalerts not paired with down alerts
 # disable by setting "no_comp_alerts" in period section
 #
-    if (!$pref->{"no_comp_alerts"} && ($flags & $FL_UPALERT) && !$pref->{"_alert_sent"})
+    if (!$pref->{"no_comp_alerts"} && ($flags & ($FL_UPALERT | $FL_ACKALERT)) && !$pref->{"_alert_sent"})
     {
-        syslog ('debug', "$group/$service/$periodlabel: Suppressing upalert since no down alert was sent.");
+        syslog ('debug', "$group/$service/$periodlabel: Suppressing upalert or ackalert since no down alert was sent.");
         next;
     }

But I think it's insufficient - we should probably include disablealerts, since that's also an operator action that no one needs to hear about if the operator is doing it to keep something from alarming.

-- Ed
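A minimal sketch of the extended check, with disablealerts included as suggested above. The bit values assigned to the $FL_* flags here are hypothetical (mon defines its own), and suppress_unpaired is an illustrative helper rather than code from mon; only the shape of the bitmask test is taken from the patch.

```perl
use strict;
use warnings;

# Hypothetical bit values for illustration; mon defines its own $FL_* flags.
my $FL_UPALERT      = 1;
my $FL_ACKALERT     = 2;
my $FL_DISABLEALERT = 4;

# Returns 1 when an up/ack/disable alert should be skipped because no
# down alert was sent in this period, 0 otherwise.
sub suppress_unpaired {
    my ($pref, $flags) = @_;
    my $paired = $FL_UPALERT | $FL_ACKALERT | $FL_DISABLEALERT;
    return 0 if $pref->{"no_comp_alerts"};    # pairing disabled in the period
    return 0 unless $flags & $paired;         # ordinary down alert
    return $pref->{"_alert_sent"} ? 0 : 1;    # suppress only if nothing was sent
}

# An ack arriving before any down alert went out is suppressed.
print suppress_unpaired({ "_alert_sent" => 0 }, $FL_ACKALERT), "\n";
```

ORing the flags into one mask keeps the condition a single test, so adding another operator-initiated alert type later only means extending the mask.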
From: David N. <vit...@cm...> - 2005-05-05 16:51:55
|
--On Thursday, May 05, 2005 12:04 PM -0400 Ed Ravin <er...@pa...> wrote:

> On Thu, May 05, 2005 at 11:29:09AM -0400, David Nolan wrote:
>> I did recently finish the support for views in Mon, for filtering what
>> hostgroups a client sees. The views are defined in the config file and
>> client commands exist for listing what views exist and setting the view.
>> If the client doesn't do anything they still see everything. I need to
>> push the patches for that from the cmu cvs repository to sourceforge.
>> I've also added support for this to our local copy of mon.cgi. Take a
>> look at <http://monitor.andrew.cmu.edu/tbin/mon.cgi> to see it in
>> action.
>
> Nice! I could use that feature. A few questions:
>
> * the alertlog and downtime log both show all events, regardless of the
> view.

Yeah, I'm intending to add the filtering there as well, but for now it only applies to the list opstatus and related commands.

>
> * The "monitor" field in the service details is blank - is this a security
> feature to hide community names or the like?

Actually it's a side effect of the multiple server setup we have. monitor.andrew.cmu.edu is the front end which receives traps from the backends and does all the alert logic. Thus there are no monitor commands run on that server for 95% of the services.

>
> * testing hosts is banned but not greyed out for the anonymous user in
> the show opstatus view. It is greyed out in the individual service view.

Probably a bug in mon.cgi, it's all the same 'test' entry in auth.cf.

>
> * Under "Force Next Check" in the opstatus view, some services are listed
> in parens, others are listed as +seconds. What's the difference?

The ones where a time is listed are the locally executed ones. The ones which are just a name are run on the slave servers and the master doesn't know timing info.

>
> * reload auth file doesn't seem to be banned for the anonymous user :-).

That's my backdoor if my auth.cf loses all the user entries for some reason, I can reload the config anyway. :)

-David

David Nolan <*> vit...@cm...
curses: May you be forced to grep the termcap of an unclean yacc while
a herd of rogue emacs fsck your troff and vgrind your pathalias!
From: Ed R. <er...@pa...> - 2005-05-05 16:05:01
|
On Thu, May 05, 2005 at 11:29:09AM -0400, David Nolan wrote: > I did recently finish the support for views in Mon, for filtering what > hostgroups a client sees. The views are defined in the config file and > client commands exist for listing what views exist and setting the view. > If the client doesn't do anything they still see everything. I need to > push the patches for that from the cmu cvs repository to sourceforge. I've > also added support for this to our local copy of mon.cgi. Take a look at > <http://monitor.andrew.cmu.edu/tbin/mon.cgi> to see it in action. Nice! I could use that feature. A few questions: * the alertlog and downtime log both show all events, regardless of the view. * The "monitor" field in the service details is blank - is this a security feature to hide community names or the like? * testing hosts is banned but not greyed out for the anonymous user in the show opstatus view. It is greyed out in the individual service view. * Under "Force Next Check" in the opstatus view, some services are listed in parens, others are listed as +seconds. What's the difference? * reload auth file doesn't seem to be banned for the anonymous user :-). -- Ed |
From: David N. <vit...@cm...> - 2005-05-05 15:29:17
|
--On Thursday, May 05, 2005 8:06 AM -0700 Jim Trocki <tr...@ar...> wrote:

>
>> While fixing the problem above (whose fix I hope is going into CVS one of
>> these days since I haven't seen any reply to that message)
>
> david committed that fix on the 28th, so it's in there:
>
> revision 1.14
> date: 2005/04/28 19:07:58; author: vitroth; state: Exp; lines: +3 -3
> Added missing argument to dep_ok to make alert suppression dependencies
> work again.

Nope, that was the fix for the other bug Ed reported on 4/28. I didn't deal with the second issue because I needed to go back and look at the logic decisions to make sure I fixed it the right way, and didn't have the time that day and never got back to it...

(And now we're short staffed at work, so I'm more overworked than normal... Anybody know anyone looking for a job? <http://jobs.perl.org/job/2041>)

I did recently finish the support for views in Mon, for filtering what hostgroups a client sees. The views are defined in the config file and client commands exist for listing what views exist and setting the view. If the client doesn't do anything they still see everything. I need to push the patches for that from the cmu cvs repository to sourceforge. I've also added support for this to our local copy of mon.cgi. Take a look at <http://monitor.andrew.cmu.edu/tbin/mon.cgi> to see it in action.

-David

David Nolan <*> vit...@cm...
curses: May you be forced to grep the termcap of an unclean yacc while
a herd of rogue emacs fsck your troff and vgrind your pathalias!
From: Jim T. <tr...@ar...> - 2005-05-05 15:06:23
|
On Thu, 5 May 2005, Ed Ravin wrote:

> On Thu, Apr 28, 2005 at 09:04:11PM -0400, Ed Ravin wrote:
>> There seems to be some unfinished code in Mon 1.1-pre regarding the
>> summary output that is given to an upalert. In Mon 0.9, the output
>> of the last monitor run is given to the upalert. In Mon 1.1, the non-existent
>> and otherwise unreferenced $sref->{"_upalertoutput"} is given to the
>> upalert, which is empty and thus useless.
>
> While fixing the problem above (whose fix I hope is going into CVS one of
> these days since I haven't seen any reply to that message)

david committed that fix on the 28th, so it's in there:

    revision 1.14
    date: 2005/04/28 19:07:58; author: vitroth; state: Exp; lines: +3 -3
    Added missing argument to dep_ok to make alert suppression dependencies
    work again.

unfortunately sourceforge doesn't give you direct access to the developer's cvs repository. the public cvs server for anonymous checkouts is a read-only mirror of the developer's repository, and i've noticed that sometimes that can get out of sync.

> Attached is a patch for mail.alert - actually, it's a patch for a local
> version of mail.alert we use at my shop, so it might not apply
> completely, but you'll get the idea. Here are the changes I've added:

ok, i'll have a look at it and put it in today. thanks.
From: Ed R. <er...@pa...> - 2005-05-05 14:11:53
|
On Thu, Apr 28, 2005 at 09:04:11PM -0400, Ed Ravin wrote:
> There seems to be some unfinished code in Mon 1.1-pre regarding the
> summary output that is given to an upalert. In Mon 0.9, the output
> of the last monitor run is given to the upalert. In Mon 1.1, the non-existent
> and otherwise unreferenced $sref->{"_upalertoutput"} is given to the
> upalert, which is empty and thus useless.

While fixing the problem above (whose fix I hope is going into CVS one of these days since I haven't seen any reply to that message), I stumbled upon one of Mon 1.1's cool new features: the "ackalert". Using the usual alert policy, it will send an alert when someone ACKs a failure. I just turned this on at my shop, and my colleagues are very pleased with it.

Since alerts now have several new possibilities (ackalerts, trapalerts, traptimeoutalerts, disablealerts), all of the alert scripts and the template need minor updates. To start with, they need to be taught about the new options. And since the full details of the alarm aren't present on stdin for anything but the downalert, if you want to display that info for other alerts, you need to use MON_LAST_OUTPUT or MON_LAST_SUMMARY.

Attached is a patch for mail.alert - actually, it's a patch for a local version of mail.alert we use at my shop, so it might not apply completely, but you'll get the idea. Here are the changes I've added:

* show MON_DESCRIPTION in the alert. In my shop, we're filling this field with instructions on how to fix the problem (remember "url2fix"? :-).

* recognize all the new alert types, and supply appropriate messages for them. The most important change is for ackalerts - where I include the text of the ACK. I didn't do much for trapalert, I suppose one could display a field from the trap if appropriate?

* Print MON_LAST_OUTPUT for the "detailed notes" section instead of the contents of STDIN.

-- Ed

PS: has anyone written documentation for the new alert types yet? They're not in the man page in 1.1.
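The attached patch itself is not reproduced in the archive, so the following is only a rough sketch of the idea described in the bullets: build the message body from MON_DESCRIPTION, MON_LAST_SUMMARY and MON_LAST_OUTPUT instead of from stdin. Those three variable names come from the message above; alert_body, the layout of the text, and the 'ACKALERT' label are assumptions made for illustration.

```perl
use strict;
use warnings;

# Build an alert body from the environment rather than from STDIN.
# alert_body is a hypothetical helper; $env is a hash reference so the
# function can be exercised without touching the real environment.
sub alert_body {
    my ($type, $env) = @_;
    my $summary = defined $env->{MON_LAST_SUMMARY} ? $env->{MON_LAST_SUMMARY} : '';
    my $output  = defined $env->{MON_LAST_OUTPUT}  ? $env->{MON_LAST_OUTPUT}  : '';

    my @lines = ("$type: $summary");
    if (defined $env->{MON_DESCRIPTION} && $env->{MON_DESCRIPTION} ne '') {
        push @lines, "Description: $env->{MON_DESCRIPTION}";
    }
    push @lines, "Detailed notes:", $output;
    return join("\n", @lines) . "\n";
}

# In a real alert script the environment set by mon would be passed in.
print alert_body('ACKALERT', \%ENV);
```

Passing the environment as an argument is a design choice for testability; a real alert script could simply read %ENV directly.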
From: Ed R. <er...@pa...> - 2005-04-29 01:04:26
|
There seems to be some unfinished code in Mon 1.1-pre regarding the summary output that is given to an upalert. In Mon 0.9, the output of the last monitor run is given to the upalert. In Mon 1.1, the non-existent and otherwise unreferenced $sref->{"_upalertoutput"} is given to the upalert, which is empty and thus useless.

I'm going to guess that someone wanted to cache the output of the last failing monitor to give to the upalert, since that data would otherwise be lost when the monitor next succeeds. So here's some code for that (marked with !!!):

    # if this service has just come back up and
    # we are paying attention to this event,
    # let someone know
    #
    if (($sref->{"redistribute"} ne '') ||
        ((defined ($sref->{"_op_status"})) &&
        ($old_status == $STAT_FAIL) &&
        (defined($sref->{"_upalert"})) &&
        (!defined($sref->{"upalertafter"}) ||
        (($tmnow - $sref->{"_first_failure"}) >= $sref->{"upalertafter"}))))
    {
    !!!     $sref->{"_upalertoutput"}= $sref->{"_last_output"};
        do_alert ($group, $service, $sref->{"_upalertoutput"}, 0, $FL_UPALERT);
    }

I guess the next step would be to add upalertoutput to the display fields in mon.cgi? I've wanted that for a while, when something comes back up and I would like to have seen the second or third line of what the monitor printed out when it was down...
From: Ed R. <er...@pa...> - 2005-04-28 19:17:51
|
On Thu, Apr 28, 2005 at 03:01:19PM -0400, David Nolan wrote: > You need to read the documentation for Time::Period. > > hr{6am-11am} means 'Any time whose hour is between 6am and 11am, inclusive. > i.e. it corresponds to 6:00 AM to 11:59 AM. I am enlightened. I wonder why I didn't notice this before. I guess very few things break here between 11 and noon, and single-threaded alerts prevented the alert scripts' output (they write to users currently logged in) from getting scrambled until I went to Mon 1.1. At least I know non-blocking alerts are working! Thanks, -- Ed |
From: David N. <vit...@cm...> - 2005-04-28 19:07:27
|
--On Sunday, April 24, 2005 3:23 AM -0400 Ed Ravin <er...@pa...> wrote:

> And I had to retreat shortly afterwards - alert dependencies weren't
> working. Alerts weren't being called because depstatus was always
> undefined. I tracked the problem down to this code in process_event:
>
> 3203 if ($sref->{"depend"} ne "" &&
> 3204 $sref->{"dep_behavior"} eq "a")
> 3205 {
> 3206 dep_ok ($sref);
> 3207 }
>
> Ahem. Cough cough. Shouldn't line 3206 be:
>
>
> 3206 dep_ok ($sref, 'a');
>
>
> It works a hell of a lot better now. I'm kicking myself for not
> catching this in my preproduction testing - none of the watch entries
> I tested with used dependencies. Who's using Mon-1.1-pre1? Why didn't
> they notice this?
>
> Jim, David, is this the right fix?
>

It looks correct to me. Looks like a bug that appeared in mon-1.0-pre* when Jim did the process_event conversion. I'll commit that fix now.

-David

David Nolan <*> vit...@cm...
curses: May you be forced to grep the termcap of an unclean yacc while
a herd of rogue emacs fsck your troff and vgrind your pathalias!
From: David N. <vit...@cm...> - 2005-04-28 19:01:23
|
--On Thursday, April 28, 2005 12:26 PM -0400 Ed Ravin <er...@pa...> wrote: > > I might expect the hey.alert to get called twice if this alarm was sent > at exactly 11 AM. But note the time in the monlog messages above, it > was well after 11 AM. You need to read the documentation for Time::Period. hr{6am-11am} means 'Any time whose hour is between 6am and 11am, inclusive. i.e. it corresponds to 6:00 AM to 11:59 AM. -David David Nolan <*> vit...@cm... curses: May you be forced to grep the termcap of an unclean yacc while a herd of rogue emacs fsck your troff and vgrind your pathalias! |
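The inclusive-hour rule explains the duplicate alert: 11:57 AM has hour 11, which falls in both hr {6am-11am} and hr {11am-6am}. The sketch below illustrates that rule with a hypothetical hour_in_range helper; it is not Time::Period's own code, and it assumes wrap-around ranges are inclusive at both ends as well.

```perl
use strict;
use warnings;

# Illustration of "any time whose hour is between START and END,
# inclusive". Hours are 0..23. hour_in_range is a hypothetical helper,
# not part of Time::Period.
sub hour_in_range {
    my ($hour, $start, $end) = @_;
    if ($start <= $end) {
        return ($hour >= $start && $hour <= $end) ? 1 : 0;
    }
    # Range wraps past midnight, e.g. 11am-6am.
    return ($hour >= $start || $hour <= $end) ? 1 : 0;
}

my $hour = 11;    # the 11:57 AM alarm from the report

print "HEYDAY   (6am-11am) matches: ", hour_in_range($hour, 6, 11), "\n";
print "HEYNIGHT (11am-6am) matches: ", hour_in_range($hour, 11, 6), "\n";
```

Under this reading, ranges such as hr {6am-10am} and hr {11am-5am} would not overlap, so each alarm would match only one of the two periods.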
From: Ed R. <er...@pa...> - 2005-04-28 16:29:02
|
Check out this excerpt from the mon log on my test machine, running the currently tagged Mon 1.1-pre on NetBSD 2.0 (with my fix for dep_ok() as noted previously on this list):

Apr 28 11:57:55 testhost mon[8003]: failure for staff-important-servers fancy_service 1114703875 bighost littlehost loghost
Apr 28 11:57:55 testhost mon[8003]: calling alert hey.alert for staff-important-servers/fancy_service (/usr/local/mon/alert.d/hey.alert,eravin) bighost littlehost loghost
Apr 28 11:57:55 testhost mon[8003]: calling alert mymail.alert for staff-important-servers/fancy_service (/usr/local/mon/alert.d/mymail.alert,er...@pa...) bighost littlehost loghost
Apr 28 11:57:55 testhost mon[8003]: calling alert hey.alert for staff-important-servers/fancy_service (/usr/local/mon/alert.d/hey.alert,eravin) bighost littlehost loghost

Note how the hey.alert gets called twice. Here's the mon.m4 definition of the periods in use for this service:

    period HEYDAY: wd {Sun-Sat} hr {6am-11am}
        alert hey.alert eravin
        upalert hey.alert eravin
        alertafter 2
        numalerts 1
    period HEYNIGHT: wd {Sun-Sat} hr {11am-6am}
        alert hey.alert eravin
        upalert hey.alert eravin
        alertafter 2
        numalerts 1
    period MAIL: wd {Sun-Sat}
        alert mymail.alert er...@pa...
        upalert mymail.alert er...@pa...
        alertafter 2
        numalerts 1

I might expect the hey.alert to get called twice if this alarm was sent at exactly 11 AM. But note the time in the monlog messages above, it was well after 11 AM. Furthermore, I've tested this same config several times, and once again just now. Only the time above did I see two alerts generated instead of one. So it looks like there's a race condition of some kind.

Note that the duplicated alerts, along with a non-duplicated alert, were sent at the same second - I suspect that is significant.

Any ideas?
From: Ed R. <er...@pa...> - 2005-04-26 20:37:44
|
I've been getting a lot of these lately in my Apache error log:

Bareword "mailnbdy" not allowed while "strict subs" in use at (eval 138) line 1, <CF> chunk 104.
Bareword "mailnbdy" not allowed while "strict subs" in use at (eval 138) line 1, <CF> chunk 104.
Bareword "mailnbdy" not allowed while "strict subs" in use at (eval 138) line 1, <CF> chunk 104.

I'm pretty sure this is from mon.cgi, which uses Mon::Client. The only mention of eval in Client.pm is at line 98 below:

# grep -C5 -n eval /usr/local/lib/site_perl/Mon/Client.pm
93-
94-    if ( (defined ($ENV{"USER"}) ) && ($ENV{"USER"} ne "") ){
95-        $self->{"USERNAME"} = $ENV{"USER"};
96-    } else {
97-        if ($^O ne "MSWin32") { #Win32 doesn't have getpwuid :(
*98:            $self->{"USERNAME"} = eval( (getpwuid ($<))[0] );
99-        }
100-    }
101-
102-    $self->{"OPSTATUS"} = undef;
103-    $self->{"DISABLED"} = undef;

And since "mailnbdy" is the user Apache is running as, it looks like this code is the culprit.

Oddly, this message doesn't turn up when I run it from the command line - only when invoked via inetd or in the Web environment. When invoked via inetd, the message goes away if I switch from Perl5.00502 to Perl5.6.1. Upgrading the rest of the system to Perl5.6.1 is something I want to do anyway, but I admit I'm perplexed as to why this message is turning up in the first place.

Any ideas?
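One likely reading of line 98: eval with parentheses around an expression is a string eval, so the user name returned by getpwuid is compiled as Perl source, and a lone bareword is rejected under "strict subs". A block eval would trap errors without recompiling the value. The sketch below demonstrates the difference using a literal stand-in for the getpwuid result; it is an illustration, not a patch against Mon::Client.

```perl
use strict;
use warnings;

my $name = "mailnbdy";    # stand-in for (getpwuid($<))[0]

# String eval: the *contents* of $name are compiled as Perl code, so the
# user name is parsed as a bareword and refused under "strict subs".
my $from_string_eval = eval($name);
my $error = $@;

# Block eval: the code in the braces runs as written and exceptions are
# trapped; the value comes back unchanged.
my $from_block_eval = eval { $name };

print "string eval error: $error";
print "block eval result: $from_block_eval\n";
```

The quoted code also shows why a command-line run may stay quiet: when $ENV{"USER"} is set and non-empty, the branch containing the eval is never reached.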
From: Ed R. <er...@pa...> - 2005-04-24 07:23:13
|
On Thu, Apr 21, 2005 at 05:27:07PM -0400, Ed Ravin wrote:
> I finally upgraded my production system to Mon 1.1 and Mon-Client-1.000.

And I had to retreat shortly afterwards - alert dependencies weren't working. Alerts weren't being called because depstatus was always undefined. I tracked the problem down to this code in process_event:

3203 if ($sref->{"depend"} ne "" &&
3204 $sref->{"dep_behavior"} eq "a")
3205 {
3206 dep_ok ($sref);
3207 }

Ahem. Cough cough. Shouldn't line 3206 be:

3206 dep_ok ($sref, 'a');

It works a hell of a lot better now. I'm kicking myself for not catching this in my preproduction testing - none of the watch entries I tested with used dependencies. Who's using Mon-1.1-pre1? Why didn't they notice this?

Jim, David, is this the right fix?

Thanks,

-- Ed
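Why the missing argument is silent rather than fatal can be sketched with the comparison quoted at line 5180 in another message on this page (dep_behavior eq $deptype). Whether that line sits inside dep_ok is an assumption; dep_matches below is a hypothetical reduction for illustration, not mon's code.

```perl
use strict;
use warnings;

# Hypothetical reduction of the dependency-type test. With only one
# argument, $deptype is undef, the string comparison is false, and the
# dependency is never evaluated.
sub dep_matches {
    my ($sref, $deptype) = @_;
    no warnings 'uninitialized';    # the one-argument call compares against undef
    return (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq $deptype) ? 1 : 0;
}

my $sref = { "depend" => "routers:ping", "dep_behavior" => "a" };

print "dep_matches(\$sref)      -> ", dep_matches($sref), "\n";
print "dep_matches(\$sref, 'a') -> ", dep_matches($sref, 'a'), "\n";
```

Running the real code under -w would flag the undefined comparison, which fits the occasional -w checks suggested elsewhere in this archive.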
From: Ed R. <er...@pa...> - 2005-04-21 21:28:47
|
I finally upgraded my production system to Mon 1.1 and Mon-Client-1.000. I ran into a new problem that eluded my previous testing - I use the "set" command to set a variable to a numeric value. My client programs that fetched that numeric value using the "get" command no longer recognized the value. It turned out if I "set" a value like this: 12345 It would be returned via "get" like this: '12345' Yes, with extra single quotes around it. Easy to code around, but annoying. It looks like this is happening due to changes in Mon::Client where "set" is implemented. It's also happening for string values that I set, which hasn't broken anything yet because I only use those vars for display. Is this related to the parsing changes in Mon 1.1? Can we fix it so that the single quotes aren't seen by the client programs? Also, if a client program wants to muck with some of the vars set by Mon, it won't be able to insert data that Mon will understand because of the extra quotes. |
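Until the quoting is sorted out on the server or in Mon::Client, a client can code around it along these lines. unquote_value is a hypothetical helper, not part of Mon::Client; it strips one pair of enclosing single quotes and leaves anything else untouched.

```perl
use strict;
use warnings;

# Strip one pair of enclosing single quotes from a value returned by
# "get"; values without them pass through unchanged.
# unquote_value is a hypothetical helper, not part of Mon::Client.
sub unquote_value {
    my ($value) = @_;
    return $value unless defined $value;
    $value =~ s/^'(.*)'$/$1/s;
    return $value;
}

print unquote_value("'12345'"), "\n";
print unquote_value("12345"), "\n";
```

This does not help a client that needs to write values mon itself will read back; that case would need the fix on the "set" side.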
From: Ed R. <er...@pa...> - 2005-04-12 14:33:26
|
On Mon, Apr 11, 2005 at 04:39:06PM +0100, Alex David Shadrach Hooper wrote:
> Pulled from Ed Ravin's mail (Mon, Apr 11, 2005 at 11:21:59AM -0400):
> >
> > 229
> > 230 setlogsock('unix')
> > 231 if grep /^ $^O $/xo, ("linux", "openbsd", "freebsd", "netbsd");
> > 232
> > 233 openlog ("mon", "cons,pid", $CF{"SYSLOG_FACILITY"});
> >
> > And do something with the setlogsock() that's appropriate for Solaris
> > and/or the way you have syslog set up there.
> >
>
> Ah, yes. I should have tried all the possibles listed in the Sys::Syslog
> manpage before posting; 'stream' seems to work nicely. Thanks very much.

Sounds like we should add another couple of lines to Mon just before the openlog() call above:

    setlogsock('stream')
        if grep /^ $^O $/xo, ("solaris");

I've been wondering for a while if this should be a config file parameter.
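If this did become a config file parameter, one possible shape is a per-OS default table with an optional override. In the sketch below, logsock_for and the override argument are hypothetical; the OS-to-socket mapping is the one from this thread, and the actual Sys::Syslog call is shown only in a comment so the example runs without a syslog daemon.

```perl
use strict;
use warnings;

# Default socket type per OS, taken from the mappings discussed above.
my %LOGSOCK_FOR = (
    linux   => 'unix',
    openbsd => 'unix',
    freebsd => 'unix',
    netbsd  => 'unix',
    solaris => 'stream',
);

# Return the socket type to use: an explicit override (e.g. a value read
# from a hypothetical config file setting) wins over the per-OS default.
# Returns undef when nothing applies, meaning "leave Sys::Syslog alone".
sub logsock_for {
    my ($os, $override) = @_;
    return $override if defined $override && $override ne '';
    return $LOGSOCK_FOR{$os};
}

my $type = logsock_for($^O);
print "socket type for $^O: ", (defined $type ? $type : "(library default)"), "\n";

# In mon this would precede openlog(), roughly:
#   setlogsock($type) if defined $type;
```

A table keeps the OS list in one place, which is easier to extend than a chain of grep tests.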
From: Ed R. <er...@pa...> - 2005-02-17 23:13:18
|
On Thu, Feb 17, 2005 at 03:54:25PM -0500, David Nolan wrote: > > --On Thursday, February 17, 2005 3:30 PM -0500 Ed Ravin <er...@pa...> > wrote: > >And the "ACK" command is still misbehaving, but I can see the problem > >is limited to my old mon.cgi - apparently the 'ack' field, instead of just > >being a 1 if the watch was ack'd, is now the ctime of the ack. > > > > Oh yeah, I forgot that had changed. That was done with the intent of > eventually displaying the timestamp of the ack, and having ack's timeout > after configurable periods of time. I also see that Mon now automatically adds the user's name to the ACK comment, making mon.cgi's doing this redundant (and rightly so). I found two spots in mon.cgi that test 'ack' == 1, changing them to 'ack' != 0 gets it to behave, though it still puts the user's name in twice. I think we need to bump up the Mon protocol version in 1.1 so mon.cgi will have a way of discovering what kind of server is on the other side. The other option is to make the user set something in mon.cgi.cf, which is unnecessary work for them (though more work for us :-). |
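The mon.cgi change described above amounts to treating any non-zero 'ack' value as acknowledged, which works whether the server sends 1 or the time of the ack. is_acked is a hypothetical helper used only to illustrate the test; it is not code from mon.cgi.

```perl
use strict;
use warnings;

# True for both conventions: the old servers' ack == 1 and the newer
# servers' ack == time of the acknowledgement. Zero or undefined means
# not acknowledged. is_acked is a hypothetical helper.
sub is_acked {
    my ($ack) = @_;
    return (defined $ack && $ack != 0) ? 1 : 0;
}

print "old-style ack (1):          ", is_acked(1), "\n";
print "new-style ack (1108675465): ", is_acked(1108675465), "\n";
print "not acked (0):              ", is_acked(0), "\n";
```

This keeps one client working against both server versions; it does not address the doubled user name, which still needs a way to tell the versions apart.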
From: Ed R. <er...@pa...> - 2005-02-17 22:18:10
|
On Thu, Feb 17, 2005 at 03:54:25PM -0500, David Nolan wrote:
> Ah, the wonder and curse of -w. It finds certain classes of bugs, and
> actually helps you avoid exercising certain memory leaks in perl, but it
> also whines about many things.

I turned on -w in Mon to test a few things - here's a few fixes:

@@ -156,7 +156,7 @@
 #
 my $i; # loop iteration counter, used for debugging only
 my $lasttm; # the last time(2) the mon loop started
-my $pid_file_owner; # set when creating pid file
+my $pid_file_owner= 0; # set when creating pid file
 my $tm; # used in main loop
 #
@@ -1064,7 +1064,7 @@
 } elsif ($1 eq "dtlogging") {
     $new_CF{"DTLOGGING"} = 0;
-    if ($2 == 1 || $2 eq "yes" || $2 eq "true") {
+    if ($2 eq "yes" || $2 eq "true" || $2 == 1) {
         $new_CF{"DTLOGGING"} = 1;
     }
@@ -4180,7 +4180,7 @@
 my $trap_name = $1;
 my $trap_val = $2;
 chomp $trap_val;
-$trap_val =~ s/^\'(.*)\'$/\1/;
+$trap_val =~ s/^\'(.*)\'$/$1/;
 $trap{$trap_name} = un_esc_str ($trap_val);
 }

--------------------------------------------------

And these things provoked "use of uninitialized value" complaints, but I don't have any easy fixes:

3204 if ($sref->{"depend"} ne "" &&
3205 $sref->{"dep_behavior"} eq "a")

5180 if (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq $deptype) {
5181 $depend = $sref->{"depend"};

Though it might be too much work (without much point) to get mon to run under -w all the time, occasional checks like this might turn up problems that are waiting to happen.
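The reordering in the dtlogging hunk works because "||" short-circuits: once the string comparison matches "yes" or "true", the numeric comparison that triggers the "isn't numeric" warning is never evaluated. The sketch below captures warnings to show the difference; truthy_old and truthy_new are hypothetical stand-ins for the two versions of the condition.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Original order: the numeric test runs first, even on "yes".
sub truthy_old {
    my ($v) = @_;
    return ($v == 1 || $v eq "yes" || $v eq "true") ? 1 : 0;
}

# Patched order: string tests first, numeric test only as a last resort.
sub truthy_new {
    my ($v) = @_;
    return ($v eq "yes" || $v eq "true" || $v == 1) ? 1 : 0;
}

my $old_result = truthy_old("yes");
my $old_count  = scalar @warnings;

@warnings = ();
my $new_result = truthy_new("yes");
my $new_count  = scalar @warnings;

print "old order: result $old_result, warnings $old_count\n";
print "new order: result $new_result, warnings $new_count\n";
```

A value such as "no" still reaches the numeric comparison in the patched order and still warns, so a stricter fix would check that the value looks like a number before comparing it numerically.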