[Mon-commit] mon mon,1.4,1.5
From: David N. <vi...@us...> - 2004-06-14 11:30:01
Update of /cvsroot/mon/mon
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13223

Modified Files:
	mon
Log Message:
Massive amounts of changes, imported from the CMU Mon code. Many new
features, not all of them documented yet. Many bug fixes. Details below:

Added "redistribute" as a config file option. It goes at the service
level, not the period level. Its arguments are the same as those of an
alert config option (script name plus optional arguments). Documented.

Better support for Mon traps. Lots of little details weren't being
handled quite right, and the parsing of the 'trap' section of auth.cf
was completely broken.

Added syslog calls for all the possible reasons an alert could be
suppressed, instead of just for 'alertevery'.

Added support for a new auth type, "trustlocal", which means that all
client connections from localhost are trusted to identify themselves
without a password. This is most useful when using any of the various
Apache authentication mechanisms to control access to the CGI scripts.
Documented.

Fixed a bug (or was it a feature?) that was causing testing intervals
to back off very heavily when forced test runs were done.

Added the mon client's username to the comment when ack'ing a failure.
It turns out mon.cgi was already doing this, but I feel this should be
the responsibility of the mon server, not the mon client.

Changed upalert behavior to use the output of the successful monitor
run instead of the output of the last failure. We find this less
confusing: an UPALERT that carries failure information is easy to
misread.
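The "trustlocal" auth type described above amounts to one extra check in front of the normal password test. The following is a minimal Perl sketch. It assumes the server already has the peer's IP address as a string. is_loopback() and authenticate() are hypothetical names, not mon's actual subs, and the regular password check is passed in as a code ref.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the peer address is a loopback address.
sub is_loopback {
    my ($peer_addr) = @_;
    return 0 unless defined $peer_addr;
    return 1 if $peer_addr =~ /^127\./;    # IPv4 loopback (127.0.0.0/8)
    return 1 if $peer_addr eq '::1';       # IPv6 loopback
    return 0;
}

# Hypothetical auth routine (not mon's real code). With 'trustlocal',
# a loopback client is accepted under whatever username it gives, with
# no password. Everyone else goes through $check_password, a code ref
# standing in for the normal password verification.
sub authenticate {
    my ($auth_type, $peer_addr, $user, $password, $check_password) = @_;
    return 1 if $auth_type eq 'trustlocal' && is_loopback($peer_addr);
    return $check_password->($user, $password) ? 1 : 0;
}

# Example: a stand-in password check.
my $pw_ok = sub { my ($u, $p) = @_; return defined $p && $p eq 'secret' };
print authenticate('trustlocal', '127.0.0.1', 'apache-user', undef, $pw_ok), "\n";  # prints 1
print authenticate('trustlocal', '10.0.0.5',  'apache-user', undef, $pw_ok), "\n";  # prints 0
```

In this sketch, any process that can connect from the mon host itself is trusted, not just the web server running mon.cgi. The mode therefore assumes that local users are trusted too.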

Fixed a bug where op_status wasn't being set to OK before the upalert
was called. This only caused a problem when generating an upalert trap
to another Mon server. The trap alert script reads the MON_OPSTATUS
environment variable, which still held STAT_FAIL. As a result, the
first OK trap sent to the remote Mon server was actually still a
failure trap, so failures appeared longer on the other server than they
really were.

Added complete support for saving/loading the full opstatus
information.

Added support for specifying which type(s) of state to load when mon is
started with the -l switch. Documented.

Added a new dependency behavior type, 'hm', for per-host monitor
suppression. Documented.

Added the ability to have multiple dependency expressions associated
with a single watch/service. This adds three new mon.cfg keywords:
'alertdepend', 'monitordepend', and 'hostdepend'. Documented.

Fixed some bugs in trap authentication checking, where traps from any
host were being allowed.

Fixed a couple of bugs that were preventing traptimeouts from sending
alerts when a dependency or an alertafter statement was involved.

*Lots* of little changes to make 'perl -w' happy with mon. As a side
effect, the memory leak problems I was having seem to have gone away.

Added code to track which host a trap comes from.

Fixed a couple of bugs where things weren't getting reset after an up
trap.

Added support for remote mon updates via the monremote config option.
Documented.

Added non-alert alerts (i.e. logging-only alerts that you don't want to
turn the service red in mon.cgi). Not yet documented.

Lowered the syslog log levels for lots of messages.

Fixed a bug that was allowing bogus options in periods.

Started writing support for ackalert/disablealert; not yet
complete/tested/documented.

Added unack_summary support, which causes an acked failure to be
un-acked if the summary changes. Alert timers are reset when an ack is
removed. Not yet documented.
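The unack_summary rule can be sketched as a small state update on each failing run. This is an illustrative Perl sketch only. It assumes a per-service hash with hypothetical fields (unack_summary, acked, ack_comment, last_summary, last_alert, alert_count). The commit does not show mon's real bookkeeping.

```perl
use strict;
use warnings;

# Hypothetical per-failure update (field names are illustrative).
# With unack_summary on, an acked failure is un-acked as soon as a later
# failing run reports a different summary. The alert timers are reset
# at the same time, so the next alert is not held back by the old
# timing state.
sub process_failure {
    my ($svc, $new_summary, $now) = @_;
    if ($svc->{unack_summary}
        && $svc->{acked}
        && defined $svc->{last_summary}
        && $new_summary ne $svc->{last_summary}) {
        $svc->{acked}       = 0;     # remove the ack
        $svc->{ack_comment} = '';
        $svc->{last_alert}  = 0;     # reset alert timers
        $svc->{alert_count} = 0;
    }
    $svc->{last_summary} = $new_summary;
    $svc->{last_failure} = $now;
    # 'may_alert' only means the ack no longer blocks alerts; other
    # suppression rules (alertevery, dependencies, ...) still apply.
    return $svc->{acked} ? 'suppressed' : 'may_alert';
}

# Example: the summary stays the same, then a second host starts failing.
my %svc = (unack_summary => 1, acked => 1, last_summary => 'host1', last_alert => 1000);
print process_failure(\%svc, 'host1', 1060), "\n";        # prints suppressed
print process_failure(\%svc, 'host1 host2', 1120), "\n";  # prints may_alert
```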

Added code to export the group/service to monitor scripts, and to
export ack messages to alerts.

Added some debugging for tracking down issues with unack_summary
behavior.

Don't alert on disabled groups, in addition to not calling their
monitor scripts. (Traps were still alerting.)

Eliminated usage of parse_line during trap processing, to avoid a perl
regexp segfault.

Stop calling waitpid for alerts; let the waitpid elsewhere handle it.

Start setting _exitval when a trap is received.

Changed the behavior of disabled groups to still monitor, but not
alert.

Fixed a bug that caused alerts to be sent on traps even when the
scheduler was stopped.

Added support for dependency memory, with new 'dep_memory = timeval'
config file statements in both the global and the per-service config
blocks. Also added initial support for storing a timestamp on acks.

Added support for an 'ignore_summary' flag on alertevery statements.

Wrapped a fork around the alert sending routine, to avoid hangs.

Disabling a host that is the only host in its group now disables the
whole watchgroup instead. (So the monitoring still happens and you can
see the current state, even though it won't alert.)

Better error reporting on dependency errors.

Added an 'alertexitrange' config option, to apply an exit range filter
to all alerts in this period.

Don't treat disabled watches & services specially in dependencies; just
look at their status. (Though I'm tempted to treat them specially in
the opposite way, i.e. always fail the dependency check.)

Index: mon
===================================================================
RCS file: /cvsroot/mon/mon/mon,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** mon	11 Jun 2004 17:03:47 -0000	1.4
--- mon	14 Jun 2004 11:29:47 -0000	1.5
***************
*** 71,74 ****
--- 71,75 ----
  sub debug_dir;
  sub dep_ok;
+ sub dep_summary;
  sub depend;
  sub dhmstos;
***************
*** 87,90 ****
--- 88,92 ----
  sub handle_trap_timeout;
[...2773 lines suppressed...]
      }
  }
+ 
+ # Perl's "system" function blocks. We don't want the mon process to
+ # ever block. So we fork then call system. Mon will handle the
+ # child process cleanup elsewhere.
+ sub mysystem {
+     my @args = @_;
+     my $pid;
+     print STDERR "mysystem called: @args\n";
+     if ($pid = fork()) { ## parent
+         return;
+     } elsif (defined($pid)) { ## child
+         system(@args);
+         exit(0)
+     } else { ## parent - fork failed
+         print STDERR "You lose!\n";
+     }
+     print STDERR "mysystem returning\n";
+ };
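mysystem() depends on the parent reaping finished children somewhere else ("Mon will handle the child process cleanup elsewhere"; the log also says "let the waitpid elsewhere handle it"). Below is a minimal sketch of that pairing. It assumes a main loop that periodically calls a non-blocking reaper. launch_detached() and reap_children() are hypothetical names. Unlike mysystem(), the child here execs the command directly rather than calling system().

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);    # exports WNOHANG

# Start a command without waiting for it, in the spirit of mysystem().
# Hypothetical helper: returns the child's PID, or undef if fork failed.
sub launch_detached {
    my @args = @_;
    my $pid = fork();
    if (!defined $pid) {
        warn "fork failed: $!\n";
        return;
    }
    if ($pid == 0) {
        # Child: replace ourselves with the command (no shell involved).
        # _exit skips END blocks and destructors inherited from the parent.
        exec { $args[0] } @args or POSIX::_exit(127);
    }
    return $pid;    # parent returns immediately
}

# Collect every child that has already exited, without blocking.
# waitpid(-1, WNOHANG) returns a PID for each finished child, 0 while
# the remaining children are still running, and -1 when none are left.
sub reap_children {
    my %exit_code;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        # High byte of $? is the exit code (0 if killed by a signal).
        $exit_code{$pid} = $? >> 8;
    }
    return \%exit_code;
}
```

Exec'ing directly removes the intermediate process that mysystem() keeps alive while system() runs, and it lets the reaper see the command's own exit status. In a server, reap_children() would typically run once per pass of the main loop or from a SIGCHLD handler. The caller would then look up each returned PID to see which alert or monitor finished.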