mon-commit Mailing List for mon (Page 12)
From: David N. <vi...@us...> - 2004-11-15 14:45:30
Update of /cvsroot/mon/mon/doc
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9218/doc

Modified Files:
    README.cgi-bin mon.8
Added Files:
    CHANGES.mon.cgi README.mon.cgi README.snmpvar.monitor
    README.syslog.monitor
Log Message:
Pulling lots of changes from the 1.0.0pre* branch into the HEAD, to
prepare to tag mon-1.1pre1

--- NEW FILE: README.snmpvar.monitor ---
snmpvar.monitor by P.Holzleitner

What does it do?

snmpvar.monitor is a plug-in for the "mon" systems monitoring package
written by Jim Trocki (http://www.kernel.org/software/mon). Called by
mon, it queries freely configurable values using SNMP, compares them
against specified limits and reports any violation.

Some parameters that can be monitored (just to give you an idea):

  Equipment operational status (temperature, fan rotation)
  UPS Status (line power / battery, minimum line voltage, load % ...)
  Switch/Router status (interface up, BGP session up, ...)
  Server status (redundant power supply OK, disk array OK, ...)
  Status of services (process running, mail queue length, ...)

License

GNU GPLv2 (http://www.fsf.org/licenses/gpl.txt) - See file COPYING

Quick Start:

  * Make sure you have UCD SNMP 3.6.2+ (libraries) and the Perl SNMP
    module installed (http://www.cpan.org/misc/cpan-faq.html)
  * Copy snmpvar.monitor to your mon.d directory
  * Copy snmpvar.def to /etc/mon, add your own variables
  * Copy snmpvar.cf to /etc/mon and edit to match your needs
  * Test from mon.d directory with ./snmpvar.monitor -l host1 host2 ...
  * Test again from mon.d directory with ./snmpvar.monitor host1 host2 ...
  * Add watch/service to mon.cf, using snmpvar.monitor

Commandline options:

  --varconf=/path/to/snmpvar.def        if neither /etc/mon nor /usr/lib/mon/etc
  --config=/path/to/snmpvar.cf          if neither /etc/mon nor /usr/lib/mon/etc
  --community=your_SNMP_read_community  if not 'public'
  --groups=Power,Disks                  test only a subset of variables for a host group
  --timeout=n                           SNMP GET timeout in seconds
  --retries=n                           number of times to retry the SNMP GET
  --debug                               tell what config is being used
  --mibs='mib1:mib2:mibn'               load specified MIBs
  --list[=linesperpage]                 produce human-readable listing, not alarms

For every host name passed on the command line, snmpvar.monitor looks up
the list of variables and corresponding limits in the configuration file
(snmpvar.cf).

If a --groups option is present, only those variables are checked which
are in one of the specified groups. To specify more than one group,
separate group names with commas. You can also exclude groups by
prefixing the group name(s) with '-'. Don't mix in- and exclusion.

Examples:

  --groups=Power         only vars in the Power group
  --groups=Power,Env     vars in the Power or Env group
  --groups=-Power,-Env   all vars except those in Power or Env groups
  --groups=Power,-Env    won't work (only the exclusions)

For every such variable, it looks up the OID, description etc. from the
variable definition file (snmpvar.def).

This monitor looks for configuration files in the current directory, in
/etc/mon and /usr/lib/mon/etc. Command line option --varconf overrides
the location of the variable definition file, option --config sets the
configuration file name.

When invoked with the --list option, the output format is changed into a
more human-readable form used to check and troubleshoot the
configuration. This option must not be used from within MON.

Exit values:

  0  if everything is OK
  1  if any observed value is outside the specified interval
  2  in case of an SNMP error (e.g.
     no response from host)

Basic Troubleshooting:

  use snmpvar.monitor --list option to see variable values
  use snmpwalk your_hostname public .1 | less to verify SNMP agent

The snmpvar.def File:

In this file we define variables that can be retrieved via SNMP. In a
way, the .def file is snmpvar.monitor's idea of a MIB.

Entries consist of a "Variable variable-name" declaration

  Variable PE4300_TEMP_MB

[NOTE: The variable name cannot be "Host" or "FriendlyName"]

followed by the mandatory specification of Object ID and Description:

  OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1.3
  Description Motherboard Temperature

It is suggested that OIDs be entered numerically as shown above in order
to eliminate the need for having the SNMP libraries compile the relevant
MIB files on every invocation of the monitor.

By default, this monitor loads no MIBs. If you want to use symbolic
OIDs, use the --mibs commandline option to specify which MIBs you need.

By the author's convention, an OID describing an array of values, like
ifOperStat which takes the interface number as an index, is written with
a trailing dot, while OIDs of scalars end in a number. As of version
1.1.1, the monitor will insert the dot before the index if you forgot it
in the .def file.

Optional Elements of a Variable definition:

  DefaultIndex 3 4 5

A list of indices to test by default. Let's say the OID is .1.2.3. and
DefaultIndex is "18 22 36", then the monitor will retrieve the values of
.1.2.3.18, .1.2.3.22 and .1.2.3.36 when testing this variable, and will
compare them all against the limits. Where necessary, the DefaultIndex
can be overridden for one host/variable combination, using the Index
statement in the .cf file.

  FriendlyName 3 Disk Fan 1

This lets you replace the standard display of "Variable [Index]", e.g.
"Fan Speed [5]", with individual labels for each index. The FriendlyName
option is typically specified in the .def file for items that have the
same name for every use, e.g.
component names like in the case of fans, power supplies etc. The same
option exists in the .cf file to name a particular variable on a
particular host, e.g. to display a line name instead of an interface
number on a router. If the FriendlyName string begins with "@", the
Description is substituted for the "@".

  Scale / 10.0

A formula to re-scale the value returned from the host. The expression
is appended to the raw value and the resulting expression is evaluated
by Perl. The raw value is available as $rawval if necessary.

  Unit C

Used in value display / messages.

  Decode 1 unknown
  Decode 2 OK
  Decode 3 FAILURE

Values retrieved through SNMP are often enumerations of status codes.
The Decode statement lets you put text labels on these values.

  DefaultGroup Environment

Defines that, by default, all instances of this variable go into the
specified group. Individual overrides are possible in the .cf file.

  DefaultMin 300
  DefaultMax 2000
  DefaultEQ 1000
  DefaultNEQ 1000

Default alarm limits. See description of Min/Max/EQ/NEQ below.

The snmpvar.cf File:

In here, you "call up" the variables to be retrieved for a particular
host. Entries consist of a "Host host-name" declaration followed by at
least one "variable-name [options ...]" line.

  Host ntserv1

This hostname corresponds to the hostname on the command line, i.e. the
hostname you used in MON's hostgroup statement.

  FOO_FAN_RPM Min 1000 Max 5000 MaxValid 10000 Index 1 2 3 4

This example uses almost all options. It instructs the monitor to
retrieve the OID specified under "FOO_FAN_RPM" in the .def file.

  Min 300    specifies a minimum value, measured >= minimum
  Max 2000   specifies a maximum value, measured <= maximum
  EQ 1000    specifies an exact value, measured == value
  NEQ 1000   specifies an excluded value, measured != value

If the measured value is outside of these limits, a failure is reported.
To test for "Value = X", use "Min X Max X".

  MinValid -1 MaxValid 10000

Some monitoring hardware occasionally measures garbage.
To avoid triggering an alarm when this happens, you can use
MinValid/MaxValid to specify the range (inclusive) of plausible values
for this variable. If the measured value exceeds these limits, only a
warning will be generated, but no failure will be reported to MON.

  Group Environment

Puts this particular variable into the specified group. Groups are used
to test a partial set of the variables specified for a host, by using
the --groups= command line option.

  Index 1 2 3

This tells the monitor which object instances (array elements) to test
in case of a non-scalar object. Since the list of indices can be as long
as necessary, the Index option must be the last one on the line (after
Min X, Max Y etc.). The list specified as DefaultIndex in the .def file
entry for this variable is used unless Index is specified here.

When retrieving a non-scalar value, snmpvar.monitor will normally
display the instances (array elements) by appending their index to the
description, as in "Line Status [3]". Often, it is desirable to label
individual instances in a more mnemonic way. To do this, you can add a
number of FriendlyName directives after a variable request, like this:

  Host firewall
  IF_OPERSTAT Index 1 2 3
  FriendlyName 1 1: Leased Line
  FriendlyName 2 2: DMZ
  FriendlyName 3 3: Internal Router

In this case, the monitor checks the ifOperStat for interfaces 1, 2, and
3 on host "firewall". If interface 3 were not "up", the monitor would
signal a failure of "Internal Router" instead of "ifOperStat [3]".

If the FriendlyName string begins with "@", the Description is
substituted for the "@".

If all instances of this variable having the same index have the same
meaning regardless of what host they are on, you can put the
FriendlyName statement into the respective variable definition in the
.def file instead.

The snmpopt.cf File:

This optional file is used to pass parameters to the SNMP library.
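Looking back at the snmpvar.cf limit options (Min, Max, EQ, NEQ, MinValid, MaxValid), the decision they describe can be sketched roughly as below. This is a Python illustration only, not the monitor's actual Perl code: the function name `classify`, the keyword names and the result labels are hypothetical, and checking the validity range before the alarm limits is an assumption.

```python
from typing import Optional


def classify(value: float,
             lo: Optional[float] = None,         # "Min" in snmpvar.cf
             hi: Optional[float] = None,         # "Max"
             eq: Optional[float] = None,         # "EQ"
             neq: Optional[float] = None,        # "NEQ"
             min_valid: Optional[float] = None,  # "MinValid"
             max_valid: Optional[float] = None,  # "MaxValid"
             ) -> str:
    """Return "ok", "failure" or "warning" for one measured value."""
    # Assumption: implausible readings are filtered first and only warn.
    if min_valid is not None and value < min_valid:
        return "warning"
    if max_valid is not None and value > max_valid:
        return "warning"
    # Alarm limits: the value must satisfy every limit that is set.
    if lo is not None and value < lo:
        return "failure"
    if hi is not None and value > hi:
        return "failure"
    if eq is not None and value != eq:
        return "failure"
    if neq is not None and value == neq:
        return "failure"
    return "ok"


# Limits taken from the README line
#   FOO_FAN_RPM Min 1000 Max 5000 MaxValid 10000
limits = dict(lo=1000, hi=5000, max_valid=10000)
for rpm in (3000, 500, 20000):
    print(rpm, classify(rpm, **limits))
```

With these limits a reading of 3000 is fine, 500 is a failure, and 20000 is treated as a garbage measurement that only warrants a warning.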
For SNMPv1, this is generally not necessary unless the target's SNMP
port differs from the default (161). Note that the SNMPv1 community
string, timeout and retries can also be specified on the snmpvar.monitor
command line, overriding any default or configuration file setting.

You will need to edit this file in order to use SNMPv3.

Index: README.cgi-bin
===================================================================
RCS file: /cvsroot/mon/mon/doc/README.cgi-bin,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -C2 -d -r1.1.1.1 -r1.2
*** README.cgi-bin 9 Jun 2004 05:18:06 -0000 1.1.1.1
--- README.cgi-bin 15 Nov 2004 14:45:18 -0000 1.2
***************
*** 3,15 ****
  mon.cgi
  -------
! mon.cgi used to be a part of the mon distribution, but it is now
! maintained by Andrew Ryan <an...@na...>. The latest
! release can be found at
!
! http://www.nam-shub.com/files/
!
! or
!
! ftp://ftp.kernel.org/pub/software/admin/mon/contrib/

  minotaur
--- 3,8 ----
  mon.cgi
  -------
! mon.cgi is the more advanced web interface to mon, maintained by
! Andrew Ryan <an...@na...>.

  minotaur

--- NEW FILE: README.syslog.monitor ---
Readme file for syslog.monitor
$Id: README.syslog.monitor,v 1.2 2004/11/15 14:45:18 vitroth Exp $

(Note: This Readme file is an insult to the reader. Better documentation
will come as soon as I find more time and fix some more bugs)

INTRODUCTION

This is a syslog monitor for mon (http://www.kernel.org/software/mon/)
by Jim Trocki.

It is different from the other monitors, because it is constantly
running and communicates with the mon server via Mon::Client over the
network, instead of running under mon's supervision.

It listens for syslog packets coming in from the network, parses them,
checks them against a rule set and reports to the mon server if
necessary.

REQUIREMENTS

You need to have the following non-std Perl modules installed:

  Time::HiRes
  Mon::Client

DETAILS

syslog.monitor accepts a single command line parameter, the name of the
configuration file.
All options are explained inside the configuration file, see syslog.conf
as an example.

At startup, the daemon retrieves a list of all watches from the mon
server for which a service "syslog" is defined. We also read the
hostgroup definition for this watch from the mon server.

(The hostnames are resolved and the result is used to check if the
incoming syslog packet is accepted and which host it came from, so you
should make sure your hostnames resolve to all IPs from which your
systems might send a syslog packet - on a Cisco, you might want to
consider "logging source-interface")

This basically amounts to:

  * For every hostgroup for which you want syslog.monitor to accept and
    monitor syslog packets, define a syslog service. This watch/service
    is where we later send our traps.
  * For those hosts, add a line like
        *.* @syslog.monitor.host.name
    to /etc/syslog.conf.
  * Configure syslog.monitor by editing syslog.conf and following the
    comments therein.
  * Start syslog.monitor.
  * Restart mon.
  * killall -HUP syslogd on the hosts you want to monitor.
  * Read the logfiles and fix the problems. ;-)

AUTHOR

Please don't bother Jim with questions relating to this.

If this should lead to global warming, code freeze or Elvis's revival, I
accept absolutely no responsibility. However, I will gladly receive and
incorporate bugfixes and sensible bug reports.

  Lars Marowsky-Brée <la...@ma...>

URL

It appears we have made our way to
ftp://ftp.kernel.org/pub/software/mon/contrib/ - please use a mirror, as
described on http://www.kernel.org/.

--- NEW FILE: CHANGES.mon.cgi ---
mon.cgi v1.52 21-May-2001
-------------------------
+ added check for sufficient Mon::Client version
+ added optional "watch" keyword to config file that allows users to see
  only the groups they are configured to be allowed to see, by regex.
+ added optional keyword "show_watch_strict" that, when set to "yes",
  will enforce watch keywords strictly, and not allow the mon.cgi user
  to see any detail about any other hostgroup.
+ query_groups: added summary/ack information to failed services
+ query_groups: now prints red or yellow as appropriate, instead of just
  red, for failed services.
+ added "log in" link to mon.cgi base page
+ moncgi_get_params: Fixed bug with null values of $monhost and $monport
  getting through.
+ fixed moncgi_reset bug - keepstate & no-keepstate are reversed
+ moncgi_authform: passwd dialog is cleared after unsuccessful password
  entry.
+ new function: moncgi_login - allow user to log in prior to having to
  execute a privileged action.
+ new config parameter: logo_link. logo_link is a URI that will be
  linked to the logo picture, if logo is defined.
+ New function: can_show_group(groupname), to test if a group can be
  shown according to the "watch" directives.
+ The following functions were updated to reflect the new watch keyword
  access control routines: list_alerthist, list_dtlog, query_group,
  list_disabled, svc_details, mon_test_service, moncgi_test_all,
  mon_enable, mon_disable, mon_ack
+ fixed numerous warnings, did some code cleanup and improved comments.
+ Fixed another mod_perl bug in monhost/monport parsing
+ Updated moncgi-appsecret.pl, in the util directory, to reflect new
  code.

mon.cgi v1.51 22-Mar-2001
-------------------------
+ Fixed taint-checking problem with monhost and monport args
  (Mon::Client was complaining under TaintMode/-T).

mon.cgi v1.50 15-Mar-2001
-------------------------
+ Config file parsing support was not working properly. This has been
  fixed, and a new subroutine was introduced: initialize_config_globals.

mon.cgi v1.49 14-Mar-2001
-------------------------
+ Add test_config option on main menu bar (new 0.38.21 command)
+ change reset to single button, with follow-up page, giving two choices
  -- reset keepstate and reset.
+ new function - moncgi_reset to allow users to choose which type of
  reset they would like to execute.
+ Patch from Ed Ravin (er...@pa...)
  to accommodate a site-specific custom toolbar row and site-specific
  menu commands.
+ added an optional config file that lets users specify their own
  mon.cgi parameters.
+ added TVA color scheme to the distro (from tb...@tv...)
+ Use HTML::Entities to escape HTML submitted as ack messages, avoiding
  cross-site scripting attacks/javascript and ensure proper encoding of
  characters entered as ack messages. HTML scrubbing can be skipped by
  setting the variable untaint_ack_msgs to "no".
+ remove all <pre>'s and replace with <font face="$fixed_font_face">.
  Important messages were often getting cut off the screen by the use of
  <pre>.
+ make $monhost and $monport optional CGI params as 'h' and 'p'
  respectively
+ added "test service" and "test-all" to query_group page

mon.cgi v1.48 01-Dec-2000
-------------------------
+ Have ability to do mass disabling/enabling of hosts and services in
  hostgroup.
+ query_group: have radio button for enabled/disabled status
  (facilitates mass en/disabling)
+ query_group: added a table to show services for that group,
  enabled/disabled with radio button.
+ query_group: now includes service status on this page
+ query_group: mass dis/enabling of svcs requires a new function,
  mon_state_change
+ svc_details: widened the table
+ main: Command matching changed to use exact matches instead of regex
  matches (duh).
+ main: fix bug with Revision tag in $VERSION
+ list_disabled: Also added mass disabling
+ mon_state_change_enable_only: new function to support list_disabled
  mass re-enabling.
+ list_pids: cleaned up function and formatting
+ added mon_state_change function for mass state changing
+ added mon_list_opstatus function
+ query_opstatus: moved legend to below main table
+ query_opstatus: changed legend to use bgcolor instead of font color
+ query_opstatus: ack message is now included in summary
+ query_opstatus: increased main table width to 100%
+ query_opstatus: can now test svcs from this page
+ ability to do multiple tests at the same time for a single hostgroup
+ moncgi_test_all: new function to test all svcs in group
+ Ran mon.cgi through 'tidy' (http://www.w3.org/People/Raggett/tidy/)
  for improved HTML compliance. Most common pages are OK now (I think)
  except for table summary attributes. I'll get to them eventually.
+ added last_ok time for failed services in "Last Check" column
+ color of UNCHECKED services is now midnight blue by default, unchecked
  services are now readable in the default color scheme!

mon.cgi v1.46 20-Aug 2000
-------------------------
+ Fixed bug in list_dtlog that would show min and max failure time as
  "-1" seconds if no failures had been seen on that service. Also the
  table is now not printed at all instead of being a 0-row table.
+ Made it easier for users to get themselves out of the situation where
  they enter in a valid username and an invalid password.
+ Made the summary info MUCH easier to see when a service is in the
  failure state.
+ alert_details is now "svc_details", a much more descriptive name,
  since it shows success as well as failure details.
+ svc_details [nee alert_details] got a little bit of a cleanup (not
  much).
+ list_dtlog now has a configurable maximum number of entries per page
  that it will display, defaults at 100. Large downtime logs would not
  render well in most browsers, and would not render at all with
  Netscape's table drawing algorithm.
+ Added optional $monport argument, in case you don't run mon on port
  2583.
+ Trap watches are now correctly handled and printed (thanks to Ed Ravin
  <er...@pa...> for the bug report and fix).
+ Fixed bug in pp_sec that would cause "1 days" to be printed out
  instead of "1 day".

mon.cgi v1.45 05-Jun 2000
-------------------------
+ query_opstatus: Built an "amber level" alert for services that have
  failed but never issued an alert
+ query_opstatus: Changed "Last Checked" and "Est. Next Check" times to
  be deltas instead of absolute times, both relative to servertime and
  not localtime.
+ Added ACK (and re-ack) feature
+ query_opstatus: Added additional visual warnings if scheduler is not
  running or cannot be contacted.
+ Changed default app secret
+ Button bar at top of each page is cleaner
+ Fixed bug with scheduler falsely claiming to be stopped if you try to
  stop the scheduler and aren't authenticated, or if the server is not
  running.
+ Fixed bug where multiple auth failures are displayed if a user is not
  authenticated (should only notify once)
+ Made it easier to not hit "reset server" button accidentally
+ Made font on ONDS check times size -1
+ Show the downtime log as an option on query_group
+ Fixed "test immediately" stuff so it tests and then shows right status
+ list_opstatus: hostgroup column no longer goes white if svc is
  unchecked
+ alert_details is MUCH spiffier
+ alert_details now checks to see if a monitor for that service/group is
  currently running, and as such, the status reported is subject to
  change very soon.
+ Added more descriptive text to service status table in alert_details.
+ Changed default return screen on enable_service to be alert_details if
  that's where the user last came from.
+ Added new 0.38-18 data types for alert_details
+ list_dtlog: Display median in addition to mean failure time to lessen
  effects of downtime outliers.
+ Added a Refresh button on alert_details page
+ Cleaned up the list_disabled function
+ Got rid of backwards() function, unused relic from old mon.cgi
+ Fixed the META REFRESH tags so that it works on all browsers (put it
  in the header where it belongs) and handles more cases (alert_details,
  test_service)
+ Started using servertime in places instead of time on local web server
+ Visual enhancements for this version submitted by Brian Doherty
  <bdo...@ma...>
+ Fixed a bug in the "failure-free operation %" calculation if you had
  an extremely large number of failures in a time period, % could show
  up as negative.

mon.cgi v1.38 18-Feb 2000
-------------------------
+ MAJOR speedup, only use one Mon connection per page view. Pages
  typically load 2-3x faster.
+ list_opstatus in Summary mode is now more brief. All "OK, Non-Disabled
  Services" (ONDS) for any given hostgroup are now aggregated in a
  single line. If you monitor a lot of services on each of your host
  groups, this will save you a lot of screen real estate. Services which
  are disabled and/or failing are still broken out individually.
+ added FAILED flag to Status box, moved DISABLED flag, so mon.cgi works
  with Lynx & w3m or any other text browser that supports tables (only
  Lynx and w3m tested, looks great with w3m by the way).
+ changed default path of cookie to "/" to avoid lynx complaining about
  "invalid cookie path".
+ changed alert_details to use a table, include "view downtime log"
+ on query_group page, turn box gray if host is disabled.
+ fixed a div0 bug if you have no entries in your dtlog and ask to view
  it
+ changed disabled host in query_group to sort alpha even when hosts are
  disabled.
+ alert_details function now auto-detects failure/success, doesn't need
  to be told which one to look for ("test service immediately" would
  show inconsistent results from this behavior, since it is impossible
  to know the results of a test before you run it!)

mon.cgi v.1.35
--------------
+ Downtime log viewing/querying support.
+ Disabled services/hosts/watches now appear as gray-colored boxes on
  the main display screen. This makes it easier to see what is disabled.
+ Fixed loadstate and savestate bugs again. These commands now work.
+ I finally have sort of a release process, so hopefully my releases
  will not be littered with formatting code that is specific to my
  environment, and they will run fine out of the box when you get them.
+ Fixed a few routines to work with changing ways Mon::Client asks you
  to do things.
+ Also, if you are logged in as an authenticated user (not the "default
  user", if one is defined), your username will appear on each page, so
  you always know who you are authenticated as.
+ Added a logout button.
+ Added ability to do "reset keepstate" as well as "reset" from the web
  interface.
+ The command bar is now 2 lines instead of one. Even on my 21" monitor,
  13 buttons was too much to have on 1 line (let alone my poor 800x600
  laptop LCD!).
+ Mon::Client::test is broken in v0.7. To make it work in the way that
  mon.cgi expects it to, change line 1470 in Client.pm v0.7 from:
  > if ($what !~ /^alert|startupalert|upalert$/) {
  to
  < if ($what !~ /^monitor|alert|startupalert|upalert$/) {

mon.cgi 1.32.1.2 01-Feb 2000
----------------------------
+ Fixed loadstate and savestate to not be NOOPs.
+ Established a "default" user for when authentication is required but
  you don't want to make users log in just to list status.
+ Along with the default user, there is also now a "switch user" feature
  that offers the user the chance to re-authenticate to a user of higher
  privilege if they are denied the running of a command due to a lack of
  authorization.
+ Fixed HTML bugs with hardcoded colors in font and table tags scattered
  throughout code (patch courtesy of Martha H Greenberg
  <marthag@MIT.EDU>, thanks!). This makes it possible to run mon.cgi in
  colors other than the default scheme.
  mon.cgi users take note however, testing color schemes is not part of
  my QA process (such as it is) and so if you find something broken, let
  me know and I'll fix it.

Index: mon.8
===================================================================
RCS file: /cvsroot/mon/mon/doc/mon.8,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** mon.8 14 Jun 2004 11:08:21 -0000 1.2
--- mon.8 15 Nov 2004 14:45:18 -0000 1.3
***************
*** 329,332 ****
--- 329,339 ----
  global configuration variable.

+ .TP
+ .B MON_CFBASEDIR
+ The directory where configuration files should be kept,
+ as indicated by the
+ .I cfbasedir
+ global configuration variable.
+
  .P
  "fping.monitor" should return an exit status of 0 if it
***************
*** 371,375 ****
  unless
  .B no_comp_alerts
! is defined in the period section.
  If an alert was already sent within the last
  .B alertevery
--- 378,383 ----
  unless
  .B no_comp_alerts
! is defined in the period section. An upalert will only be sent
! if the previous state is a failure.
  If an alert was already sent within the last
  .B alertevery
***************
*** 377,381 ****
  .I unless
  the summary output from the current monitor program differs from the last
! monitor process. Otherwise, send an alert using each alert program listed
  for that period. The
  .B "observe_detail"
--- 385,390 ----
  .I unless
  the summary output from the current monitor program differs from the last
! monitor process.
! Otherwise, send an alert using each alert program listed
  for that period. The
  .B "observe_detail"
***************
*** 390,393 ****
--- 399,409 ----
  The reasoning is that if the summary output changes, then a significant
  event occurred and the user should be alerted.
+ The "strict" argument to alertevery will suppress both
+ comparing the output from the previous monitor run to the current
+ and prevent a successful return value of the monitor from
+ resetting the alertevery timer. For example, "alertevery 24h strict"
+ will only send out an alert once every 24 hours, regardless of
+ whether the monitor output changes, or if the service stops and then
+ starts failing.

  .SH ALERT\ PROGRAMS
***************
*** 511,514 ****
--- 527,537 ----
  global configuration variable.

+ .TP
+ .B MON_CFBASEDIR
+ The directory where configuration files should be kept,
+ as indicated by the
+ .I cfbasedir
+ global configuration variable.
+
  .P
  The first line from standard input must be used as a brief summary
***************
*** 684,691 ****

  .TP
- .BI "snmpport = " portnum
- Set the SNMP port that the server binds to.
-
- .TP
  .BI "serverbind = " addr

--- 707,710 ----
***************
*** 702,709 ****

  .TP
- .BI "snmp =" {yes|no}
- Turn on/off SNMP support (currently unimplemented).
-
- .TP
  .BI "dtlogfile = " file
  .I file
--- 721,724 ----
***************
*** 949,952 ****
--- 964,969 ----
  .B service
  followed by a word which is the tag for this service.
+ This word must be unique among all services defined for the
+ same watch group.

  The components of a service are an interval, monitor, and
***************
*** 1187,1191 ****
  The
  .B period
! keyword has two forms. The first takes an argument which is a
  period specification from
  Patrick Ryan's
--- 1204,1208 ----
  The
  .B period
! definition has two forms. The first takes an argument which is a
  period specification from
  Patrick Ryan's
***************
*** 1207,1212 ****
  parameters.

  .TP
! .BI alertevery " timeval [observe_detail]"
  The
  .B alertevery
--- 1224,1235 ----
  parameters.

+ Period definitions, in either the first or second form, must be unique within
+ each service definition. For example, if you need to define two
+ periods both for "wd {Sun-Sat}", then one or both of the period definitions
+ must specify a label such as "period t1: wd {Sun-Sat}" and
+ "period t2: wd {Sun-Sat}".
+
  .TP
! .BI alertevery " timeval [observe_detail | strict]"
  The
  .B alertevery
***************
*** 1230,1234 ****
  "observe_detail" is the last argument, then both the summary
  and detail output lines will be considered when comparing the
! output of successive failures. Please refer to the
  .B "ALERT DECISION LOGIC"
  section for a detailed explanation of how alerts are suppressed.
--- 1253,1262 ----
  "observe_detail" is the last argument, then both the summary
  and detail output lines will be considered when comparing the
! output of successive failures.
! If the string "strict" is the last argument, then the output
! of the monitor or the state change of the service will have
! no effect on when alerts are sent. That is, "alertevery 24h strict"
! will send only one alert every 24 hours, no matter what.
! Please refer to the
  .B "ALERT DECISION LOGIC"
  section for a detailed explanation of how alerts are suppressed.

--- NEW FILE: README.mon.cgi ---
Introduction to mon.cgi
--------------------------------------------------------
This interface, along with mon itself, is available from
ftp://ftp.kernel.org/pub/software/admin/mon/
Development versions of mon.cgi can be found at
http://www.nam-shub.com/files/
--------------------------------------------------------

mon.cgi is a web-based GUI for mon. Its purpose is twofold:

1) To provide an easy-to-read visual display of all the status items
   that mon keeps track of, and
2) To provide an easy-to-use web administration interface to allow users
   to perform all mon administration tasks from any web browser.

This package and the documentation assume that you have at least a basic
familiarity with mon.

-----------------------------------------------------------------
mon.cgi v.1.52 21-May-2001
by Andrew Ryan <an...@na...>

This interface, along with mon itself, is available from
ftp://ftp.kernel.org/pub/software/admin/mon/
Development versions of mon.cgi can be found at
http://www.nam-shub.com/files/
-----------------------------------------------------------------

This is the latest stable version of mon.cgi, meant to be used only with
mon 0.38-21 and above, and a version of Mon::Client that is 0.11 or
higher. The chief reason that you will need the new version is for the
"test config" functionality.

This release has 4 new features of note:

1) Access control. Using the 'watch' keyword in the config file, you can
   restrict access to a particular configuration on a per-hostgroup
   basis. 'watch' keywords can be regular expressions. Original idea and
   keyword name stolen from monshow :)
2) 'watch' keywords can either be implemented "softly" -- by default
   only certain hostgroups are shown, but all can be accessed -- or
   "strictly" -- only the hostgroups explicitly allowed by 'watch'
   keywords can be accessed in any way. Using strict access control, an
   organization using mon to watch systems belonging to multiple
   customers is able to segregate those different customers' monitoring
   completely.
3) There's now a login button. The people have spoken!
4) mon.cgi now checks for the proper version of Mon::Client before it
   starts. This was a major support problem.

Plus many other bug fixes and small improvements, as usual. This release
should be considered stable until proven otherwise :)

Please see the CHANGES file for more information about this release.

Thanks to all who report bugs, submit patches, and give feedback.

Andrew Ryan <an...@na...>

Installing mon.cgi
------------------
Instructions for installing mon.cgi are located in the header of the
mon.cgi file itself.
Roughly speaking, the order of events is:

1) Install mon and get it working, set up monpasswd and auth.cf files
   and get them verifiably working if you're using mon.cgi
   authentication (hint: you should be!).

2) Install a web server, preferably Apache, and preferably with
   mod_perl built in. Start the web server and verify that it works.

3) Put mon.cgi in your cgi-bin directory and make sure it is
   executable by the apache user (make it 0755 or 0555).

4) Edit your mon.cgi file to change default values to match your
   environment (e.g. contact email, your company logo, your company
   name, etc.).

5) If you're requiring users to log in (highly recommended), you must
   change the default app secret variable $app_secret in your copy of
   mon.cgi, and install the Crypt::TripleDES module from CPAN on the
   machine which will be running mon.cgi.

6) If you want to easily customize the look and feel of mon.cgi, as
   well as various other configuration options, copy the sample
   mon.cgi.cf file (in the /config directory of this distribution)
   into a location where your webserver can read it, and edit the
   line beginning '$moncgi_config_file = ""' to reflect the path to
   your config file. You can then change the look and feel of
   mon.cgi, as well as implement access controls, directly from this
   file.


mon.cgi Design Goals
--------------------
1) Provide 100% of the functionality of mon in a graphical user
   interface. Ideally, there will be some things that the GUI is
   better for, and inevitably, some things that the command line will
   always win out for.

2) Maintain 100% compatibility with mon and Mon::Client. If a patch
   to mon or Mon::Client is required to get a piece of mon.cgi
   functionality working, we write it, submit it, and get it folded
   in to the main distribution before making it official in mon.cgi.

3) Expose mon to the largest number of people possible in the most
   useful way.
   It is the author's belief that mon is a very useful piece of
   monitoring software, and it is also my belief that the best way to
   ensure the growth and support of this software is to expose it to a
   large number of people in your organization in a way that will
   cause them to reach the same conclusion. A web client is the most
   universal way to achieve this goal at the present time, as a web
   client can be run on any network that mon would be.

4) Simplicity and lightness. In other words:
   + Compatibility on a large number of client browser sizes,
     versions, and resolutions
   + No frames!
   + Adhering to as many of the standard good usability conventions
     as possible
   + Keeping mon.cgi all one file, with a very short setup time
   + No special modules required past those needed to run mon, and
     optional additional modules kept to a minimum
   + 100% text browser compatibility
   + Performance and speed
   + Low resource utilization

Sometimes these design goals work against one another, but hopefully
we come out ahead when tradeoffs are made.


Alternatives to mon.cgi
-----------------------
If you don't like mon.cgi but you would still like a web GUI, you
have 2 alternatives. Your first alternative is Jim's monshow, which
ships with mon in the clients/ subdirectory of the mon distribution.
The second alternative is Gilles Lamiral's Minotaure, which can be
found at ftp://ftp.kernel.org/pub/software/admin/mon/contrib/.

Both of these are fully functional and may suit your needs better
than mon.cgi. You are encouraged to take a look at them both and
decide which is best for you.


SITE CUSTOMIZATION
------------------
mon.cgi has always been "customizable," in that the source was
available and you were encouraged to substitute your own parameters
(e.g., mon host, mon port, company logo, etc.). But this meant that
with each new version, you had to go back and re-edit the source
code. Not a big deal, but still something of a pain.
As of v1.49, mon.cgi includes some features which are meant to
facilitate these changes and make site-specific customizations easier
to perform, especially as mon and mon.cgi continue to evolve.


Creating Your Own Config File
-----------------------------
Previous to v.1.49 of mon.cgi, you could customize the look of the
page, but all customizations had to be done in the source itself.
This has numerous disadvantages, so 1.49 introduces an *optional*
config file which will be read only as necessary and will allow you
to specify custom values for parameters without having to touch the
source code each time.

You can still edit the source each time if you want, but if you want
to set up a config file, follow these steps:

1) Copy the config file (included with the mon.cgi distribution)
   config/mon.cgi.cf to a location of your choice. It's best to start
   with a sample config file, because the config file format is very
   simple, and it will give you a chance to see how it works and
   experiment with parameters.

2) Edit the mon.cgi source code to find the line that specifies the
   variable "$moncgi_config_file". Change the value to the filesystem
   path of your copy of your mon.cgi config file.

3) Now you can edit the config file and make changes at will. Every
   time you change the mtime of the file (e.g., by saving it in a
   text editor, or touch'ing the file), mon.cgi will re-read the
   config file and the changes will take effect.

If there are errors in parsing the config file, they will go to
STDERR, which in most setups will end up in your web server's error
log. Look in the errors file if your config isn't working like you
expect it to work.


Adding A New Row And Custom Commands To The Command Button Bar
--------------------------------------------------------------
Adding a new row to the command button bar, with corresponding custom
commands, is quite a bit more involved than the relatively simple
matter of changing a config file.
If you've developed, or are interested in developing your own custom
commands, however, this functionality might be just what you needed.

In the following example, we add a command called "ack_all" to the
button bar, and also add the routine to do the ack'ing. The actual
guts of the ack_all routine aren't included, but the goal of these
instructions is to give you enough to start off.

The first step is to create your own moncgi_custom_print_bar
function. A stub function exists in the mon.cgi code, and the below
code shows you how you would put in your own function that has one
button, labeled "Acknowledge All Failures".

Sample moncgi_custom_print_bar subroutine:

    sub moncgi_custom_print_bar {
        #
        # This is a sample routine, which adds a third row to the
        # command table, with one command: "Acknowledge All Failures"
        #
        my ($face) = (@_);

        $webpage->print("<tr>\n");
        $webpage->print("\t<td colspan=7 align=center><font FACE=\"$face\"><a href=$url?${monhost_and_port_args}command=ack_all>Acknowledge All Failures</a></font></td>\n");
        $webpage->print("</tr>\n");
    }

The next step is to tell mon.cgi that you are using your own custom
commands, by creating your own moncgi_custom_commands subroutine.
Again, there is a sample function in the mon.cgi code which you can
replace with your own.

Sample moncgi_custom_commands subroutine:

    sub moncgi_custom_commands {
        if ($command eq "ack_all") {
            #
            # Set up the page
            #
            &setup_page("Acknowledge All Alarms");
            #
            # Note: you would have to write the "ack all"
            # command yourself!
            &moncgi_ack_all;
        } else {
            #
            # We didn't find anything, return
            #
            return 0;
        }
        return 1;   # we did find something, suppress further command processing
    }

The last step is to create the actual subroutines which will do the
custom work you want them to do (assuming you weren't just calling
existing commands in a different way). In our example, this means we
have to write a function that actually goes out and acks all existing
failures.
We won't do this here, but hopefully this gives you an idea of how to
proceed.

    sub moncgi_ack_all {
        #
        # Here is where the actual code to do the "ack all" would go
        #
    }

When future releases of mon.cgi come out, you can copy and paste your
custom subroutines and be up and running with the new version in
minimal time. At least, that is what this was designed for.


Credits
-------
The current maintainer is Andrew Ryan <an...@na...>. Report all
bugs to him or the mon users mailing list.

+ Originally by: Arthur K. Chan <ar...@al...>
+ Based on the Mon program by Jim Trocki <tr...@tr...>.
  http://www.kernel.org/software/mon/
+ Rewritten to support Mon::Client, mod_perl, taint mode,
  authentication, the strict pragma, and other visual/functional
  enhancements by Andrew Ryan <an...@na...>.
+ Downtime logging contributed by Martha H Greenberg <ma...@mi...>
+ Site customization extensions by Ed Ravin <er...@pa...>
+ The contributions of members of the mon-users mailing list have
  been invaluable in many ways.
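As a rough illustration of the mtime-based re-read described under "Creating Your Own Config File", here is a minimal sketch in Perl. It is not taken from mon.cgi: the helper name load_config_if_changed, the %cache hash, and the simple "key = value" file format are all assumptions made for the example, and the real mon.cgi.cf format may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT the mon.cgi implementation: re-read a config
# file only when its modification time differs from the last read.
my %cache = (mtime => undef, config => {});

sub load_config_if_changed {
    my ($path) = @_;

    my $mtime = (stat($path))[9];    # element 9 of stat() is the mtime
    if (!defined $mtime) {
        warn "Cannot stat config file $path: $!\n";    # ends up on STDERR
        return $cache{config};
    }

    # Unchanged since the last read: keep the cached values
    return $cache{config}
        if defined $cache{mtime} && $mtime == $cache{mtime};

    open(my $fh, '<', $path) or do {
        warn "Cannot open config file $path: $!\n";
        return $cache{config};
    };

    my %config;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
        if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
            $config{$1} = $2;            # assumed "key = value" format
        } else {
            warn "Unparseable line $. in $path: $line\n";
        }
    }
    close($fh);

    @cache{qw(mtime config)} = ($mtime, \%config);
    return $cache{config};
}
```

A caller would invoke load_config_if_changed() with the path it keeps in its own configuration variable at the start of each request. Parse problems are reported with warn, so in a typical web server setup they would land in the error log, as the README describes.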
From: David N. <vi...@us...> - 2004-11-15 14:45:30
Update of /cvsroot/mon/mon/mon.d In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9218/mon.d Modified Files: dialin.monitor.wrap.c dns.monitor file_change.monitor fping.monitor phttp.monitor reboot.monitor smtp3.monitor trace.monitor traceroute.monitor up_rtt.monitor Added Files: http_tppnp.monitor radius.monitor snmpvar.monitor Removed Files: http_t.monitor http_tpp.monitor Log Message: Pulling lots of changes from the 1.0.0pre* branch into the HEAD, to prepare to tag mon-1.1pre1 Index: fping.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/fping.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** fping.monitor 9 Jun 2004 05:18:05 -0000 1.1.1.1 --- fping.monitor 15 Nov 2004 14:45:19 -0000 1.2 *************** *** 69,73 **** { chomp; ! if (/^(\S+).*unreachable/) { push (@unreachable, $1); --- 69,73 ---- { chomp; ! if (/^(\S+).*unreachable/i) { push (@unreachable, $1); --- NEW FILE: http_tppnp.monitor --- #!/usr/bin/perl # # Parallel http monitor, with timing, using separate process for each request # results are gathered using a named pipe # an optional "SmartAlarm" capability is provided # to classify alarms and/or limit alarms when there # are sporadic outages # # http_tppnp.monitor : http _ timing - proxy - parallel - named pipe # http _ t p p np # # # Jon Meek # Lawrenceville, NJ # me...@ie... # # $Id: http_tppnp.monitor,v 1.2 2004/11/15 14:45:19 vitroth Exp $ # # Copyright (C) 2002, Jon Meek # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # =head1 NAME B<http_tppnp.monitor> - http/https server parallel monitor for mon =head1 DESCRIPTION http/https server monitor for mon. Logs timing and size results, can use a proxy server. Each measurements is made using a separate measurement process, a central server is used to collect, process, and log the results. An optional "SmartAlarm" capability is provided to classify alarms and/or limit alarms when there are sporadic outages =head1 SYNOPSIS B<http_tppnp.monitor> -l log_file_YYYYMM.log [--servertimeout nn] [--clienttimeout nn] [--responsealarmtime nn] [--randskew nn] [--okcodes nnn,mmm,kkk] [--nocache] [--pipe pipename] [--stripprotocol] [--smartalarm smartalarm.module] [--sacfg smartalarm.cfg] [--smartalarmdir /smartalarm/path] [--forcesmartalarm] [--d --debug] [--v] host [host:/path_to_doc ...] The host list can be in any combination of the following: webmail.mysite.com/index.html http://webmail.mysite.com/ test.mysite.com/~meekj/ca_...@pr... http://webmail.mysite.com:81/ https://webmail.mysite.com/ http is the default if the protocol is not specified =head1 OPTIONS =over 5 =item B<-l log_file_template> or B<--log log_file_template> /path/to/logs/internet_web_YYYYMM.log Current year & month are substituted for YYYYMM, that is the only possible template at this time. The format of the log file is: unix_time proxy protocol://host path bytes response_time response_code If B<--stripprotocol> is specified then protocol:// is not included. The response_time is in seconds. If the response was determined to be a failure the time is reported as a negative number. 
=item B<-c> or B<--okcodes>

Comma separated list of acceptable http response codes, 200 is the
default but must be explicitly included in the list if -c or
--okcodes is used.

=item B<--nocache>

Add 'Pragma: no-cache' header to all requests. Used to bypass caches.

=item B<--servertimeout N s>

Wait this long before giving up the wait for measurement results. If
you change this, be sure that it is at least (clienttimeout +
randskew + 5) seconds. Defaults to 45 seconds.

=item B<--clienttimeout N s>

The maximum time each measurement process waits for a response after
its request is made (timeout starts after randskew time). Defaults to
30 seconds.

=item B<--responsealarmtime N s> or B<-T N s>

Trigger an alarm if any response is greater than N seconds. Defaults
to a very large number, effectively disabling response time checks
beyond the regular timeout.

=item B<--randskew N s>

Each measurement process will wait a random number of seconds, up to
this maximum number before starting. Defaults to 10 seconds.

=item B<--stripprotocol>

Strip {http, https, ftp}:// from the URL stored in the logfile, for
backwards compatibility of log format.

=item B<--smartalarm Full/path/or/NameOfSmartAlarm>

For selecting the httpSmartAlarm module to filter alarms and trigger
an alarm only if certain conditions are met. If the full path is not
specified, then the smart alarm is expected to exist in the ./mon.d
directory (or more precisely, in the same directory as this monitor).
Note that .pm should not be included in the module name, however the
monitor will strip it out if it is included.
The httpSmartAlarm module has the following structure: package httpSmartAlarm; # # Module to provide "Smart Alarms" for http_tppnp.monitor # use Exporter(); $VERSION = 0.02; @ISA = qw(Exporter); @EXPORT = qw(CheckAlarm); sub CheckAlarm { my ($ConfigFile, %TestResult) = @_; $TotalDownCount = 0; @DownList = (); &ReadParams($ConfigFile); # Read your config file, if you have one foreach $k (sort keys %TestResult) { # Check the results print "TestResult: $k - $TestResult{$k}\n" if $Debug; ($Failed, $tod, $proxy, $protocol, $site, $file, $size, $t, $http_code) = split(' ', $TestResult{$k}); # # Supply some sort of algorithm here # } return ($TotalDownCount, @DownList); } # Supply a ReadParams subroutine, if needed 1; =item B<--smartalarmdir /path/to/SmartAlarm> Alternate method of supplying the path to the filter module. =item B<--forcesmartalarm> Run SmartAlarm even if there are no failures. Useful if your SmartAlarm looks for other problems such as a bad route. =item B<--sacfg> The full path to the SmartAlarm configuration file. =item B<--pipe /path/to/pipe> The full path, including file name, of the named pipe used for inter-process communication. The default is /tmp/http_tppnp, the PID of the server process is added to this name to ensure uniqueness and allow multiple sets of server/clients to run simultaneously. =item B<-d> Debug/Test, for manual testing only. =item B<-v> Verbose, show content of returned data, for manual testing only. =item B<-a> [Not backported from http_tpp yet] list all results if there is a failure, otherwise list only failed tests =item B<-r> [Not backported from http_tpp yet] Follow redirects, can be useful with -d =back =head1 MON CONFIGURATION EXAMPLE Note that a proxy will be used to access ot.myweb.com hostgroup internet_web www.ama-assn.org www.gartner.com test.mysite.com/~meekj/ca_zip.txt ot.myweb.com/ca_...@pr... 
watch internet_web service internet_web interval 5m monitor http_tpps.monitor -l /usr/local/mon/logs/internet_web_YYYYMM.log -T 10 -t 15 period wd {Sun-Sat} alert mail.alert firewall_admin alertevery 1h summary Command line test examples: http_tpps.monitor -d www.redhat.com bns.pha.com mythey.com/_mem_bin/FormsLogin.asp\?/ nonexist.pha.com www.sun.com/@proxy.labs.theyw.com http_tpps.monitor -d www...@pr... www.sun.com/@proxy.labs.theyw.com www.yahoo.com/@proxy.labs.theyw.com =head1 BUGS Using a proxy for https or ftp has not been tested, and probably does not work at this time because all proxies are invoked as http. The path to mkfifo is hardcoded to /usr/bin/mkfifo, this is good for Linux and Solaris, but should be an option. Earlier versions had occasional problems with zombie/defunct processes under extreme conditions, such as DNS slowness. Additional protections have been added and this does not seem to be a problem. At times, the monitor would do an "exit 1" telling mon that there was a failure even though the failure list is empty. This is probably fixed. It was due the main program exiting before all the child processes. A two second wait before an "exit 0" appears to be sufficient, but the SIGCHLD handler is also disabled. If zombie processes appear, this method should be reviewed. The above problem could be avoided by a mon option to ignore alerts with an empty failure summary. =head1 REQUIRED PERL MODULES LWP::UserAgent HTTP::Request::Common Time::HiRes and, if https/SSL monitoring will be performed Crypt::SSLeay =head1 AUTHOR Jon Meek, me...@ie... =head1 SEE ALSO http_tp.monitor http_tpp.monitor (should not be used, this monitor is a replacement) phttp.monitor by Gilles LAMIRAL lwp-http.mon by Daniel Hagerty (ha...@li...) 
=cut $RCSid = q{$Id: http_tppnp.monitor,v 1.2 2004/11/15 14:45:19 vitroth Exp $ }; use IO::Socket; use POSIX qw(:signal_h WNOHANG); use Getopt::Long; use Time::HiRes qw( gettimeofday tv_interval ); use LWP::UserAgent; use HTTP::Request::Common; $SmartAlarmConfig = ''; # Initialize, in case none is supplied GetOptions( "servertimeout=i" => \$ServerTimeout, "clienttimeout=i" => \$ClientTimeout, "responsealarmtime=i" => \$ResponseAlarmTime, "T=i" => \$ResponseAlarmTime, "randskew=i" => \$RandSkew, "okcodes=s" => \$opt_c, "pipe=s" => \$NamedPipe, "c=s" => \$opt_c, "l=s" => \$opt_l, "log=s" => \$opt_l, "stripprotocol" => \$StripProtocol, "nocache" => \$NoCache, "smartalarm=s" => \$SmartAlarm, # Name of the SmartAlarm module "sacfg=s" => \$SmartAlarmConfig, # Name of the SmartAlarm config file "smartalarmdir=s" => \$SmartAlarmDir, "forcesmartalarm" => \$ForceSmartAlarm, "d" => \$Debug, "debug" => \$Debug, "debuglog=s" => \$DebugLog, "v", "client", # For use by client only "url=s" => \$URL, "proxy=s" => \$Proxy, ); $ServerTimeout = 45 unless $ServerTimeout; $ClientTimeout = 30 unless $ClientTimeout; $ResponseAlarmTime = 10000 unless $ResponseAlarmTime; $RandSkew = 10 unless defined $RandSkew; # Can be zero $NamedPipe = '/tmp/http_tppnp' unless $NamedPipe; $MKFIFO = '/usr/bin/mkfifo'; # Program to make the named pipe, or FIFO my $ResponseCount = 0; # Count the responses as they are delivered my %httpCode = (); # Where the results are kept my %httpTime = (); # Keys are in URL@proxy form my %httpSize = (); my %s = (); # A temporary hash used to pass data $TimeOfDay = time; if ($DebugLog) { open(DEBUGLOG, ">>$DebugLog") || warn "Can't open debug log: $DebugLog"; $Debug = 1; } ######################################################################################### # # Client code - started by fork-exec in Server code below # if ($opt_client) { sleep 1; # Give the server a second to get setup sub PipeProblem { # For alarm/timeout signal my $signame = shift; print "$ProgName 
could not write to pipe, received signal $signame\n"; print DEBUGLOG "\n--------- Exiting from PipeProblem with alert ---------\n\n" if $Debug; exit 1; } $SIG{PIPE} = \&PipeProblem; $RandomDelayTime = int(rand($RandSkew)); print DEBUGLOG "Child($$): $Proxy $URL - Delaying $RandomDelayTime s (max $RandSkew)\n" if $Debug; # exit if ($URL =~ /junk/); # For testing what happens if a client never responds (URL contains 'junk') sleep($RandomDelayTime); # Randomly delay ourselves to avoid a rush my $ua = new LWP::UserAgent; $ua->timeout($ClientTimeout); # Set timeout for LWP $TheContent = ''; if ($Proxy ne 'noproxy') { $ua->proxy('http', "http://$Proxy"); # Need to generalize this } $s{measurementtime} = time; # Not currently used, but may become log option $dt = 0; $t0 = [gettimeofday]; # Get start time if ($NoCache) { $response = $ua->get($URL, Pragma => 'no-cache'); # Request fresh content } else { $response = $ua->request(GET $URL); } $t1 = [gettimeofday]; # Get end time $dt = tv_interval($t0, $t1); # Compute elapsed time $ResultCode = $response->code(); $TheContent = $response->content(); $ByteCount = length($TheContent); print DEBUGLOG "URL: $URL $ResultCode $ByteCount $dt\n" if $Debug; print $TheContent if $opt_v; # # Submit the results to the server process over a named pipe # if (-p $NamedPipe) { # Be sure that the pipe is there, otherwise our server may have exited open (PIPE, ">$NamedPipe") || die "Can't open pipe: $NamedPipe\n"; print PIPE "$URL $Proxy $ResultCode $ByteCount $dt\n"; print DEBUGLOG "\nChild($$) --------- Exiting normally ---------\n" if $Debug; exit 0; # The client invocation ends here } else { print DEBUGLOG "Child($$) exiting because pipe $NamedPipe does not exist\n" if $Debug; exit 0; } } ############# End Client Section ################################################### #################################################################################### # ############# Server Section #################################### # # # Determine 
path to monitor, for starting children # $ProgName = $0; # Will need full path print DEBUGLOG "\n\nStarting at $TimeOfDay Name: $ENV{PWD} / $ProgName\n" if $Debug; if (!(-x $ProgName)) { # We can't find ourself, won't be able to exec! print DEBUGLOG @ARGV if $Debug; print DEBUGLOG "\n" if $Debug; print "$ProgName cannot be found, or is not executable by mon\n"; exit 1; # Indicate failure to mon } if ($SmartAlarm) { # Use Smart Alarm module use File::Basename; $basename = basename($SmartAlarm); # Get the path to the module $dirname = dirname($SmartAlarm); if ((length($dirname) == 0) || ($dirname eq '.')) { $SmartAlarmDir = dirname($ProgName) unless $SmartAlarmDir; } else { $SmartAlarmDir = $dirname; } $basename =~ s/\.pm$//; print DEBUGLOG "SmartAlarmDir: $SmartAlarmDir Module: $basename\n" if $Debug; # use lib "/usr/local/mon/mon.d"; # Use ENV variable or option later push (@INC, $SmartAlarmDir); eval "use $basename"; do { print "Couldn't load $SmartAlarmDir/$basename.pm: $@\n"; exit 1; } unless ($@ eq ''); httpSmartAlarm->import(); } # # Reap children to avoid defunct processes / zombies # See "Network Programming with Perl" by Lincoln Stein # sub Reaper { my $signame = shift; my $timenow = time; while ((my $child_pid = waitpid(-1, WNOHANG)) > 0) { print DEBUGLOG "Parent $$ Reaped child: $child_pid after $signame at $timenow\n" if $Debug; } } $SIG{CHLD} = \&Reaper; # Handle interrupt key and termination signals sub OtherSIGs { my $signame = shift; unlink $NamedPipe; print "$ProgName Terminated on Signal: $signame\n"; print DEBUGLOG "\n--------- Exiting OtherSIGs with alert following $signame ---------\n\n" if $Debug; exit 1; } $SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = \&OtherSIGs; # # Make the named pipe for children to report results # $NamedPipe .= ".$$"; # Tack on the PID for uniqueness print DEBUGLOG "Making $NamedPipe\n" if $Debug; $cmd = qq{$MKFIFO $NamedPipe}; $ret_val = system($cmd); #$SIG{CHLD} = $SIG{PIPE} = $SIG{INT} = 'IGNORE'; # don't want to 
die on 'Broken pipe' or Ctrl-C if ($opt_c) { # Parse list of acceptable http response codes (@t) = split(/,/, $opt_c); foreach $code (@t) { $AcceptableResponseCode{$code}++; } } else { $AcceptableResponseCode{200}++; # Default is 200 } foreach $target (@ARGV) { # Build host and path lists print DEBUGLOG "\nTarget: $target\n" if $Debug; # # Normalize the request # we may want to have more restrictive URL formats in the future # and eliminate this # $protocol = 'http'; # Default protocol $host_path = ''; if ($target =~ /^(\w+):\/\/(.*)/) { $protocol = $1; $host_path = $2; } else { $host_path = $target; } print DEBUGLOG "Protocol: $protocol host/path: $host_path\n" if $Debug; undef $proxy_server; if ($host_path =~ /@/) { ($host_path, $proxy_server) = split(/@/, $host_path, 2); } ($host, $Path) = split(/\//, $host_path, 2); if (defined $proxy_server) { $ProxyServer = $proxy_server; } else { $ProxyServer = 'noproxy'; } print DEBUGLOG "$host - $ProxyServer - $Path\n" if $Debug; $URL = "$protocol://$host/$Path"; push(@URLs, $URL); push(@Proxies, $ProxyServer); } $RandSkew = 0 if (@URLs <= 1); # No need to delay if there is a single URL # # Open the named pipe, must be in read/write mode, otherwise open will block # open (PIPE, "+< $NamedPipe") || die "Server Process: Can't open pipe: $NamedPipe\n"; # # Use evals for time-out capability # eval { $SIG{ALRM} = sub {die "Server alarm timeout"}; alarm($ServerTimeout); eval { # # Check each target URL by firing off a measurement child process # for ($i = 0; $i <= $#URLs; $i++) { $URL = $URLs[$i]; $Proxy = $Proxies[$i]; $URL_Proxy = $URL . '@' . $Proxy; # Unique test key $URL_Proxies{$URL_Proxy}++; # Checklist, used to track replies &ForkClient($URL, $Proxy); # Fire off a client to run the test } while (1) { $in = <PIPE>; print DEBUGLOG "Data from pipe: $in" if $Debug; ($s{url}, $s{proxy}, $s{result_code}, $s{byte_count}, $s{dt}) = split(' ', $in); $url = $s{url}; $proxy = $s{proxy}; $URL_Proxy = $url . '@' . 
$proxy; delete $URL_Proxies{$URL_Proxy}; # Saw this combination, check it off the list $NumTestsLeft = scalar keys(%URL_Proxies); print DEBUGLOG " $NumTestsLeft tests to go\n" if $Debug; # # Save measurement results in hashes # $httpCode{$URL_Proxy} = $s{result_code}; $httpTime{$URL_Proxy} = $s{dt}; $httpSize{$URL_Proxy} = $s{byte_count}; last if ($NumTestsLeft == 0); # Bail out and process if we got all the replies } close PIPE; alarm(0); }; alarm(0); # Race condition prevention }; unlink $NamedPipe; # For housekeeping, and to let any straggling clients know # that the server process has exited # # Process the results, exit occurs from ProcessResults # &ProcessResults(\%httpCode, \%httpTime, \%httpSize); ############# End of Server Code ############################################ # # Subroutines below # sub ForkClient { my ($url, $proxy) = @_; FORK: if ($pid = fork) { # parent here # child process pid is available in $pid # waitpid($pid,0); # Can't do this and retain parallelism # $returnstatus = ($? >> 8); } elsif (defined $pid) { #pid is zero here if defined # child here # Form our exec() string $execstring = "$ProgName --client --url $url --proxy $proxy --pipe $NamedPipe --randskew $RandSkew"; $execstring .= ' --nocache' if $NoCache; # Add additional flags $execstring .= ' -d' if $Debug; $execstring .= " --debuglog $DebugLog" if $DebugLog; $execstring .= ' -v' if $opt_v; print DEBUGLOG "execstring: $execstring\n" if $Debug; exec($execstring); # parent process pid is available with getppid } elsif ($! =~ /No more process/) { # EAGAIN, supposedly recoverable fork error sleep 2; redo FORK; } else { # weirdo fork error # return 1; } } # # Check for alarm conditions, etc. 
#
sub ProcessResults {
    my ($Codes, $Times, $Sizes) = @_;
    my @Failures = ();
    my %FailureDetail = ();
    my %ResultString = ();

    #
    # Check for non-responders, LWP will usually give an error
    # so we may not exercise this often
    #
    foreach $r (keys %URL_Proxies) { # Unfulfilled test results
        print DEBUGLOG "$r $URL_Proxies{$r}\n" if $Debug;
        push(@Failures, $r);
        $ThisOneFailed = 1;
        $FailureDetail{$r} = 'No response';
        ($protocol, $host, $path, $proxy) = &split_url($r);
        $Times->{$r} = -1.0;
        $Sizes->{$r} = 0;
        $Codes->{$r} = 0;
        if ($StripProtocol) { # Don't include http:// etc in log file for backwards compatibility
            $ResultString{$r} = sprintf("%d %s %s %s %d %0.4f %d",
                                        $TimeOfDay, $proxy, $host, $path,
                                        $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
        } else {
            $ResultString{$r} = sprintf("%d %s %s://%s %s %d %0.4f %d",
                                        $TimeOfDay, $proxy, $protocol, $host, $path,
                                        $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
        }
        # Nine fields, matching the nine values CheckAlarm splits out:
        # failed, time of day, proxy, protocol, host, path, size, time, code
        $SmartAlarmString{$r} = sprintf("%d %d %s %s %s %s %d %0.3f %s",
                                        $ThisOneFailed, $TimeOfDay, $proxy, $protocol, $host, $path,
                                        $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
    }

    #
    # Check response codes, times, etc
    #
    print DEBUGLOG "\nProcessResults\n" if $Debug;
    foreach $r (keys %$Codes) {
        next if (exists $URL_Proxies{$r}); # We already got it above
        $ThisOneFailed = 0;
        printf DEBUGLOG ("%8.3f %5d %6d %s\n", $Times->{$r}, $Codes->{$r}, $Sizes->{$r}, $r) if $Debug;

        #
        # Check http response code against list
        #
        if (!exists $AcceptableResponseCode{$Codes->{$r}}) {
            $ThisOneFailed++;
            $Times->{$r} = -1.0 * $Times->{$r}; # Log uses negative time as failure indicator
            $FailureDetail{$r} = "Bad response code ($Codes->{$r}) ";
        }

        #
        # Check response time against limit, if set, but don't negate response time
        #
        if ($ResponseAlarmTime) {
            if ($Times->{$r} > $ResponseAlarmTime) {
                $ThisOneFailed++;
                $FailureDetail{$r} .= 'Long response time';
            }
        }

        if ($ThisOneFailed) {
            push(@Failures, $r);
        }

        # Pick apart the URL so that we can generate a log entry
        # compatible with previous versions
        #
        ($protocol, $host,
         $path, $proxy) = &split_url($r);

        if ($StripProtocol) { # Don't include http:// etc in log file for backwards compatibility
            $ResultString{$r} = sprintf("%d %s %s %s %d %0.4f %d",
                                        $TimeOfDay, $proxy, $host, $path,
                                        $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
        } else {
            $ResultString{$r} = sprintf("%d %s %s://%s %s %d %0.4f %d",
                                        $TimeOfDay, $proxy, $protocol, $host, $path,
                                        $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
        }
        # Nine fields, matching the nine values CheckAlarm splits out:
        # failed, time of day, proxy, protocol, host, path, size, time, code
        $SmartAlarmString{$r} = sprintf("%d %d %s %s %s %s %d %0.3f %s",
                                        $ThisOneFailed, $TimeOfDay, $proxy, $protocol, $host, $path,
                                        $Sizes->{$r}, $Times->{$r}, $Codes->{$r});
    }

    if ($Debug) {
        foreach $r (sort keys %ResultString) {
            print DEBUGLOG "ResultString: $ResultString{$r}\n";
        }
    }

    #
    # Write results to logfile, if -l
    #
    if ($opt_l) {
        $LogFile = $opt_l;
        ($sec, $min, $hour, $mday, $Month, $Year, $wday, $yday, $isdst) = localtime($TimeOfDay);
        $Month++;
        $Year += 1900;
        $YYYYMM = sprintf('%04d%02d', $Year, $Month);
        $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month
        if (-e $LogFile) { # Check for existing log file
            $NewLogFile = 0;
        } else {
            $NewLogFile = 1;
        }
        open(LOG, ">>$LogFile") || warn "$0 Can't open logfile: $LogFile\n";
        foreach $r (sort keys %ResultString) {
            print LOG "$ResultString{$r}\n";
        }
        close LOG;
    }

    if ((@Failures == 0) && $ForceSmartAlarm) { # Run SmartAlarm to look for other problems, i.e.
bad route ($count, @Failures) = &CheckAlarm($SmartAlarmConfig, %SmartAlarmString); if (@Failures == 0) { sleep 2; # Allow SIGCHLDs to arrive $SIG{CHLD} = 'IGNORE'; # We are finished, don't wait for straggling SIGCHLDs (hopefully will not leave zombies) exit 0; } $SummaryString = join ' ', @Failures; # Double check failure list $SummaryString =~ s/^\s+//; # Trim whitespace $SummaryString =~ s/\s+$//; # exit 0 if (length($SummaryString) <= 0); # Require data in failure list print "$SummaryString\n"; # Note that we are not supplying any detail data from SmartAlarm print DEBUGLOG "\n--------- Exiting ForceSmartAlarm alarm mode with alert ---------\n\n" if $Debug; exit 1; # Indicate failure to mon } if (@Failures == 0) { # No failures, exit with status 0 print DEBUGLOG "\n--------- No Failures ---------\n" if $Debug; print DEBUGLOG "\n--------- Exiting normally ---------\n\n" if $Debug; sleep 2; # Allow SIGCHLDs to arrive $SIG{CHLD} = 'IGNORE'; # We are finished, don't wait for straggling SIGCHLDs (hopefully will not leave zombies) exit 0; } if ($SmartAlarm) { # Smart alarm enabled, check the down list to see if we really # want to trigger an alarm ($SmartAlarmDownCount, @SmartAlarmFailures) = &CheckAlarm($SmartAlarmConfig, %SmartAlarmString); print DEBUGLOG "*** SmartAlarm Result: $SmartAlarmDownCount\n" if $Debug; if ($SmartAlarmDownCount) { # Have alarm, exit with status 1 print DEBUGLOG "\n--------- Have Smart Alarm Failures - mon Data Below ---------\n" if $Debug; @SortedFailures = sort @SmartAlarmFailures; # Sort to help mon in summary mode $SummaryString = join ' ', @SortedFailures; # Double check failure list $SummaryString =~ s/^\s+//; # Trim whitespace $SummaryString =~ s/\s+$//; # exit 0 if (length($SummaryString) <= 0); # Require data in failure list print "$SummaryString\n"; # There were failures, list them foreach $r (sort @Failures) { # Then provide details print "$r $Sizes->{$r} bytes $Times->{$r} s $FailureDetail{$r}\n"; } print DEBUGLOG "\n--------- 
Exiting SmartAlarm mode with alert ---------\n\n" if $Debug; exit 1; # Indicate failure to mon } print DEBUGLOG "\n--------- No Failures Classified by SmartAlarm ---------\n" if $Debug; print DEBUGLOG "\n--------- Exiting SmartAlarm mode ---------\n\n" if $Debug; sleep 2; # Allow SIGCHLDs to arrive $SIG{CHLD} = 'IGNORE'; # We are finished, don't wait for straggling SIGCHLDs (hopefully will not leave zombies) exit 0; } # Regular alarm mode print DEBUGLOG "\n--------- Have Failures - mon Data Below ---------\n" if $Debug; @SortedFailures = sort @Failures; # Sort to help mon in summary mode $SummaryString = join ' ', @SortedFailures; # Double check failure list $SummaryString =~ s/^\s+//; # Trim whitespace $SummaryString =~ s/\s+$//; # exit 0 if (length($SummaryString) <= 0); # Require data in failure list print "$SummaryString\n"; # There were failures, list them foreach $r (@SortedFailures) { # Then provide details print "$r $Sizes->{$r} bytes $Times->{$r} s $FailureDetail{$r}\n"; } print DEBUGLOG "\n--------- Exiting regular alarm mode with alert ---------\n\n" if $Debug; exit 1; # Indicate failure to mon } # # Pick apart the URL so that we can generate a log entry # compatible with previous versions # sub split_url { my $r = shift; my ($protocol, $host, $path, $proxy); $r =~ /^(\w+):\/\/([^\/]+)\/?(.*?)@(.*)/; $protocol = $1; $host = $2; # Ends when '/' seen $path = $3; $proxy = $4; if (length($path) < 1) { # Set the path for logging purposes $path = '/'; # we don't want an empty, space separated, field } return $protocol, $host, $path, $proxy; } Index: smtp3.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/smtp3.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** smtp3.monitor 9 Jun 2004 05:18:05 -0000 1.1.1.1 --- smtp3.monitor 15 Nov 2004 14:45:19 -0000 1.2 *************** *** 6,10 **** # $Id$ # ! # Copyright (C) 2001, Jon Meek, me...@ie... 
# # This program is free software; you can redistribute it and/or modify --- 6,10 ---- # $Id$ # ! # Copyright (C) 2001-2003, Jon Meek, me...@ie... # # This program is free software; you can redistribute it and/or modify *************** *** 25,40 **** =head1 NAME ! B<smtp3.monitor> - smtp monitor for mon with timing and logging =head1 DESCRIPTION A SMTP monitor using IO::Socket with connection response timing and ! optional logging. This test is simple, as soon as the greeting banner ! is received from the SMTP server the monitor client closes the session ! with a QUIT command. =head1 SYNOPSIS ! B<smtp3.monitor> -l log_file_YYYYMM.log -t timeout_seconds -T alarm_time host host1 host2 ... =head1 OPTIONS --- 25,46 ---- =head1 NAME ! B<smtp3.monitor> - smtp monitor for mon with timing, logging, optional MX lookup, and diagnostic capability. =head1 DESCRIPTION A SMTP monitor using IO::Socket with connection response timing and ! optional logging. This test is reasonably complete. Following the ! greeting banner from the SMTP server the monitor client issues the ! HELO and MAIL commands then closes the session with a QUIT ! command. Early versions of this monitor simply looked at the initial ! greeting banner, but that did not detect certain temporary failure ! conditions. ! ! While configuring mon for this monitor keep in mind that a busy mail ! server may reject new connections. =head1 SYNOPSIS ! B<smtp3.monitor> [-d] [-l log_file_YYYYMM.log] [--timeout timeout_seconds] [--alarmtime alarm_time] [--mx] [--esmtp] [--requiretls] [--nofail] [--from us...@do...] [--to r1...@d1...,r2...@d2...] [--size nnnnn] [--port nn] host host1 host2 ... =head1 OPTIONS *************** *** 42,51 **** =over 5 ! =item B<-d> Debug/Test ! =item B<-t timeout> Connect timeout in seconds ! =item B<-T alarm_timeout> Alarm if connect is successful but took ! longer than alarm_timeout seconds =item B<-l log_file_template> /path/to/logs/smtp_YYYYMM.log --- 48,59 ---- =over 5 ! 
=item B<-d> Debug/Diagnostic mode. Useful for manual command line use ! for diagnosing mail delivery problems. To determine if a mail destination ! will accept mail the --mx flag will useful. ! =item B<--timeout timeout> Connect timeout in seconds. ! =item B<--alarmtime alarm_timeout> Alarm if connect is successful but took ! longer than alarm_timeout seconds. =item B<-l log_file_template> /path/to/logs/smtp_YYYYMM.log *************** *** 53,56 **** --- 61,90 ---- possible template at this time. + =item B<--mx> Lookup the MX records for the domains/hosts and test + them in preference order. The first successful test will be + considered a success for that domain. This was originally devised for + manual command line use as a tool to verify that mail stuck in + outbound queues really can not be delivered. It could be used with mon + as well, however you are usually going to want to test ALL of your + smtp servers, not just be sure that one of them is OK. --mx applies to + all of the domains/hosts listed on the command line. + + =item B<--esmtp> + + Try ESMTP before SMTP. + + =item B<--requiretls> + + Check that STARTTLS is offered, fail if it is not. This option forces B<--esmtp>. + + =item B<--nofail> + + Never provide a failure return to mon. Useful in certain testing envrionments + when logging. + + =item B<--port nnn> + + Specify a port to use. Defaults to 25. + =back *************** *** 63,67 **** service smtp_check interval 5m ! monitor smtp3.monitor -t 70 -T 30 -l /n/na1/logs/wan/smtp_YYYYMM.log period wd {Sun-Sat} alert mail.alert me...@my... --- 97,101 ---- service smtp_check interval 5m ! monitor smtp3.monitor --timeout 70 --alarmtime 30 -l /n/na1/logs/wan/smtp_YYYYMM.log period wd {Sun-Sat} alert mail.alert me...@my... *************** *** 83,93 **** F<measurement_time> - Is the time of the connection attempt in seconds since 1970 ! 
F<smtp_host_name> - Is the name of the smtp server that was tested F<connect_time> - Is the time from the connect request until the SMTP ! greeting appeared in seconds with 100 microsecond resolution F<smtp_code_and_banner> - Should have the SMTP response code integer ! followed by the greeting banner F<connect_error> - If present may indicate "Connect failed" meaning --- 117,131 ---- F<measurement_time> - Is the time of the connection attempt in seconds since 1970 ! F<smtp_host_name> - Is the name of the smtp server that was tested. If ! --mx was selected then this field is servername=MX_record where ! MX_record is the mail domain (host) from the command line. F<connect_time> - Is the time from the connect request until the SMTP ! greeting appeared in seconds with 100 microsecond resolution. If the ! connection failed the time spent waiting for the connection will be a ! negative number. F<smtp_code_and_banner> - Should have the SMTP response code integer ! followed by the greeting banner if there was a problem. F<connect_error> - If present may indicate "Connect failed" meaning *************** *** 99,110 **** =head1 BUGS ! A SMTP temporary failure code should cause the monitor to retry the connection. ! This initial release has seen less than one day of testing. ! =head1 REQUIRED PERL MODULES IO::Socket Time::HiRes If you do not have Time::HiRes you can choose to comment out the lines --- 137,153 ---- =head1 BUGS ! It should be possible to specify --esmtp and --requiretls on a per-host basis. ! A SMTP temporary failure code could cause the monitor to retry the connection ! a certain number of times. ! It is not yet possible to specify the username / domain for the HELO and ! MAIL commands, but it would be very simple to add. ! ! =head1 REQUIRED NON-STANDARD PERL MODULES IO::Socket Time::HiRes + Net::DNS (only if --mx option will be used) If you do not have Time::HiRes you can choose to comment out the lines *************** *** 116,196 **** =cut ! ! 
use Getopt::Std; use IO::Socket; use Time::HiRes qw( gettimeofday tv_interval ); ! $RCSid = q{$Id$}; ! getopts ("ds:t:T:l:"); # s not used yet, may be optional smtp command ! $TimeOut = $opt_t || 30; # Default timeout in seconds ! $dt = 0; # Initialize connect time variable ! @Failures = (); ! $TimeOfDay = time; print "TimeOfDay: $TimeOfDay\n" if $opt_d; - foreach $host (@ARGV) { # Check each host - - print "Check: $host\n" if $opt_d; - push(@HostNames, $host); - $TestTime{$host} = time; # ! # Use eval/alarm to handle timeout # ! eval { ! local $SIG{ALRM} = sub { die "timeout\n" }; # Alarm handler ! ! alarm($TimeOut); # Do a SIG_ALRM in $TimeOut seconds ! $t1 = [gettimeofday]; # Start connection timer, then connect ! $sock = IO::Socket::INET->new(PeerAddr => $host, ! PeerPort => 'smtp(25)', ! Proto => 'tcp'); ! if (defined $sock) { # Connection succeded ! $in = <$sock>; # Get banner ! $t2 = [gettimeofday]; # Stop clock ! chomp $in; # Clean up banner EOL ! $ResponseBanner{$host} = $in; ! print "banner: $in\n" if $opt_d; ! # print $sock "NOOP\r\n"; # may want to add optional command later ! print $sock "QUIT\r\n"; # Shutdown connection ! close $sock; ! $dt = tv_interval ($t1, $t2); # Compute connection time ! if ($in !~ /^220\s+/) { # Consider "220 Service ready" to be only valid ! push(@Failures, $host); # Note failure ! $FailureDetail{$host} = $in; # Save failure banner } ! $ConnectTime{$host} = sprintf("%0.4f", $dt); # Format to 100us resolution ! if ($opt_T) { # Check for slow response ! if ($dt > $opt_T) { ! push(@Failures, $host); # Call it a failure ! $FailureDetail{$host} = "Slow Connect"; } } ! ! } else { # Connection failed ! ! print "Connect to $host failed\n" if $opt_d; ! push(@Failures, $host); # Save failed host ! $FailureDetail{$host} = "Connect failed"; ! 
$ConnectTime{$host} = -1; } - }; - alarm(0); # Stop alarm countdown - if ($@ =~ /timeout/) { # Detect timeout failures - push(@Failures, $host); - $FailureDetail{$host} = "Connect timeout"; - $ConnectTime{$host} = -1; } } if ($opt_d) { foreach $host (sort @HostNames) { ! print "$TestTime{$host} $host $ConnectTime{$host} $ResponseBanner{$host}\n"; } } --- 159,297 ---- =cut ! use English; ! use Sys::Hostname; ! use Getopt::Long; use IO::Socket; use Time::HiRes qw( gettimeofday tv_interval ); ! $RCSid = q{$Id$ }; ! $ESMTP = 0; ! $RequireTLS = 0; ! GetOptions ('mx' => \$UseMX, ! 'd' => \$opt_d, ! 'esmtp' => \$ESMTP, ! 'requiretls' => \$RequireTLS, ! 'timeout=i' => \$TimeOut, ! 't=i' => \$TimeOut, ! 'alarmtime=i' => \$opt_T, ! 'T=i' => \$opt_T, ! 'logfile=s' => \$opt_l, ! 'l=s' => \$opt_l, ! 'nofail' => \$NoFail, ! 'size=i' => \$MessageSize, ! 'port=i' => \$Port, ! 'from=s' => \$FromAddress, ! 'to=s' => \$ToAddresses, ! ); ! $ESMTP = 1 if $RequireTLS; ! if ($UseMX) { # Will need Net::DNS Module, but don't require the module if it won't be used ! eval "use Net::DNS"; ! do { ! warn "Couldn't load Net::DNS: $@"; ! undef $UseMX; ! } unless ($@ eq ''); ! $Resolver = new Net::DNS::Resolver; ! } ! ! $Port = 'smtp(25)' unless $Port; ! $TimeOut = 30 unless $TimeOut; # Default timeout in seconds ! $dt = 0; # Initialize connect time variable ! ! @Failures = (); # Initialize failure list ! ! $TimeOfDay = time; # Current time print "TimeOfDay: $TimeOfDay\n" if $opt_d; # ! # Get the process username and the hostname of the monitor machine # ! $MonitorUsername = getpwuid($UID); ! $MonitorHostname = hostname; ! $host_address = gethostbyname($MonitorHostname); ! $MonitorHostname = gethostbyaddr($host_address, AF_INET); ! $FromAddress = qq{$MonitorUsername\@$MonitorHostname} unless $FromAddress; ! print " From: $FromAddress\n" if $opt_d; ! print " TimeOut: $TimeOut\n" if $opt_d; ! ! # ! # Check each host, or MX record ! # ! foreach $host (@ARGV) { ! print "Check: $host\n" if $opt_d; ! 
# ! # Get the MX records, if we need them ! # ! if ($UseMX) { ! undef %MXval; ! undef @MXorder; ! @mx = mx($Resolver, $host); ! if (@mx) { ! foreach $rr (@mx) { ! $preference = $rr->preference; ! $mxrecord = $rr->exchange; ! $MXval{$mxrecord} = $preference; } ! } else { ! print "can't find MX records for $host: ", $Resolver->errorstring, "\n" if $opt_d; ! push(@Failures, $host); # Call it a failure ! $FailureDetail{$host} = "Can't find MX records"; ! next; ! } ! # ! # Sort the MX records into preference order ! # ! print "MX records for $host:\n" if $opt_d; ! foreach $k (sort {$MXval{$a} <=> $MXval{$b}} keys %MXval) { ! $Arecord = ''; # Clear for this MX ! push(@MXorder, $k); ! if ($opt_d) { # If in debug/verbose mode lookup A record ! $name = $k . '.'; # Append dot for absolute lookup ! if ($packet = $Resolver->search($name)) { ! @answer = $packet->answer; ! foreach $rr (@answer) { ! $address = ''; ! $name = $rr->name; ! $type = $rr->type; ! $address = $rr->address if ($type eq 'A'); ! $Arecord .= "$type: $address "; # Append, in case some other records are found ! } ! } else { ! $arecord = "Could not find A record for $name"; } } ! printf " %3d - %s %s\n", $MXval{$k}, $k, $Arecord if $opt_d; } } + # + # Now actually do the smtp check + # + if ($UseMX && @mx) { # Check MX records, stop after first success + foreach $mx (@MXorder) { + $HostPlusMX = "$host=$mx"; + push(@HostNames, $HostPlusMX); + $TestTime{$HostPlusMX} = time; + print "Checking $HostPlusMX\n" if $opt_d; + $result = &CheckSMTP($HostPlusMX); + last if ($result); + } + } else { # Regular host check + push(@HostNames, $host); + $TestTime{$host} = time; + $result = &CheckSMTP($host); + } } if ($opt_d) { foreach $host (sort @HostNames) { ! print "$TestTime{$host} $host $ConnectTime{$host} $InitialBanner{$host}\n"; ! # ($shortfail, $rest) = split(/\n/, $InitialBanner{$host}, 2); ! 
# print "$TestTime{$host} $host $ConnectTime{$host} $shortfail\n"; } } *************** *** 207,213 **** $YYYYMM = sprintf('%04d%02d', $Year, $Month); $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month ! open(LOG, ">>$LogFile") || warn "$0 Can't open logfile: $LogFile\n"; foreach $host (sort @HostNames) { print LOG "$TestTime{$host} $host $ConnectTime{$host} $FailureDetail{$host}\n"; } --- 308,318 ---- $YYYYMM = sprintf('%04d%02d', $Year, $Month); $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month ! open(LOG, ">>$LogFile") || warn "$0 Can't open logfile: $LogFile\n"; foreach $host (sort @HostNames) { + $FailureDetail{$host} =~ s/\n/ /g; # Put it on one line, but result may be too long + $FailureDetail{$host} =~ s/ $//; # Trim final space + # ($shortfail, $rest) = split(/\n/, $FailureDetail{$host}, 2); + # print LOG "$TestTime{$host} $host $ConnectTime{$host} $shortfail\n"; print LOG "$TestTime{$host} $host $ConnectTime{$host} $FailureDetail{$host}\n"; } *************** *** 231,239 **** print "\n"; ! exit 1; # Indicate failure to mon __END__ ! SMTP Reply Codes From RFC-821 - may use in the future --- 336,525 ---- print "\n"; ! exit 0 if $NoFail; # Never indicate failure if $NoFail is set ! exit 1; # Indicate failure to mon ! ! sub CheckSMTP { ! my $host = shift; ! my $t1, $t2, $dt, $mx_name, $stripped_host; ! my $Failure = 0; # Flag to indicate failure for return code ! # return 0 may not be working inside eval ! ! my $buflength = 1024; ! ! if ($host =~ /=/) { # Have MX data ! ($mx_name, $stripped_host) = split(/=/, $host); ! } else { ! $stripped_host = $host; ! } ! ! # ! # Use eval/alarm to handle timeout ! # ! eval { ! local $SIG{ALRM} = sub { die "timeout\n" }; # Alarm handler ! ! alarm($TimeOut); # Do a SIG_ALRM in $TimeOut seconds ! $t1 = [gettimeofday]; # Start connection timer, then connect ! my $sock = IO::Socket::INET->new(PeerAddr => $stripped_host, ! PeerPort => $Port, ! Proto => 'tcp'); ! ! 
if (defined $sock) { # Connection succeded ! ! $in = ''; ! $bytes = sysread($sock, $in, $buflength); # Handle multi-line banners ! $InitialBanner{$host} = $in; ! ! $t2 = [gettimeofday]; # Stop clock ! print " Banner: $InitialBanner{$host}\n" if $opt_d; ! ! if ($InitialBanner{$host} !~ /^220/) { # Consider "220 Service ready" to be only valid ! push(@Failures, $host); # Note failure ! if (length($InitialBanner{$host}) == 0) { # Note empty banner ! $InitialBanner{$host} = 'null'; ! } ! $FailureDetail{$host} = "BANNER: " . $InitialBanner{$host}; # Save failure banner ! $ConnectTime{$host} = -1; ! # last; ! $Failure = 1; ! print "QUIT\r\n" if $opt_d; ! print $sock "QUIT\r\n"; # Shutdown connection ! close $sock; ! return 0; ! } ! ! if ($ESMTP) { # Try EHLO first ! print "EHLO $MonitorHostname\r\n" if $opt_d; ! print $sock "EHLO $MonitorHostname\r\n"; ! ! $in = ''; ! $bytes = sysread($sock, $in, $buflength); # Handle multi-line banners ! $EhloResponse{$host} = $in; ! ! print " EHLO resp: $EhloResponse{$host}\n" if $opt_d; ! if ($EhloResponse{$host} !~ /^250/) { # Consider "250 Requested mail action okay, completed" to be only valid ! push(@Failures, $host); # Note failure ! print "EHLO Failure!\n" if $opt_d; ! $FailureDetail{$host} = "EHLO: " . $EhloResponse{$host}; # Save failure banner ! #last; ! $Failure = 1; ! print "QUIT\r\n" if $opt_d; ! print $sock "QUIT\r\n"; # Shutdown connection ! close $sock; ! return 0 if $RequireESMTP; ! } ! ! if ($RequireTLS && ($EhloResponse{$host} !~ /STARTTLS/)){ # Check TLS advertisement ! push(@Failures, $host); # Note failure ! $FailureDetail{$host} = "STARTTLS Not Offered "; ! print "STARTTLS Not Offered!\n" if $opt_d; ! print $sock "QUIT\r\n"; # Shutdown connection ! close $sock; ! return 0; ! } ! ! } ! ! if (!$ESMTP or ($ESMTP && $Failure)) { ! print $sock "HELO $MonitorHostname\r\n"; ! ! $in = ''; ! $bytes = sysread($sock, $in, $buflength); # Handle multi-line banners ! $HeloResponse{$host} = $in; ! ! 
print " HELO resp: $HeloResponse{$host}\n" if $opt_d; ! if ($HeloResponse{$host} !~ /^250/) { # Consider "250 Requested mail action okay, completed" to be only valid ! push(@Failures, $host); # Note failure ! print "HELO Failure!\n" if $opt_d; ! $FailureDetail{$host} = "HELO: " . $HeloResponse{$host}; # Save failure banner ! #last; ! $Failure = 1; ! print "QUIT\r\n" if $opt_d; ! print $sock "QUIT\r\n"; # Shutdown connection ! close $sock; ! return 0; ! } ! } ! ! $FromLine = qq{MAIL From:<$FromAddress>}; ! if ($MessageSize) { ! $FromLine .= qq{ SIZE=$MessageSize}; ! } ! $FromLine .= qq{\r\n}; ! print $FromLine if $opt_d; ! print $sock $FromLine; ! ! chomp($MailResponse{$host} = <$sock>); ! print " MAIL resp: $MailResponse{$host}\n" if $opt_d; ! if ($MailResponse{$host} !~ /^250\s+/) { # Consider "250 Requested mail action okay, completed" to be only valid ! push(@Failures, $host); # Note failure ! $FailureDetail{$host} = "MAIL: " . $MailResponse{$host}; # Save failure banner ! #last; ! $Failure = 1; ! print "QUIT\r\n" if $opt_d; ! print $sock "QUIT\r\n"; # Shutdown connection ! close $sock; ! return 0; ! } ! ! if ($ToAddresses) { # Addresses given on command line ! (@to_addrs) = split(/,/, $ToAddresses); ! foreach $to (@to_addrs) { ! $RcptCommand = qq{RCPT TO:<$to>}; ! print "$RcptCommand\r\n" if $opt_d; ! print $sock "$RcptCommand\r\n"; ! chomp($RcptResponse = <$sock>); ! print " RCPT resp: $RcptResponse\n" if $opt_d; ! } ! } ! ! print "QUIT\r\n" if $opt_d; ! print $sock "QUIT\r\n"; # Shutdown connection ! close $sock; ! ! $dt = tv_interval ($t1, $t2); # Compute connection time ! 
$ConnectTime{$host} = sprintf("%0.4f", $dt); # Format to 100us resolution + if ($opt_T) { # Check for slow response + if ($dt > $opt_T) { + push(@Failures, $host); # Call it a failure + $FailureDetail{$host} = "Slow Connect"; + $Failure = 1; + return 0; + } + } + + } else { # Connection failed + $t2 = [gettimeofday]; # Stop clock + $dt = tv_interval ($t1, $t2); # Compute connection time + $ConnectTime{$host} = sprintf("-%0.4f", $dt); # Format to 100us resolution, -val if failure + print " Connect to $host failed\n" if $opt_d; + push(@Failures, $host); # Save failed host + $FailureDetail{$host} = "Connect failed"; + $Failure = 1; + return 0; + } + }; + alarm(0); # Stop alarm countdown + if ($@ =~ /timeout/) { # Detect timeout failures + $t2 = [gettimeofday]; # Stop clock + $dt = tv_interval ($t1, $t2); # Compute connection time + $ConnectTime{$host} = sprintf("-%0.4f", $dt); # Format to 100us resolution, -val if timeout + push(@Failures, $host); + print " Connect to $host timed-out\n" if $opt_d; + $FailureDetail{$host} = "Connect timeout"; + $Failure = 1; + return 0; + } + + if ($Failure) { # Important when an MX record list is being checked + return 0; + } else { + return 1; + } + } __END__ ! SMTP Reply Codes From RFC-821 - may use in the future *************** *** 247,253 **** 250 Requested mail action okay, completed 251 User not local; will forward to <forward-path> ! 354 Start mail input; end with <CRLF>.<CRLF> ! 421 <domain> Service not available, closing transmission channel --- 533,539 ---- 250 Requested mail action okay, completed 251 User not local; will forward to <forward-path> ! 354 Start mail input; end with <CRLF>.<CRLF> ! 421 <domain> Service not available, closing transmission channel *************** *** 258,262 **** 451 Requested action aborted: local error in processing 452 Requested action not taken: insufficient system storage ! 
500 Syntax error, command unrecognized [This may include errors such as command line too long] --- 544,548 ---- 451 Requested action aborted: local error in processing 452 Requested action not taken: insufficient system storage ! 500 Syntax error, command unrecognized [This may include errors such as command line too long] Index: phttp.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/phttp.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** phttp.monitor 9 Jun 2004 05:18:04 -0000 1.1.1.1 --- phttp.monitor 15 Nov 2004 14:45:19 -0000 1.2 *************** *** 407,411 **** CAVEAT: Do not forget to quote the string. ! use -Dopt to see what you really input. You can use \\n to mean newline. --- 407,411 ---- CAVEAT: Do not forget to quote the string. ! enable -Dopt to see what you really input. You can use \\n to mean newline. *************** *** 435,439 **** CAVEAT: Do not forget to quote the string. ! use -Dopt to see what you really input. --Dgen : print general debug information. --- 435,439 ---- CAVEAT: Do not forget to quote the string. ! enable -Dopt to see what you really input. --Dgen : print general debug information. --- http_t.monitor DELETED --- Index: up_rtt.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/up_rtt.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** up_rtt.monitor 9 Jun 2004 05:18:04 -0000 1.1.1.1 --- up_rtt.monitor 15 Nov 2004 14:45:19 -0000 1.2 *************** *** 8,12 **** # Requires Perl Modules "Time::HiRes" and "Statistics::Descriptive" # ! # # # $Id$ --- 8,12 ---- # Requires Perl Modules "Time::HiRes" and "Statistics::Descriptive" # ! # # # $Id$ *************** *** 48,52 **** use Statistics::Descriptive; ! getopts ("dt:T:l:U:"); # -d Debug mode --- 48,52 ---- use Statistics::Descriptive; ! 
getopts ("drt:T:l:U:"); # -d Debug mode *************** *** 56,67 **** # -l file Log file name with optional YYYYMM part that will be transformed to current month # -U num Number of UDP packets to send $TimeOut = $opt_t || 10; # Timeout in seconds ! $NUM_UDP_TRYS = $opt_U || 5; # Number of UDP packet to send # Solaris MSG_WAITALL 0x40 /* Wait for complete recv or error */ #linux/socket.h:#define MSG_WAITALL 0x100 /* Wait for a full request */ ! # $RecvRet = recv($S, $Echo, $DataLength, 64); # Solaris & older versions of Linux #$RecvFlags = 0; # May work on all systems due to small packets used here --- 56,68 ---- # -l file Log file name with optional YYYYMM part that will be transformed to current month # -U num Number of UDP packets to send + # -r Log individual raw RTTs $TimeOut = $opt_t || 10; # Timeout in seconds ! $NUM_UDP_TRYS = $opt_U || 5; # Number of UDP packets to send # Solaris MSG_WAITALL 0x40 /* Wait for complete recv or error */ #linux/socket.h:#define MSG_WAITALL 0x100 /* Wait for a full request */ ! # $RecvRet = recv($S, $Echo, $DataLength, 64); # Solaris & older versions of Linux #$RecvFlags = 0; # May work on all systems due to small packets used here *************** *** 81,84 **** --- 82,87 ---- foreach $TargetHost (@Hosts) { + undef @RawRTT; + $stat = Statistics::Descriptive::Full->new(); *************** *** 104,113 **** $count = $stat->count(); ! $ResultString{$TargetHost} = sprintf "%d %s %0.4f %0.4f %0.4f %d", ! $TimeOfDay, $TargetHost, $min, $mean, $max, $count; if ($opt_T) { # Check minimum RTT for alarm limit if ($min > $opt_T) { ! push (@Failures, $TargetHost); } } --- 107,125 ---- $count = $stat->count(); ! if ($opt_r && (defined @RawRTT)) { ! $ResultString{$TargetHost} = sprintf "%d %s", ! $TimeOfDay, $TargetHost; ! foreach $rtt (@RawRTT) { ! $ResultString{$TargetHost} .= sprintf " %0.4f", $rtt; ! } ! } else { ! $ResultString{$TargetHost} = sprintf "%d %s %0.4f %0.4f %0.4f %d", ! $TimeOfDay, $TargetHost, $min, $mean, $max, $count; ! 
} if ($opt_T) { # Check minimum RTT for alarm limit if ($min > $opt_T) { ! print "Minimum RTT pushing $host\n" if $opt_d; ! push (@Failures, $TargetHost); } } *************** *** 139,149 **** if (@Failures == 0) { # Indicate "all OK" to mon ! exit 0; } ! print "@Failures\n"; ! foreach $host (sort @Failures) { ! print "$host: $ResultString{$host} "; } print "\n"; --- 151,165 ---- if (@Failures == 0) { # Indicate "all OK" to mon ! print "\n--------- No Failures ---------\n" if $opt_d; ! exit 0; } ! print "\n--------- Have Failures - mon Data Below ---------\n" if $opt_d; ! @SortedFailures = sort @Failures; ! print "@SortedFailures\n"; ! ! foreach $host (@SortedFailures) { ! print "$ResultString{$host}\n"; } print "\n"; *************** *** 155,159 **** # ! sub UDPcheck { my($TargetHost) = @_; my($DroppedPackets, $GoodPackets); --- 171,175 ---- # ! sub UDPcheck { # Send multiple UDP packets my($TargetHost) = @_; my($DroppedPackets, $GoodPackets); *************** *** 161,164 **** --- 177,181 ---- $DroppedPackets = 0; $GoodPackets = 0; + $dt = -1; # Will report -1 on failure $S = new IO::Socket::INET (PeerAddr => $TargetHost, *************** *** 167,171 **** ); do { ! &LeaveError($TargetHost, "Can't open UDP socket to $TargetHost\n"); return 0; } unless ($S); --- 184,188 ---- ); do { ! &udpLeaveError($TargetHost, "Can't open UDP socket to $TargetHost\n"); return 0; } unless ($S); *************** *** 177,180 **** --- 194,198 ---- # $Out .= ' 'x52; # Make a 56+ byte packet $DataLength = length($Out); + $Echo = ''; # Clear input buffer $t1 = [gettimeofday]; *************** *** 199,202 **** --- 217,221 ---- if ($Echo eq $Out) { $stat->add_data($dt); + push(@RawRTT, $dt); if ($opt_d) { print "$i - $DataLength - $dt -$Echo-\n"; *************** *** 208,212 **** } } - } } --- 227,230 ---- *************** *** 215,223 **** } ! 
sub TCPcheck { my($TargetHost) = @_; my($DroppedPackets, $GoodPackets, $dt); $S = new IO::Socket::INET (PeerAddr => $TargetHost, PeerPort => 7, --- 233,245 ---- } ! sub TCPcheck { # Send a single TCP packet my($TargetHost) = @_; my($DroppedPackets, $GoodPackets, $dt); + $GoodPackets = 0; + $i = 1; + $dt = -1; # Will report -1 on failure + $S = new IO::Socket::INET (PeerAddr => $TargetHost, PeerPort => 7, *************** *** 225,235 **** ); do { ! &LeaveError($TargetHost, "Can't open TCP socket to $TargetHost\n"); ! return 0; } unless ($S); - $GoodPackets = 0; - $i = 1; - $dt = -1; # Will report -1 on failure $Out = "TCP$i"; $DataLength = length($Out); --- 247,254 ---- ); do { ! &tcpLeaveError($TargetHost, "Can't open TCP socket to $TargetHost\n"); ! return $GoodPackets, $dt; } unless ($S); $Out = "TCP$i"; $DataLength = length($Out); *************** *** 270,279 **** } ! sub LeaveError { my ($host, $reason) = @_; push (@Failures, $host); - # $ResultString{$host} .= $reason; - # push (@FailReasons, $reason); - # print "$reason\n"; } --- 289,301 ---- } ! sub udpLeaveError { # Don't call this one a failure, TCP might work my ($host, $reason) = @_; + print "udpLeaveError $host\n" if $opt_d; + } + + sub tcpLeaveError { # If we get here, it was a failure + my ($host, $reason) = @_; + print "tcpLeaveError pushing $host\n" if $opt_d; push (@Failures, $host); } Index: file_change.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/file_change.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** file_change.monitor 9 Jun 2004 05:18:05 -0000 1.1.1.1 --- file_change.monitor 15 Nov 2004 14:45:19 -0000 1.2 *************** *** 1,3 **** ! #!/usr/local/bin/perl # # mon monitor to watch for file changes --- 1,3 ---- ! #!/usr/bin/perl # # mon monitor to watch for file changes *************** *** 130,136 **** $StateFile = "$ENV{MON_STATEDIR}/$StateFile"; ! $CI = '/usr/bin/ci'; ! 
$CI = '/usr/local/bin/ci'; ! #$CI = 'ci'; # Assume that RCS's ci is in the path print "Will use RCS: $RCS\n" if $Debug; --- 130,134 ---- $StateFile = "$ENV{MON_STATEDIR}/$StateFile"; ! $CI = 'ci'; # Assume that RCS's ci is in the path print "Will use RCS: $RCS\n" if $Debug; Index: traceroute.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/traceroute.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** traceroute.monitor 9 Jun 2004 05:18:05 -0000 1.1.1.1 --- traceroute.monitor 15 Nov 2004 14:45:19 -0000 1.2 *************** *** 85,91 **** --- 85,94 ---- RouterList /usr/local/mon/rt.list Traceroute /usr/sbin/traceroute + TracerouteOptions -I StateDir /usr/local/mon/state.d EquivIP 10.22.4.254 10.22.5.254 10.22.6.254 EquivIP 10.28.4.254 10.28.5.254 10.28.6.254 + StopAt 172.30.124.17 A firewall + StopAt 172.31.124.17 Another firewall Lines with '#' in the first column are ignored. *************** *** 104,109 **** Traceroute - Overrides the default of /usr/sbin/traceroute ! StateDir - Overrides the default path of the mon environment variable MON_STATEDIR. ! Files named F<lastroute.router_name> contain the last observed route. EquivIP - A space separated list of IP addresses that should be --- 107,117 ---- Traceroute - Overrides the default of /usr/sbin/traceroute ! TracerouteOptions - Supply additional options to traceroute. -I tells ! traceroute to use ICMP rather than UDP on some systems. Note that -n ! is always supplied so that no DNS lookups are performed. ! ! StateDir - Overrides the default path of the mon environment variable ! MON_STATEDIR. Files named F<lastroute.router_name> contain the last ! observed route. EquivIP - A space separated list of IP addresses that should be *************** *** 112,115 **** --- 120,130 ---- switch interfaces. + StopAt - A single IP address followed by an optional comment. 
The + traceroute will be terminated when this address is seen. This allows a + route check to a system on another network, such as the Internet, + without tracking the route on a network that you do not control. A + common use would be to put your firewall address in a StopAt + directive. There can be multiple StopAt lines. + =head1 BUGS *************** *** 127,130 **** --- 142,147 ---- use Getopt::Std; + use POSIX qw(:signal_h WNOHANG); + use POSIX qw(strftime); getopts ("vdt:l:c:"); *************** *** 148,152 **** $ConfigFile = $opt_c; ! if (open(C, $ConfigFile)) { while ($in = <C>) { --- 165,169 ---- $ConfigFile = $opt_c; ! if (open(C, $ConfigFile)) { while ($in = <C>) { *************** *** 155,179 **** chomp $in; ! if ($in =~ /^RouteLogFile/i) { ($tag, $LogFile... [truncated message content] |
From: David N. <vi...@us...> - 2004-11-15 14:45:30
|
Update of /cvsroot/mon/mon/etc
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9218/etc

Added Files:
mon.cgi.cf snmpopt.cf snmpvar.cf snmpvar.def syslog-monitor.conf
Log Message:
Pulling lots of changes from the 1.0.0pre* branch into the HEAD, to prepare to tag mon-1.1pre1

--- NEW FILE: syslog-monitor.conf ---
# Configuration file for syslog.monitor
# $Id: syslog-monitor.conf,v 1.2 2004/11/15 14:45:18 vitroth Exp $
#############################################################################

# Which timeout to set for select()ing on the input socket.
# You really do not wish to play with this.
# select_timeout 10

# Log level (just like syslog you know;)
loglevel 6

# If undefined, will write to stdout
# You better specify an absolute path here.
# logfile /var/log/syslog.monitor

# Where copies of incoming syslog messages get written to.
# In the filename, you can define the following substitutions:
# %H = gets replaced with the hostname
# %L = gets replaced with the syslog level as a string
# %l = same, but as a number
# %F = syslog facility (local0, kern, ...)
# %G = hostgroup the host belongs to
# %D = date at which the message was received, in ISO 8601 (1999-04-03)
syslogfile /var/log/syslog.%H.%F.%D

# If set, will make syslog.monitor fork and go into the background as soon
# as possible.
# Be aware that the program will refuse to daemonize if you do not set a logfile.
# daemon_mode

mon_host cherusker.bi.teuto.net

# Set these if necessary
# mon_user
# mon_pass

# IP number on which to listen for incoming UDP packets
bind_ip 0.0.0.0

# port number (you almost certainly do not want to touch this)
# bind_port 514

# Define a check called "emerg"
check emerg
# A slightly more elaborate description, which is sent to the mon server
# as part of the trap
desc Emergencies
# The period which is monitored
period 60m
# How often this check _must_ trigger within said period.
# Set to -1 to disable.
min -1
# How often this check might occur at max within the period.
max 3 # If this is set, no further matches will be checked if this check matched. # Use this carefully. # final # The check itself. Evaluated within Perl (), you can do powerful stuff # here. The current message is referenced by $$r. # Parameters you might want to match on: # $$r{'src_port'} - The source port from which the packert was sent. # $$r{'src_ip'} - The source IP. # $$r{'host'} - The hostname, resolved using the cache build # at startup. # $$r{'level'} - numeric syslog level of the message. (0-7) # $$r{'Level'} - syslog level as a string (ie 'crit') # $$r{'facility'} - Facility (ie 'local0' etc) # $$r{'msg'} - The text part of the message # $$r{'time'} - The unixtime at which the message was received, # $$r{'group'} - The group the host sending this message # belongs to pattern ($$r{'level'} <=3) # A "catch-all" - we really should receive at least one line within 15m, # But more than 1000 might be strange... check all desc All period 15m min 200 max 10000 final pattern (1) # Relating to hostgroup unix: group unix # For each host in the hostgroup unix, run a separate instance of each # check listed here (references the check defined above) per-host emerg # For the _entire_ hostgroup, run these checks: per-group all # Only on this host, run these: # on-host donar.bi.teuto.net emerg-kern --- NEW FILE: mon.cgi.cf --- # # The mon.cgi config file. # Format: # key = value # # Blank lines and lines that begin with '#' are ignored. # # Both key names and values are case sensitive. # # This file comes with the mon.cgi distribution and contains all of the # valid key/value pairs that mon.cgi will accept. # # The latest version of mon.cgi is always available at: # http://www.nam-shub.com/files/ # # If there are errors in your config file, mon.cgi will stop parsing it, # and will print messages to STDERR, which should end up in your web # server's error log. 
# # $Id: mon.cgi.cf,v 1.2 2004/11/15 14:45:18 vitroth Exp $ # # Your organization (what you want printed on the top of each page) organization = Network Operations # Contact email for mon administrator at your site monadmin = bo...@yo...main #Company or mon logo (URL path) logo = /URL-path/to/your.gif # URL to go to when you click on the logo image logo_link = http://www.kernel.org/pub/software/admin/mon/html/ # Seconds between page reload reload_time = 180 # Where to run mon (host,port) monhost = localhost monport = 2583 # Set this to anything other than 'Y' or 'yes' to turn off authentication # (HINT: authentication is a *good* thing) must_login = yes # Application secret. Set this to something long and unguessable. app_secret = LKAHETOI#KJHJKSHDOWOIUW^*((985i2hkljlkjfdhglkdhfgdlkfjghldksfjhg98 34tklh qrthq3 i3lu4 KLHKLJHKLJH ncxmvn owow y YnneO87210502673kn6l3 # Default username and password (only used if must_login is set) default_username = readonly default_password = public # Idle time, in seconds, until login cookie is invalidated. Note that if # ( login_expire_time < reload_time ) you will not be able to "idle". 
login_expire_time = 900

# Whether or not to untaint HTML in ack msgs using HTML::Entities (recommended)
untaint_ack_msgs = yes

# The name of the cookie set by mon.cgi and its path
cookie_name = mon-cookie
cookie_path = /

# Default alternate fonts to use (assumes default font is a serif font)
fixed_font_face = courier
sans_serif_font_face = Helvetica, Arial

# Default color scheme for page
BGCOLOR = black
TEXTCOLOR = white
LINKCOLOR = yellow
VLINKCOLOR = #00FFFF

# Default colors for failed services
greenlight_color = #009900
redlight_color = red
unchecked_color = #000033
yellowlight_color = #FF9933

#
# A white-background look for mon.cgi, from Thomas Bates <cb...@tv...>
#
#BGCOLOR = #FFFFFF
#TEXTCOLOR = #000000
#LINKCOLOR = 0000FF
#VLINKCOLOR = #551a8b
#
#greenlight_color=#a0d0a0
#redlight_color=ff6060
#unchecked_color=f0f0f0
#disabled_color=#e0e0e0
#yellowlight_color = #FFAF4F

# Maximum number of downtime events to show, per page
dtlog_max_failures_per_page = 100

# Watch keywords will show only the specified hostgroups by default.
# Matching is by regexp.
# e.g., show the watch whose name is www
#watch = www
# e.g., show any watches whose names start with gw-
#watch = gw-.*

# Set show_watch_strict to 'yes' if you want to be sure that users only
# see information about the hostgroups that they are authorized to
# view. If show_watch_strict is set to 1, as far as your GUI users
# will know, there is nothing else running on the mon instance
# except for their hostgroups, *even if those users know the names
# of other hostgroups on your mon server*.
#
# Set show_watch_strict to 'no' to show only the defined watch
# groups by default, but allow users to see information about
# others as well.
show_watch_strict = no

--- NEW FILE: snmpopt.cf ---
#
# snmpopt.cf
#
# This optional file is used to pass parameters to the SNMP library,
# used by snmpvar.monitor.
# # (default values shown) # common options # Version = 1 # Port = 161 # Retries = 8 # Timeout = 5 # SNMPv1/v2 options # Community = public # SNMPv3 options # SecName = initial # SecLevel = noAuthNoPriv # AuthPass = # SecEngineId = # ContextEngineId = # Context # AuthProto = MD5 # PrivProto = DES # PrivPass = --- NEW FILE: snmpvar.cf --- # # snmpvar.cf # # this is a sample configuration file for snmpvar.monitor. you # must configure this to meet your own needs. # # list of variables and ranges to be monitored by snmpvar.monitor # refers to variables defined in snmpvar.def # # a Dell server, RAID instrumentation only: Host nov-1 MEGARAID0_LOGICAL_STATUS Min 2 Max 2 Index 0 MEGARAID0_PHYS_STATUS Min 3 Max 3 Index 0 1 2 3 4 5 # a Compaq server: Host nov-2 # has 1 RAID volume, 6 physical disks CPQARRAY_LOG_STATUS Index 1 CPQARRAY_PHYS_STATUS Index 0 1 2 3 4 5 PROLIANT_TEMP_STATUS PROLIANT_PSU_STATUS PROLIANT_FAN_STATUS Index 2 4 5 # a Dell server running NT 4 with perfmib Host ntserv1 WINNT_MEM_COMMITTED Max 700 WINNT_LOGICAL_C_FREE Min 50 WINNT_LOGICAL_D_FREE Min 50 MEGARAID_C0_LOGICAL_STATUS Index 0 MEGARAID_C0_CH0_PHYS_STATUS Index 0 1 2 3 4 PE4300_TEMP_CPU PE4300_TEMP PE4300_5V_CURRENT PE4300_12V_CURRENT PE4300_3V_CURRENT PE4300_FAN_CPU_RPM PE4300_FAN_DISK_RPM PE4X00_PSU_STATUS # an APC UPS (with SNMP adapter or through controlling server running PowerNet) Host srvups1 APCUPS_OUTPUT_STAT APCUPS_LINEVOLT_MAX APCUPS_LINEVOLT_MIN # here, we override the default maximum specified in snmpvar.def: APCUPS_LOAD Max 75 APCUPS_BATT_TEMP # these are the MeasureUPS parameters (external sensor) APCUPS_EXT_TEMP Max 32 APCUPS_EXT_HUMID Min 10 Max 90 APCUPS_EXT_SWITCH_STAT Min 2 Max 2 Index 1 FriendlyName 1 Diesel Generator Status # an HP ProCurve 4000 switch Host hp4000-servers HP_ICF_FAN_STATE # has redundant PSU HP_ICF_PSU_STATE Index 2 3 IF_OPERSTAT Index 1 3 17 25 65 73 FriendlyName 1 A1: Server LAUREL FriendlyName 3 A3: Server HARDY FriendlyName 17 C1: Server TITAN (1000SX) 
FriendlyName 25 D1: Server MERCURY (1000SX) FriendlyName 65 I1: Switch D1017:G1 (1000TX) FriendlyName 73 J1: Switch SERVERS1:H1 (1000SX) # an IBM8272 Token Ring switch Host trsw1 IBM8272_LINK_STATE Min 1 Max 1 Index 1 2 3 4 5 6 7 9 11 12 13 14 15 16 17 18 21 22 23 24 FriendlyName 1 1: Floor 10 Ring FriendlyName 2 2: Floor 12 Ring FriendlyName 3 3: Floor 13 Ring FriendlyName 9 9: Server NOV-1 FriendlyName 13 13: Server ntserv1 FriendlyName 18 18: Switch 2 Interlink Fibre IBM8272_TEMP_SYS Min 1 Max 1 # a cisco router Host cisco1 IF_OPERSTAT Index 1 2 3 4 FriendlyName 1 1: Internal Ethernet FriendlyName 2 2: Internal TokenRing FriendlyName 3 3: Firewall BGP_PEERSTATE Index 10.1.1.1 10.2.1.1 FriendlyName 10.1.1.1 iBGP Session: myotherrouter FriendlyName 10.2.1.1 eBGP Session: Provider X CISCO_TEMP_STATE # a Nokia IP series firewall appliance Host firewall IF_OPERSTAT Index 1 2 3 FriendlyName 1 1: Leased Line FriendlyName 2 2: DMZ FriendlyName 3 3: Internal Router NOKIA_IP_CHASSIS_TEMP NOKIA_IP_FAN_STAT NOKIA_IP_PSU_STAT NOKIA_IP_PSU_TEMP # a Linux server with some private SNMP extensions Host mailserver LINUX_MAILQUEUE Max 80 --- NEW FILE: snmpvar.def --- # # sample snmpvar.def. you should configure this to meet your # own needs. # # Definitions of variables to be monitored using snmpvar.monitor # # # generic host (router/switch/...) Variable IF_OPERSTAT OID .1.3.6.1.2.1.2.2.1.8 Description ifOperStatus DefaultEQ 1 Decode 1 up Decode 2 down Decode 3 testing Decode 4 unknown Decode 5 dormant # generic router Variable BGP_PEERSTATE OID .1.3.6.1.2.1.15.3.1.2 Description bgpPeerState DefaultEQ 6 Decode 1 idle Decode 2 connect Decode 3 active Decode 4 opensent Decode 5 openconfirm Decode 6 established # generic Host Resources MIB implementation Variable HR_DEVICE_STATUS OID .1.3.6.1.2.1.25.3.2.1.5. 
Description Device Status DefaultEQ 2 Decode 1 unknown Decode 2 running Decode 3 warning Decode 4 testing Decode 5 down # some variables from a Windows NT "perfmib" configuration # see ms-perfmib directory for NT side configuration Variable WINNT_CPU_TOTAL OID .1.3.6.1.4.1.311.1.1.3.1.1.1.9.0 Description CPU Load Total Unit % Variable WINNT_CPU_SYS OID .1.3.6.1.4.1.311.1.1.3.1.1.1.11.0 Description CPU Load System Unit % Variable WINNT_MEM_COMMITTED OID .1.3.6.1.4.1.311.1.1.3.1.1.2.2.0 Description Committed Memory Scale / 1024 / 1024 # the Scale expression is used as (eval($rawval . $scale)) Unit MB Variable WINNT_MEM_AVAILABLE OID .1.3.6.1.4.1.311.1.1.3.1.1.2.1.0 Description Available Memory Scale / 1024 /1024 Unit MB Variable WINNT_LOGICAL_C_FREE OID .1.3.6.1.4.1.311.1.1.3.1.1.6.1.4.6.48.58.48.58.67.58 Description Free Disk Space on drive C Unit MB Variable WINNT_LOGICAL_D_FREE OID .1.3.6.1.4.1.311.1.1.3.1.1.6.1.4.6.48.58.48.58.68.58 Description Free Disk Space on drive D Unit MB # Dell PowerEdge 2550 Server Instrumentation Variable PE2550_FAN_SYS_RPM OID .1.3.6.1.4.1.674.10892.1.700.12.1.6.1. Description System Fan Speed DefaultIndex 1 2 3 Unit rpm DefaultMin 600 DefaultMax 6000 DefaultMaxValid 10000 DefaultGroup Environment Variable PE2550_FAN_DISK_RPM OID .1.3.6.1.4.1.674.10892.1.700.12.1.6.1.4 Description Disk Fan Speed Unit rpm DefaultMin 6000 DefaultMax 14000 DefaultMaxValid 15000 DefaultGroup Environment Variable PE2550_TEMP_CPU OID .1.3.6.1.4.1.674.10892.1.700.20.1.6.1. Description CPU Temperature DefaultIndex 1 2 Unit C Scale / 10.0 DefaultMax 50 DefaultGroup Environment Variable PE2550_TEMP OID .1.3.6.1.4.1.674.10892.1.700.20.1.6.1. Description Temperature DefaultIndex 3 4 5 FriendlyName 3 Motherboard FriendlyName 4 Backplane 1 FriendlyName 5 Backplane 2 Unit C Scale / 10.0 DefaultMax 40 DefaultGroup Environment Variable PE2550_PSU_STATUS DefaultIndex 1 2 OID .1.3.6.1.4.1.674.10892.1.600.12.1.5.1. 
Description Power Supply Status DefaultEQ 3 Decode 1 other Decode 2 unknown Decode 3 OK Decode 4 noncrit Decode 5 critical Decode 6 nonrecoverable DefaultGroup Power # Dell PowerEdge 4300 Server Instrumentation Variable PE4300_TEMP_CPU OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description CPU Temperature DefaultIndex 1 2 Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE4300_TEMP OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description Temperature DefaultIndex 3 4 5 6 FriendlyName 3 @Motherboard FriendlyName 4 @Ambient FriendlyName 5 @Backplane 1 FriendlyName 6 @Backplane 2 Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE4300_5V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+5V) DefaultIndex 1 4 7 Scale / 1000.0 Unit A DefaultMax 25 DefaultMaxValid 100 DefaultGroup Power Variable PE4300_12V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+12V) DefaultIndex 2 5 8 Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 100 DefaultGroup Power Variable PE4300_3V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+3V) DefaultIndex 3 6 9 Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 100 DefaultGroup Power Variable PE4300_FAN_CPU_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. Description CPU Fan Speed Unit rpm DefaultIndex 1 2 DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment # really the same as above, other index ranges only; different description # one could also make it an array and use FriendlyName in the .cf file Variable PE4300_FAN_DISK_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. Description Disk Fan Speed Unit rpm DefaultIndex 3 4 5 DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment Variable PE4X00_PSU_STATUS DefaultIndex 1 2 3 OID .1.3.6.1.4.1.674.10891.304.1.4.2.6.1. 
Description Power Supply Status DefaultEQ 3 Decode 1 other Decode 2 unknown Decode 3 OK Decode 4 noncrit Decode 5 critical Decode 6 nonrecoverable DefaultGroup Power Variable PE4X00_EXT_DISK1_PSU_STATUS DefaultIndex 1 2 OID .1.3.6.1.4.1.674.10891.304.1.4.2.6.2. Description ExtStorage 1 PSU Status DefaultEQ 3 Decode 1 other Decode 2 unknown Decode 3 OK Decode 4 noncrit Decode 5 critical Decode 6 nonrecoverable DefaultGroup Power # Dell PowerEdge 6350 Server Instrumentation Variable PE6350_TEMP_CPU OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description CPU Temperature DefaultIndex 1 2 3 4 Scale / 10.0 Unit C DefaultMax 55 DefaultGroup Environment Variable PE6350_TEMP OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description Temperature DefaultIndex 5 6 7 FriendlyName 5 @Motherboard FriendlyName 6 @Ambient FriendlyName 7 @Backplane Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE6350_TEMP_EXT_DISK1 OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.2.1 Description ExtStorage 1 Temperature Scale / 10.0 Unit C DefaultGroup Environment Variable PE6350_FAN_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. Description Fan Speed DefaultIndex 1 2 3 4 Unit rpm DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment Variable PE6350_FAN_RPM_EXT_DISK1 OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.2. Description ExtStorage 1 Fan Speed DefaultIndex 1 2 3 Unit rpm DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment # Dell PowerEdge 4200 Server Instrumentation Variable PE4200_TEMP_CPU OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. Description CPU Temperature DefaultIndex 1 2 Scale / 10.0 Unit C DefaultMax 40 DefaultGroup Environment Variable PE4200_TEMP OID .1.3.6.1.4.1.674.10891.300.1.5.2.2.1. 
Description Temperature DefaultIndex 3 4 5 6 FriendlyName 3 @Ambient FriendlyName 4 @Panel FriendlyName 5 @Backplane Top FriendlyName 6 @Backplane Bottom Scale / 10.0 Unit C DefaultMax 35 DefaultGroup Environment Variable PE4200_PSU_5V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+5V) DefaultIndex 1 2 FriendlyName 1 @Top PSU FriendlyName 2 @Bottom PSU Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 50 DefaultGroup Power Variable PE4200_PSU_3V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+3.3V) DefaultIndex 3 4 FriendlyName 3 @Top PSU FriendlyName 4 @Bottom PSU Scale / 1000.0 Unit A DefaultMax 5 DefaultMaxValid 50 DefaultGroup Power Variable PE4200_PSU_12V_CURRENT OID .1.3.6.1.4.1.674.10891.303.1.5.2.5.1. Description DC Current (+12V) DefaultIndex 5 6 FriendlyName 5 @Top PSU FriendlyName 6 @Bottom PSU Scale / 1000.0 Unit A DefaultMax 10 DefaultMaxValid 50 DefaultGroup Power Variable PE4200_FAN_RPM OID .1.3.6.1.4.1.674.10891.301.1.5.2.3.1. Description Fan Speed Unit rpm DefaultIndex 1 3 4 5 # Fan #2 is a standby unit FriendlyName 1 @Chassis 1 FriendlyName 2 @Chassis 2 FriendlyName 3 @Chassis 3 FriendlyName 4 @Top PSU FriendlyName 5 @Bottom PSU DefaultMin 1000 DefaultMax 5000 DefaultMaxValid 10000 DefaultGroup Environment # AMI MegaRAID (aka Dell PERC) RAID controller instrumentation Variable MEGARAID_C0_LOGICAL_STATUS OID .1.3.6.1.4.1.3582.1.1.2.1.3.0. Description RAID Ctl0 Volume Status DefaultEQ 2 Decode 0 offline Decode 1 degraded Decode 2 normal Decode 3 initialize Decode 4 checkconsistency Variable MEGARAID_C1_LOGICAL_STATUS OID .1.3.6.1.4.1.3582.1.1.2.1.3.1. Description RAID Ctl1 Volume Status DefaultEQ 2 Decode 0 offline Decode 1 degraded Decode 2 normal Decode 3 initialize Decode 4 checkconsistency Variable MEGARAID_C0_CH0_PHYS_STATUS OID .1.3.6.1.4.1.3582.1.1.3.1.4.0.0. 
Description Ctl0Ch0 Phys Drive Status DefaultEQ 3 Decode 1 ready Decode 3 online Decode 4 failed Decode 5 rebuild Decode 6 hotspare Decode 20 nondisk Variable MEGARAID_C1_CH0_PHYS_STATUS OID .1.3.6.1.4.1.3582.1.1.3.1.4.1.0. Description Ctl1Ch0 Phys Drive Status DefaultEQ 3 Decode 1 ready Decode 3 online Decode 4 failed Decode 5 rebuild Decode 6 hotspare Decode 20 nondisk Variable MEGARAID_C1_CH1_PHYS_STATUS OID .1.3.6.1.4.1.3582.1.1.3.1.4.1.1. Description Ctl1Ch1 Phys Drive Status DefaultEQ 3 Decode 1 ready Decode 3 online Decode 4 failed Decode 5 rebuild Decode 6 hotspare Decode 20 nondisk # APC SmartUPS monitoring (using PowerNet SNMP agents or SNMP adapter boards) Variable APCUPS_LINEVOLT_MAX OID .1.3.6.1.4.1.318.1.1.1.3.2.2.0 Description Recent Max Line Voltage Unit V DefaultMax 245 DefaultGroup Power Variable APCUPS_LINEVOLT_MIN OID .1.3.6.1.4.1.318.1.1.1.3.2.3.0 Description Recent Min Line Voltage Unit V DefaultMin 205 DefaultGroup Power Variable APCUPS_LOAD OID .1.3.6.1.4.1.318.1.1.1.4.2.3.0 Description Output Load Unit % DefaultMax 90 DefaultGroup Power Variable APCUPS_BATT_TEMP OID .1.3.6.1.4.1.318.1.1.1.2.2.2.0 Description Battery Temperature Unit C DefaultMax 45 DefaultGroup Environment # external sensors connected to a MeasureUPS board Variable APCUPS_EXT_TEMP OID .1.3.6.1.4.1.318.1.1.2.1.1.0 Description Temperature Unit C DefaultGroup Environment Variable APCUPS_EXT_HUMID OID .1.3.6.1.4.1.318.1.1.2.1.2.0 Description Humidity Unit % DefaultMin 10 DefaultMax 90 DefaultGroup Environment Variable APCUPS_EXT_SWITCH_STAT OID .1.3.6.1.4.1.318.1.1.2.2.2.1.5 Description Contact Decode 1 unknown Decode 2 OK Decode 3 FAULT Variable APCUPS_OUTPUT_STAT OID .1.3.6.1.4.1.318.1.1.1.4.1.1.0 Description UPS Status DefaultEQ 2 Decode 1 unknown Decode 2 Online Decode 3 On Battery Decode 4 On Smart Boost Decode 5 Timed Sleeping Decode 6 Software Bypass Decode 7 Off Decode 8 Rebooting Decode 9 Switched Bypass Decode 10 Hardware Failure Bypass Decode 11 Sleeping Until Power 
Return Decode 12 On Smart Trim DefaultGroup Power # Compaq ProLiant Server Instrumentation Variable PROLIANT_TEMP_STATUS OID .1.3.6.1.4.1.232.6.2.6.3.0 Description Temperature Status DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 Degraded Decode 4 FAILED DefaultGroup Environment Variable PROLIANT_FAN_STATUS OID .1.3.6.1.4.1.232.6.2.6.7.1.9.0. Description Fan Status DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 Degraded Decode 4 FAILED DefaultGroup Environment Variable PROLIANT_PSU_STATUS OID .1.3.6.1.4.1.232.6.2.9.3.1.5.0. Description Power Supply Status DefaultIndex 1 2 DefaultEQ 1 Decode 1 OK Decode 2 Failure Decode 3 BIST Failure Decode 4 Fan Failure Decode 5 Temp Failure Decode 6 Interlock Open DefaultGroup Power Variable CPQARRAY_LOG_STATUS OID .1.3.6.1.4.1.232.3.2.3.1.1.4.1. Description RAID Volume Status DefaultIndex 1 DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 FAILED Decode 4 Unconfigured Decode 5 Recovering Decode 6 Ready For Rebuild Decode 7 Rebuilding Decode 8 Wrong Drive Decode 9 Bad Connect Decode 10 Overheating Decode 11 Shutdown Decode 12 expanding Decode 13 Not Available Decode 14 Queued For Expansion Variable CPQARRAY_PHYS_STATUS OID .1.3.6.1.4.1.232.3.2.5.1.1.6.1. Description Phys Drive Status DefaultEQ 2 Decode 1 Other Decode 2 OK Decode 3 Failed Decode 4 Predictive Failure # IBM 8272 Token Ring switch Variable IBM8272_LINK_STATE OID .1.3.6.1.4.1.2.6.66.1.2.2.1.1.15. Description Link State DefaultEQ 1 Decode 1 up Decode 2 down Variable IBM8272_TEMP_SYS OID .1.3.6.1.4.1.2.6.66.1.2.1.2.11.0 Description Switch Temperature DefaultEQ 1 Decode 1 normal Decode 2 HIGH DefaultGroup Environment # Nokia IP series firewall appliance Variable NOKIA_IP_CHASSIS_TEMP OID .1.3.6.1.4.1.94.1.21.1.1.5.0 Description Chassis Temperature DefaultEQ 1 Decode 1 normal Decode 2 OVERTEMP DefaultGroup Environment Variable NOKIA_IP_FAN_STAT OID .1.3.6.1.4.1.94.1.21.1.2.1.1.2. 
Description Fan Status DefaultEQ 1 Decode 1 running Decode 2 DEAD DefaultGroup Environment Variable NOKIA_IP_PSU_STAT OID .1.3.6.1.4.1.94.1.21.1.3.1.1.3. Description PSU Status DefaultEQ 1 Decode 1 running Decode 2 DEAD DefaultGroup Environment Variable NOKIA_IP_PSU_TEMP OID .1.3.6.1.4.1.94.1.21.1.3.1.1.2. Description Chassis Temperature DefaultEQ 1 Decode 1 normal Decode 2 OVERTEMP DefaultGroup Environment # Mail Server (custom extension scripts in UCD SNMP agent) Variable LINUX_MAILQUEUE OID .1.3.6.1.4.1.2021.8.1.101.1 Description Mail Queue Length # see sample in ucd-snmp subdir in snmpvar.monitor distribution # cisco router # ciscoEnvMonTemperatureState Variable CISCO_TEMP_STATE OID .1.3.6.1.4.1.9.9.13.1.3.1.6. Description Chassis Temperature DefaultIndex 1 DefaultEQ 1 Decode 1 normal Decode 2 Warning Decode 3 CRITICAL Decode 4 SHUTDOWN Decode 5 not present DefaultGroup Environment Variable CISCO_MEM_POOL_FREE OID .1.3.6.1.4.1.9.9.48.1.1.1.6. Description Memory Pool Free Bytes DefaultIndex 1 2 FriendlyName 1 CPU FriendlyName 2 I/O # HP switch # hpicfSensorStatus Variable HP_ICF_FAN_STATE OID .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4.1 Description Fan Status DefaultEQ 4 Decode 1 unknown Decode 2 bad Decode 3 warning Decode 4 good Decode 5 not present DefaultGroup Environment Variable HP_ICF_PSU_STATE OID .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4. Description PSU Status DefaultEQ 4 Decode 1 unknown Decode 2 bad Decode 3 warning Decode 4 good Decode 5 not present DefaultGroup Power |
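Note on the Scale and Min/Max entries in snmpvar.def and snmpvar.cf above: the comment in snmpvar.def says the Scale expression is applied as eval($rawval . $scale). The following is only a minimal Python sketch of that idea, not the monitor's actual code. The helper names apply_scale and violates are hypothetical, the sketch avoids eval() and understands only "/" and "*" followed by a number, and the raw value is made up for illustration.

```python
import re

def apply_scale(raw, scale):
    """Apply a Scale expression such as "/ 1024 / 1024" to a raw SNMP value.

    Hypothetical helper: only "/" and "*" followed by a number are understood.
    """
    value = float(raw)
    for op, operand in re.findall(r"([*/])\s*([0-9.]+)", scale):
        if op == "/":
            value /= float(operand)
        else:
            value *= float(operand)
    return value

def violates(value, minimum=None, maximum=None):
    """Return True if value falls outside the configured Min/Max limits."""
    if minimum is not None and value < minimum:
        return True
    if maximum is not None and value > maximum:
        return True
    return False

# Made-up raw reading for WINNT_MEM_COMMITTED (bytes), limit "Max 700" from snmpvar.cf
committed_mb = apply_scale(805306368, "/ 1024 / 1024")
print(committed_mb, violates(committed_mb, maximum=700))
```

Parsing the operators explicitly rather than evaluating the string is a design choice of this sketch; it keeps a malformed Scale line from executing arbitrary code.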
From: David N. <vi...@us...> - 2004-11-15 14:45:28
|
Update of /cvsroot/mon/mon/utils
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9218/utils

Added Files:
    syslog.monitor
Log Message:
Pulling lots of changes from the 1.0.0pre* branch into the HEAD, to prepare to tag mon-1.1pre1

--- NEW FILE: syslog.monitor ---
(This appears to be a binary file; contents omitted.)
|
From: David N. <vi...@us...> - 2004-11-15 14:45:27
|
Update of /cvsroot/mon/mon/clients In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9218/clients Modified Files: moncmd monremote.pl monshow Added Files: mon.cgi Log Message: Pulling lots of changes from the 1.0.0pre* branch into the HEAD, to prepare to tag mon-1.1pre1 Index: moncmd =================================================================== RCS file: /cvsroot/mon/mon/clients/moncmd,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** moncmd 9 Jun 2004 05:18:07 -0000 1.1.1.1 --- moncmd 15 Nov 2004 14:45:17 -0000 1.2 *************** *** 216,243 **** Valid commands are: ! quit ! reset [stopped] ! term ! list group "groupname" ! list disabled list alerthist list failurehist - list successes list failures list opstatus list pids list watch - stop - start loadstate ! savestate set "group" "service" "variable" "value" ! get "group" "service" "variable" ! disable service "group" "service" ! disable host "host" ["host"...] ! disable watch "watch" ! enable service "group" "service" ! enable host "host" ["host"...] ! enable watch "watch" EOF exit 0; --- 216,261 ---- Valid commands are: ! ack "watch" "service" comment ! checkauth cmd [args] ! clear "watch" "service" ! disable host "host" ["host"...] ! disable service "group" "service" ! disable watch "watch" ! dump ! enable host "host" ["host"...] ! enable service "group" "service" ! enable watch "watch" ! get "group" "service" "variable" list alerthist + list aliases + list aliasgroups + list deps + list descriptions + list disabled + list dtlog list failurehist list failures + list group "groupname" list opstatus list pids + list state + list successes + list warnings list watch loadstate ! protid ! quit ! reload ! reset [stopped] [keepstate] ! savestate disabled ! servertime set "group" "service" "variable" "value" ! start ! stop ! term ! test config ! test monitor "watch" "service" ! test {alert|startupalert|upalert} "watch" "service" "retval" "period" ! 
version EOF exit 0; --- NEW FILE: mon.cgi --- #!/usr/bin/perl -T #!/usr/bin/perl -Tw broke when I made changes to list_dtlog that involved # submitting three commas ",,," in a row into the value of $args :( # # NAME # mon.cgi # # # DESCRIPTION # Web interface for the Mon resource monitoring system. mon.cgi # implements a significant subset of the Perl interface to Mon, which # allows administrators to quickly view the status of their network # and perform many common Mon tasks with a simple web client. # # Requires mon 0.38-21 and Mon::Client 0.11 for proper operation. # # # AUTHORS # Originally by: [...3807 lines suppressed...] # inside &moncgi_custom_commands. # # moncgi_custom_commands returns non-zero if it finds # a command to execute; } else { # All else. &setup_page("Operation Status: Summary View"); &query_opstatus("summary"); } $webpage->print("<hr>"); # # Some stuff we keep around for debugging # #print "commands is $command, args is $args<br>\n"; #DEBUG #print $webpage->dump; #DEBUG &end_page; $c->disconnect(); Index: monshow =================================================================== RCS file: /cvsroot/mon/mon/clients/monshow,v retrieving revision 1.1.1.1 retrieving revision 1.2 diff -C2 -d -r1.1.1.1 -r1.2 *** monshow 9 Jun 2004 05:18:07 -0000 1.1.1.1 --- monshow 15 Nov 2004 14:45:17 -0000 1.2 *************** *** 584,593 **** if ($dd == 0) { ! sprintf("%02d:%02d", $hh, $mm); } else { ! sprintf("%d days, %02d:%02d", $dd, $hh, $mm); } } --- 584,593 ---- if ($dd == 0) { ! sprintf("%02d:%02d:%02d", $hh, $mm, $ss); } else { ! sprintf("%d days, %02d:%02d:%02d", $dd, $hh, $mm, $ss); } } *************** *** 1497,1504 **** # ! # 0 = nothing special ! # 1 = do not display if zero ! # 2 = do not display if eq "" # foreach my $k ( ["opstatus", "Operational Status", 0], --- 1497,1516 ---- # ! # VAR: ! # variable name from "show opstatus" ! # ! # DESCR: ! # display name for variable ! # ! # IFZERO: ! # 0 = nothing special ! # 1 = do not display if zero ! 
# 2 = do not display if eq "" ! # ! # TYPE: ! # s = seconds ! # b = boolean # + my ($VAR, $DESCR, $IFZERO, $TYPE) = (0..3); foreach my $k ( ["opstatus", "Operational Status", 0], *************** *** 1512,1537 **** ["first_failure", "First Failure", 2], ["failure_duration", "Failure Duration", 2], ! ["interval", "Schedule Interval", 0], ["exclude_period", "Exclude Period", 2], ["exclude_hosts", "Exclude Hosts", 2], ! ["randskew", "Random Skew", 1], ["alerts_sent", "Alerts Sent", 1], ! ["last_alert", "Last Alert", 2]) { my $v = undef; ! if ($d->{$k->[0]} ne "") { ! $v = \$d->{$k->[0]}; ! } elsif ($sref->{$k->[0]} ne "") { ! $v = \$sref->{$k->[0]}; } ! next if ($k->[2] == 1 && $$v == 0); ! next if ($k->[2] == 2 && $$v eq ""); $OUT_BUF .= <<EOF; <tr> ! <td align=right width="15%"><b>$k->[1]:</b></td> <td> $$v </td> EOF --- 1524,1571 ---- ["first_failure", "First Failure", 2], ["failure_duration", "Failure Duration", 2], ! ["interval", "Schedule Interval", 0, "s"], ["exclude_period", "Exclude Period", 2], ["exclude_hosts", "Exclude Hosts", 2], ! ["randskew", "Random Skew", 1, "s"], ["alerts_sent", "Alerts Sent", 1], ! ["last_alert", "Last Alert", 2], ! ["monitor_duration", "Monitor Execution Duration", 2, "s"], ! ["monitor_running", "Monitor currently running", 0, "b"], ! ) { my $v = undef; ! if ($d->{$k->[$VAR]} ne "") { ! $v = \$d->{$k->[$VAR]}; ! } elsif ($sref->{$k->[$VAR]} ne "") { ! $v = \$sref->{$k->[$VAR]}; } ! # ! # convert types into display form ! # ! if ($k->[$TYPE] eq "s") ! { ! if ($$v >= 0) ! { ! $$v = secs_to_hms ($$v); ! } ! } ! ! elsif ($k->[$TYPE] eq "b") ! { ! $$v = $$v == 0 ? "false" : "true"; ! } ! ! # ! # display if zero? ! # ! next if ($k->[$IFZERO] == 1 && $$v == 0); ! next if ($k->[$IFZERO] == 2 && $$v eq ""); $OUT_BUF .= <<EOF; <tr> ! 
<td align=right width="15%"><b>$k->[$DESCR]:</b></td> <td> $$v </td> EOF Index: monremote.pl =================================================================== RCS file: /cvsroot/mon/mon/clients/monremote.pl,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** monremote.pl 14 Jun 2004 10:57:14 -0000 1.1 --- monremote.pl 15 Nov 2004 14:45:17 -0000 1.2 *************** *** 55,59 **** if (!defined $ARGV[0]) { ! print "Usage: monremote.pl (enable|disable) (watch <groupname>|host <hostname>|service <group> <service>)\n"; exit; } --- 55,59 ---- if (!defined $ARGV[0]) { ! print "Usage: monremote.pl (enable|disable|test) (watch <groupname>|host <hostname>|service <group> <service>)\n"; exit; } |
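Note on the monshow change above: the two sprintf formats now include seconds, and intervals of a day or more get a "days" prefix. A minimal Python sketch of that formatting follows. How monshow derives the day, hour, minute and second fields is not shown in the diff, so the divmod arithmetic here is an assumption.

```python
def secs_to_hms(secs):
    """Format a duration in seconds the way the patched monshow formats it.

    Assumption: days/hours/minutes/seconds are split with integer division;
    only the two format strings come from the diff.
    """
    dd, rem = divmod(int(secs), 86400)
    hh, rem = divmod(rem, 3600)
    mm, ss = divmod(rem, 60)
    if dd == 0:
        return "%02d:%02d:%02d" % (hh, mm, ss)
    return "%d days, %02d:%02d:%02d" % (dd, hh, mm, ss)

print(secs_to_hms(1800))   # e.g. a 30m schedule interval
print(secs_to_hms(90061))  # more than one day
```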
From: David N. <vi...@us...> - 2004-11-15 14:45:26
|
Update of /cvsroot/mon/mon/cgi-bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9218/cgi-bin

Removed Files:
    README
Log Message:
Pulling lots of changes from the 1.0.0pre* branch into the HEAD, to prepare to tag mon-1.1pre1

--- README DELETED ---
|
From: Jim T. <tr...@us...> - 2004-10-06 16:32:01
|
Update of /cvsroot/mon/mon/mon.d
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2562/mon.d

Modified Files:
    ntpdate.monitor
Log Message:
show detail when there are no failures, also show raw output of "ntpdate -q", redirect stderr from it to stdout so the errors aren't missed.

Index: ntpdate.monitor
===================================================================
RCS file: /cvsroot/mon/mon/mon.d/ntpdate.monitor,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** ntpdate.monitor 16 Sep 2004 15:02:19 -0000 1.2
--- ntpdate.monitor 6 Oct 2004 16:31:51 -0000 1.3
***************
*** 174,184 ****
  print "TimeOfDay: $TimeOfDay\n" if $Debug;
! $cmd = qq{$NTPDATE -q @Hosts |};
  $pid = open(NTP, $cmd) || die "Couldn't run $cmd\n";
  while ($in = <NTP>) {
  # print $in if $Debug;
  chomp $in;
  #
  # Pick out server strings
--- 174,190 ----
  print "TimeOfDay: $TimeOfDay\n" if $Debug;
! $cmd = qq{$NTPDATE -q @Hosts 2>&1 |};
  $pid = open(NTP, $cmd) || die "Couldn't run $cmd\n";
+ $detail = "";
+ $ntpdate_output = "";
+
  while ($in = <NTP>) {
  # print $in if $Debug;
+ $ntpdate_output .= $in;
+
  chomp $in;
+
  #
  # Pick out server strings
***************
*** 192,195 ****
--- 198,203 ----
  print "$in Name: $name Stratum: $stratum\n" if $Debug;
+ $detail .= "$in Name: $name Stratum: $stratum\n";
+
  if (exists $NameByIP{$ip}) { # Use system name if we have it
  $HostName = $NameByIP{$ip};
***************
*** 232,236 ****
  $fail_string = ' ';
! $fail_string = ' ';
  if (($Stratum{$hostname} > $MaxStratum) || ($Stratum{$hostname} < $MinStratum) || (abs($DeltaTime) > $MaxOffset)) {
--- 240,244 ----
  $fail_string = ' ';
!
  if (($Stratum{$hostname} > $MaxStratum) || ($Stratum{$hostname} < $MinStratum) || (abs($DeltaTime) > $MaxOffset)) {
***************
*** 241,244 ****
--- 249,253 ----
  $fail_string = 'Fail';
  }
+ $FmtDetail .= "\n";
***************
*** 252,257 ****
--- 261,269 ----
  }
+ print "\n$FmtDetail\n" if $Debug;
+ $detail .= "\n$FmtDetail\n";
+
  #
  # Write results to logfile, if -l
***************
*** 281,288 ****
  }
! if ($Debug) {
! foreach $ip (sort keys %LogString) {
! print "LOG: $LogString{$ip}\n";
! }
  }
--- 293,299 ----
  }
! foreach $ip (sort keys %LogString) {
! print "LOG: $LogString{$ip}\n" if ($Debug);
! $detail .= "LOG: $LogString{$ip}\n";
  }
***************
*** 290,293 ****
--- 301,306 ----
  if (@Failures == 0) { # Indicate "all OK" to mon
+ print "\n$detail";
+ print "\nntpdate -q output:\n\n$ntpdate_output";
  exit 0;
  }
***************
*** 316,320 ****
  print "------- Details -------\n" if $Debug;
! print $FmtDetail;
  #foreach $hostname (sort keys %FailureDetail) {
--- 329,333 ----
  print "------- Details -------\n" if $Debug;
! print $detail;
  #foreach $hostname (sort keys %FailureDetail) {
***************
*** 322,327 ****
--- 335,345 ----
  #}
+ print "\nntpdate -q output:\n\n$ntpdate_output";
+
  exit 1; # Indicate failure to mon
+ ##############################################################################
+
+
  #
  # Get the IP addresses for the hosts (because ntpdate returns IP addresses)
|
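Note on the per-host test in the ntpdate.monitor diff above: a host is marked 'Fail' when its stratum is outside the allowed range or the absolute clock offset exceeds the limit. A minimal Python sketch of that condition follows. The function name check_host is hypothetical, the defaults of 10 and 0.8 seconds follow the monitor's documentation, and the min_stratum default of 1 is an assumption (the diff does not show it).

```python
def check_host(stratum, offset, max_stratum=10, min_stratum=1, max_offset=0.8):
    """Return 'Fail' if the host breaks any limit, otherwise a blank string.

    Mirrors the three-way "or" in the Perl condition; min_stratum=1 is an
    assumed default, not taken from the source.
    """
    failed = (
        stratum > max_stratum
        or stratum < min_stratum
        or abs(offset) > max_offset
    )
    return "Fail" if failed else " "

# Stratum 16 means ntp is running but the clock is not synchronized
print(repr(check_host(16, 0.001)))
print(repr(check_host(2, -0.950)))
print(repr(check_host(2, 0.004)))
```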
From: Jim T. <tr...@us...> - 2004-10-06 16:29:36
|
Update of /cvsroot/mon/mon/alert.d
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1945/alert.d

Modified Files:
    mail.alert
Log Message:
added -f to set the from: in header and envelope, patch from Hans-Dieter Karl <hd...@hd...>

Index: mail.alert
===================================================================
RCS file: /cvsroot/mon/mon/alert.d/mail.alert,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -C2 -d -r1.1.1.1 -r1.2
*** mail.alert 9 Jun 2004 05:18:07 -0000 1.1.1.1
--- mail.alert 6 Oct 2004 16:29:19 -0000 1.2
***************
*** 6,9 ****
--- 6,11 ----
  # to a pager or email subject line.
  #
+ # -f from@addr.x set the smtp envelope "from" address
+ #
  # Jim Trocki, tr...@tr...
  #
***************
*** 30,34 ****
  use Text::Wrap;
! getopts ("S:s:g:h:t:l:u");
  $summary=<STDIN>;
--- 30,34 ----
  use Text::Wrap;
! getopts ("S:s:g:h:t:l:f:u");
  $summary=<STDIN>;
***************
*** 38,41 ****
--- 40,44 ----
  $mailaddrs = join (',', @ARGV);
+ $mailfrom = "-f $opt_f -F $opt_f" if (defined $opt_f);
  $ALERT = $opt_u ? "UPALERT" : "ALERT";
***************
*** 44,48 ****
  ($wday,$mon,$day,$tm) = split (/\s+/, $t);
! open (MAIL, "| /usr/lib/sendmail -oi -t") || die "could not open pipe to mail: $!\n";
  print MAIL <<EOF;
--- 47,51 ----
  ($wday,$mon,$day,$tm) = split (/\s+/, $t);
! open (MAIL, "| /usr/lib/sendmail -oi -t $mailfrom") || die "could not open pipe to mail: $!\n";
  print MAIL <<EOF;
***************
*** 59,63 ****
  Group : $opt_g
  Service : $opt_s
- Description : $ENV{MON_DESCRIPTION}
  Time noticed : $t
  Secs until next alert : $opt_l
--- 62,65 ----
|
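Note on the -f option in the mail.alert diff above: when -f is given, the same address is passed to sendmail as both -f and -F. A minimal Python sketch of how the resulting command line is assembled follows. The function name sendmail_argv is hypothetical, and returning an argument list instead of a shell string is a choice of this sketch, not what the Perl script does.

```python
def sendmail_argv(mail_from=None):
    """Build the sendmail argument list used to pipe an alert message.

    The base flags (-oi -t) and the -f/-F pair come from the diff; the
    list form is an assumption made here to sidestep shell quoting.
    """
    argv = ["/usr/lib/sendmail", "-oi", "-t"]
    if mail_from is not None:
        argv += ["-f", mail_from, "-F", mail_from]
    return argv

print(sendmail_argv())
print(sendmail_argv("mon@example.org"))
```

Passing the address as separate list elements means a from-address containing spaces or shell metacharacters is handed to sendmail unchanged, whereas the interpolated pipe string in the Perl version goes through the shell.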
From: Jim T. <tr...@us...> - 2004-09-16 15:02:29
|
Update of /cvsroot/mon/mon/mon.d In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18328 Added Files: ntpdate.monitor Log Message: A mon monitor to verify that ntp is running on multiple servers, those servers have synchronized time, and that the times are within specified limits. --- NEW FILE: ntpdate.monitor --- #!/usr/bin/perl # # ntpdate.monitor Verify that NTP is running and times are within tolerance # ntpdate will do most of the work for us # =head1 NAME B<ntpdate.monitor> - ntp monitor using ntpdate to do most of the work =head1 DESCRIPTION A mon monitor to verify that ntp is running on multiple servers, those servers have synchronized time, and that the times are within specified limits. The mon server should be running ntp since the times are reported relative to the system performing the query. =head1 SYNOPSIS B<ntpdate.monitor -d -l log_file_YYYYMM.log --maxstratum nn --maxoffset n.nn> =head1 OPTIONS =over 5 =item B<--maxstratum> Maximum stratum number, default is 10. Stratum 16 indicates that ntp is running on a system, but the clock is not synchronized. An alarm will be triggered if this value is exceeded. =item B<--maxoffset> Maximum value of the clock offset in seconds, default is 800 ms (a large value, ntp typically keeps clocks within milliseconds of each other). An alarm will be triggered if this value is exceeded. =item B<-l log_file_template> or B<--log log_file_template> /path/to/logs/internet_web_YYYYMM.log Current year & month are substituted for YYYYMM, that is the only possible template at this time. The format of the log file is: time server stratum offset delay time is in UNIX seconds, offset, and delay are in seconds. =item B<-shortalerts> Use only hostname in alert list. For organizations with long FQDNs this will make mail and pager alerts more readable. =item B<--htmlfile /full/path/to/file.html> Optional location to write the formated results from the current test. 
Be sure that the directory is writeable by the user under whom mon is running. =item B<-d> or B<--debug> Debug/Test/Verbose, for manual testing only. =item B<--ntpdate> Specify the location of ntpdate, the default is /usr/sbin/ntpdate =back =head1 MON CONFIGURATION EXAMPLE hostgroup ntp ntp1.somedomain.org ntp2.somedomain.org ntp3.somedomain.org watch ntp service ntpdate interval 30m monitor ntpdate.monitor --maxoffset 0.100 --log /usr/local/mon/logs/gv-ntp-YYYYMM.log period wd {Sun-Sat} alert mail.alert us...@so... alertevery 1h summary =head1 BUGS Listing a server twice can cause ntpdate to report that server as Stratum 0. The shortalerts option only reports the hostname, it could be extended to provide a configurable number of FQND fields. ntpdate will be removed from the NTP distribution at some point. This monitor will need to be modified to use some form of ntpd -q instead. Check the first line of this file to be sure that it points to an appropriate perl executable. =head1 AUTHOR Jon Meek, me...@ie... =head1 SEE ALSO ntp.monitor by Daniel Hagerty <ha...@li...> =cut $RCSid = q{$Id: ntpdate.monitor,v 1.2 2004/09/16 15:02:19 trockij Exp $ }; # # Jon Meek # Lawrenceville, NJ # meekj at ieee.org # # # $Id: ntpdate.monitor,v 1.2 2004/09/16 15:02:19 trockij Exp $ # # Copyright (C) 2002, Jon Meek # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Getopt::Long; GetOptions( "maxstratum=i" => \$MaxStratum, "maxoffset=f" => \$MaxOffset, # "dns" => \$UseDNS, "d|debug" => \$Debug, "l=s" => \$LogFile, "log=s" => \$LogFile, "htmlfile=s" => \$HtmlFile, "shortalerts" => \$ShortAlerts, "ntpdate=s" => \$NTPDATE, ); use Net::DNS; use Sys::Hostname; use POSIX qw(strftime); # # Set Defaults # # ntpdate reports stratum 16 if ntp is running, but time is not synchronized # stratum 0 will be reported if ntp is not running # $MaxStratum = 10 unless $MaxStratum; $MinStratum = 1; # Use the first occurrence of this stratum as the reference time for alarms $ReferenceStratum = 1 unless $ReferenceStratum; # # Trigger alarm if the time is ever off by this much # $MaxOffset = 0.800 unless $MaxOffset; # seconds $NTPDATE = '/usr/sbin/ntpdate' unless $NTPDATE; $HtmlFileHandle = &HTMLheader($HtmlFile) if ($HtmlFile ne "" && HTMLheader); @Failures = (); @Hosts = @ARGV; # Host names are left on the command line after Getopt %NameByIP = &DNSlookups(\@Hosts); $TimeOfDay = time; # Current time print "TimeOfDay: $TimeOfDay\n" if $Debug; $cmd = qq{$NTPDATE -q @Hosts |}; $pid = open(NTP, $cmd) || die "Couldn't run $cmd\n"; while ($in = <NTP>) { # print $in if $Debug; chomp $in; # # Pick out server strings # if ($in =~ /^server\s+([\d\.]+),\s+stratum\s+(\d+),\s+offset\s+([\d\.\-\+]+),\s+delay\s+([\d\.\-\+]+)/) { $ip = $1; $stratum = $2; $offset = $3; $delay = $4; $name = $NameByIP{$ip}; print "$in Name: $name Stratum: $stratum\n" if $Debug; if (exists $NameByIP{$ip}) { # Use system name if we have it $HostName = $NameByIP{$ip}; } else { $HostName = $ip; # Otherwise use IP address } $IP{$HostName} = $ip; $Stratum{$HostName} = $stratum; $Offset{$HostName} = $offset; $Delay{$HostName} = $delay; $Detail{$HostName} = $in; if ((!defined 
$ReferenceOffset) && ($stratum == 1)) { # Save offset from first stratum 1 server seen $ReferenceOffset = $offset; } # # Prepare log entries # if ($LogFile or $Debug) { $LogString{$HostName} = qq{$TimeOfDay $HostName $stratum $offset $delay}; } } } # # Build formatted results and check alarm limits # $FmtDetail = qq{NTP Server Delta, s Stratum Rel, s Offset, s\n}; &HTMLtableHeader($HtmlFileHandle, 'NTP Server', 'Delta, s', 'Stratum', 'Rel, s', 'Offset, s', 'Status') if ($HtmlFile ne ""); foreach $hostname (sort keys %Stratum) { $DeltaTime = $Offset{$hostname} - $ReferenceOffset; $DeltaTimeByHost{$hostname} = $DeltaTime; $FmtDetail .= sprintf ("%-40s %12.6f %3d %12.6f %12.6f", $hostname, $DeltaTime, $Stratum{$hostname}, $Offset{$hostname}, $Delay{$hostname}); $fail_string = ' '; $fail_string = ' '; if (($Stratum{$hostname} > $MaxStratum) || ($Stratum{$hostname} < $MinStratum) || (abs($DeltaTime) > $MaxOffset)) { $ip = $IP{$hostname}; $FailureDetail{$hostname} = $Detail{$hostname}; push(@Failures, $hostname); $FmtDetail .= q{ Fail}; $fail_string = 'Fail'; } $FmtDetail .= "\n"; if ($HtmlFile ne "") { $fDeltaTime = sprintf("%12.6f", $DeltaTime); $fOffset = sprintf("%12.6f", $Offset{$hostname}); $fDelay = sprintf("%12.6f", $Delay{$hostname}); &HTMLtableRow($HtmlFileHandle, $hostname, $fDeltaTime, $Stratum{$hostname}, $fOffset, $fDelay, $fail_string); } } print "\n$FmtDetail\n" if $Debug; # # Write results to logfile, if -l # if ($LogFile) { $LogFile = $LogFile; ($sec, $min, $hour, $mday, $Month, $Year, $wday, $yday, $isdst) = localtime($TimeOfDay); $Month++; $Year += 1900; $YYYYMM = sprintf('%04d%02d', $Year, $Month); $LogFile =~ s/YYYYMM/$YYYYMM/; # Fill in current year and month if (-e $LogFile) { # Check for existing log file $NewLogFile = 0; } else { $NewLogFile = 1; } open(LOG, ">>$LogFile") || warn "$0 Can't open logfile: $LogFile\n"; foreach $ip (sort keys %LogString) { print LOG "$LogString{$ip}\n"; } close LOG; } if ($Debug) { foreach $ip (sort keys %LogString) 
{ print "LOG: $LogString{$ip}\n"; } } &HTMLtrailer($HtmlFileHandle) if $HtmlFile; if (@Failures == 0) { # Indicate "all OK" to mon exit 0; } # # Otherwise we have one or more failures # if ($ShortAlerts) { foreach $host (sort @Failures) { if ($host =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/) { # IP address, don't shorten push(@SortedFailures, $host); } else { $host =~ /(.*?)\./; push(@SortedFailures, $1); } } } else { @SortedFailures = sort @Failures; } print "------- Have Failures -------\n" if $Debug; print "@SortedFailures\n"; print "------- Details -------\n" if $Debug; print $FmtDetail; #foreach $hostname (sort keys %FailureDetail) { # print "$NameByIP{$hostname} $hostname $FailureDetail{$hostname} $DeltaTimeByHost{$hostname} s\n"; #} exit 1; # Indicate failure to mon # # Get the IP addresses for the hosts (because ntpdate returns IP addresses) # sub DNSlookups { my ($Hosts) = @_; $res = new Net::DNS::Resolver; for (my $i = 0; $i < @$Hosts; $i++) { $target = $Hosts->[$i]; $query = $res->search($target); if ($query) { foreach $rr ($query->answer) { #print "$target Type: ", $rr->type, "\n" if $Debug; if ($rr->type eq "A") { print $rr->address . 
' ' if $Debug; $NameByIP{$rr->address} = $target; } } } } return %NameByIP; } sub HTMLheader { # # Print basic standard header for this application # my($FileName) = @_; local *F; open(F, ">$FileName") || warn "$$ can't open $FileName, check permissions"; $Title = "NTP Server Status"; $MonitorHostname = hostname; $FmtTimeNow = strftime("%A %d-%b-%Y %H:%M:%S %Z", localtime(time)); print F <<"EndOfHeader"; <HTML> <HEAD> <TITLE>$Title</TITLE> </HEAD> <BODY bgcolor="#ffffff" text="#000000"> <H1>$Title from $MonitorHostname</H1> <p>$FmtTimeNow</p> <table border=2 cellpadding=3> EndOfHeader return *F; } sub HTMLtableHeader { my($FileHandle, @Headers) = @_; print $FileHandle "<TR>\n"; foreach $h (@Headers) { print $FileHandle "<TH>$h</TH>\n"; } print $FileHandle "</TR>\n"; } sub HTMLtableRow { my ($FileHandle, @Fields) = @_; my ($align, $f); $align = ''; print $FileHandle "<TR>\n"; foreach $f (@Fields) { print $FileHandle "<TD$align>$f</TD>\n"; $align = ' align=right'; } print $FileHandle "</TR>\n"; } sub HTMLtrailer { # # Print basic standard trailer for this application # my($FileHandle) = @_; print $FileHandle "</table>\n</body>\n</html>\n"; close $FileHandle; } |
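The parsing and limit checks in ntpdate.monitor can be tried without running ntpdate. The sketch below reuses the monitor's "server" regex and its default limits, but the helper names (parse_server_line, is_failure) and the three sample lines are invented for illustration; they only imitate the shape of "ntpdate -q" output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Defaults taken from ntpdate.monitor.
my ($max_stratum, $min_stratum, $max_offset) = (10, 1, 0.800);

# Hypothetical helper: parse one "server ..." line, using the same
# regex as the monitor. Returns a hash ref, or undef if no match.
sub parse_server_line {
    my ($line) = @_;
    return unless $line =~
        /^server\s+([\d\.]+),\s+stratum\s+(\d+),\s+offset\s+([\d\.\-\+]+),\s+delay\s+([\d\.\-\+]+)/;
    return { ip => $1, stratum => $2, offset => $3, delay => $4 };
}

# Hypothetical helper: apply the monitor's three alarm conditions.
sub is_failure {
    my ($rec, $reference_offset) = @_;
    my $delta = $rec->{offset} - $reference_offset;
    return ($rec->{stratum} > $max_stratum
         || $rec->{stratum} < $min_stratum
         || abs($delta) > $max_offset) ? 1 : 0;
}

# Made-up sample lines in the format the regex expects.
my @sample_output = (
    'server 192.0.2.1, stratum 1, offset 0.001200, delay 0.02600',
    'server 192.0.2.2, stratum 16, offset 0.000000, delay 0.00000',
    'server 192.0.2.3, stratum 2, offset 1.250000, delay 0.03100',
);

my (@records, $reference_offset);
for my $line (@sample_output) {
    my $rec = parse_server_line($line) or next;
    # The first stratum 1 server seen supplies the reference offset.
    $reference_offset = $rec->{offset}
        if !defined $reference_offset && $rec->{stratum} == 1;
    push @records, $rec;
}
$reference_offset = 0 unless defined $reference_offset;

for my $rec (@records) {
    printf "%-12s stratum %2d  %s\n", $rec->{ip}, $rec->{stratum},
        is_failure($rec, $reference_offset) ? 'Fail' : 'ok';
}
```

With these samples the first server passes, the second fails on stratum (16 means unsynchronized) and the third fails on offset relative to the stratum 1 reference.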
From: Jim T. <tr...@us...> - 2004-08-13 04:15:32
|
Update of /cvsroot/mon/mon
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9747

Modified Files:
      Tag: mon-1-0-0pre1
        mon
Log Message:
carp about duplicate services and periods

Index: mon
===================================================================
RCS file: /cvsroot/mon/mon/mon,v
retrieving revision 1.4.2.12
retrieving revision 1.4.2.13
diff -C2 -d -r1.4.2.12 -r1.4.2.13
*** mon 11 Aug 2004 20:40:27 -0000 1.4.2.12
--- mon 13 Aug 2004 04:15:16 -0000 1.4.2.13
***************
*** 1169,1172 ****
--- 1169,1178 ----
  }

+ elsif (exists $new_watch{$watchgroup}->{$service})
+ {
+ close (CFG);
+ return "cf error: service $service already defined for watch group $watchgroup, line $line_num";
+ }
+
  $period = 0;
  $sref = \%{$new_watch{$watchgroup}->{$service}};
***************
*** 1241,1244 ****
--- 1247,1256 ----
  }

+ if (exists $sref->{"periods"}->{$periodstr})
+ {
+ close (CFG);
+ return "cf error: period '$periodstr' already defined for watch group $watchgroup service $service, line $line_num";
+ }
+
  $pref = \%{$sref->{"periods"}->{$periodstr}};

|
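The duplicate check added here relies on testing a nested hash key with exists before the entry is created. A much-simplified sketch follows; check_services and its list-of-tuples input are hypothetical stand-ins for mon's real config parser, and returning an empty string on success is an assumption of the sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the config parser: each definition is
# [watchgroup, service, line_num]. Returns an error string for the
# first duplicate, or "" when every service is unique in its group.
sub check_services {
    my (@defs) = @_;
    my %new_watch;
    for my $d (@defs) {
        my ($watchgroup, $service, $line_num) = @$d;
        # Same test as in the patch: has this service already been
        # defined for this watch group?
        if (exists $new_watch{$watchgroup}->{$service}) {
            return "cf error: service $service already defined "
                 . "for watch group $watchgroup, line $line_num";
        }
        $new_watch{$watchgroup}->{$service} = {};
    }
    return "";
}

my $err = check_services(
    ['web', 'http',  10],
    ['web', 'fping', 14],
    ['web', 'http',  20],   # duplicate of the definition on line 10
);
print $err eq "" ? "config OK\n" : "$err\n";
```

The same service name in two different watch groups is accepted, since the key is checked per group.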
From: Jim T. <tr...@us...> - 2004-08-13 04:15:32
|
Update of /cvsroot/mon/mon/doc In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9747/doc Modified Files: Tag: mon-1-0-0pre1 mon.8 Log Message: carp about duplicate services and periods Index: mon.8 =================================================================== RCS file: /cvsroot/mon/mon/doc/mon.8,v retrieving revision 1.1.1.1.2.3 retrieving revision 1.1.1.1.2.4 diff -C2 -d -r1.1.1.1.2.3 -r1.1.1.1.2.4 *** mon.8 3 Aug 2004 15:58:09 -0000 1.1.1.1.2.3 --- mon.8 13 Aug 2004 04:15:17 -0000 1.1.1.1.2.4 *************** *** 916,919 **** --- 916,921 ---- .B service followed by a word which is the tag for this service. + This word must be unique among all services defined for the + same watch group. The components of a service are an interval, monitor, and *************** *** 1107,1111 **** The .B period ! keyword has two forms. The first takes an argument which is a period specification from Patrick Ryan's --- 1109,1113 ---- The .B period ! definition has two forms. The first takes an argument which is a period specification from Patrick Ryan's *************** *** 1127,1130 **** --- 1129,1138 ---- parameters. + Period definitions, in either the first or second form, must be unique within + each service definition. For example, if you need to define two + periods both for "wd {Sun-Sat}", then one or both of the period definitions + must specify a label such as "period t1: wd {Sun-Sat}" and + "period t2: wd {Sun-Sat}". + .TP .BI alertevery " timeval [observe_detail | strict]" |
From: Jim T. <tr...@us...> - 2004-08-11 20:41:21
|
Update of /cvsroot/mon/mon In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15265 Modified Files: Tag: mon-1-0-0pre1 TODO Log Message: some thoughts on logging support Index: TODO =================================================================== RCS file: /cvsroot/mon/mon/TODO,v retrieving revision 1.1.1.1.2.1 retrieving revision 1.1.1.1.2.2 diff -C2 -d -r1.1.1.1.2.1 -r1.1.1.1.2.2 *** TODO 3 Aug 2004 15:58:09 -0000 1.1.1.1.2.1 --- TODO 11 Aug 2004 20:41:13 -0000 1.1.1.1.2.2 *************** *** 53,54 **** --- 53,85 ---- -make it possible to disable just one of multiple alarms in a service + + -make a logging facility which forks and execs external logging + daemons and writes to them via some ipc such as unix domain socket. + mon should be sure that one of each type of these loggers is running + at all times. configure the logging either globally or for each + service. write both the success and failure status to the log in + some "list opstatus" type format. each logger can do as it wishes + with the data (e.g. stuff it into rrdtool, mysql, cat it to a file, etc.) + + + # global setting + logger = file + + watch stuff + service http + logger file -p _LOGDIR_ + ... + service fping + # this will use the global logger setting + ... + service + # this will override the global logger setting + logger none + ... + + + common options to logger: + -d dir path to logging dir + -f file name of log file + -g, -s group, service + |
From: Jim T. <tr...@us...> - 2004-08-11 20:40:36
|
Update of /cvsroot/mon/mon
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15011

Modified Files:
      Tag: mon-1-0-0pre1
        mon
Log Message:
don't quote output of "list descriptions"

Index: mon
===================================================================
RCS file: /cvsroot/mon/mon/mon,v
retrieving revision 1.4.2.11
retrieving revision 1.4.2.12
diff -C2 -d -r1.4.2.11 -r1.4.2.12
*** mon 3 Aug 2004 15:44:42 -0000 1.4.2.11
--- mon 11 Aug 2004 20:40:27 -0000 1.4.2.12
***************
*** 2173,2179 ****
  foreach $group (keys %watch) {
  foreach $service (keys %{$watch{$group}}) {
! sock_write ($fh, "$group $service '" .
  esc_str ($watch{$group}->{$service}->{"description"}, 1) .
! "'\n");
  }
  }
--- 2173,2179 ----
  foreach $group (keys %watch) {
  foreach $service (keys %{$watch{$group}}) {
! sock_write ($fh, "$group $service " .
  esc_str ($watch{$group}->{$service}->{"description"}, 1) .
! "\n");
  }
  }
|
From: Jim T. <tr...@us...> - 2004-08-03 15:58:26
|
Update of /cvsroot/mon/mon In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14509 Modified Files: Tag: mon-1-0-0pre1 CHANGES TODO Log Message: Index: CHANGES =================================================================== RCS file: /cvsroot/mon/mon/CHANGES,v retrieving revision 1.2.2.2 retrieving revision 1.2.2.3 diff -C2 -d -r1.2.2.2 -r1.2.2.3 *** CHANGES 12 Jul 2004 13:17:07 -0000 1.2.2.2 --- CHANGES 3 Aug 2004 15:58:09 -0000 1.2.2.3 *************** *** 1,4 **** --- 1,36 ---- $Id$ + Changes between mon-1.0.0pre3 and mon-1.0.0pre4 + Tue Aug 3 08:02:35 EDT 2004 + ----------------------------------------------- + + -when allow_empty_group is not set and no host arguments + to pass to a monitor, the interval wasn't being reset so + it would spam the syslog with lots of "no host arguments" + messages. this is fixed. + + -in reset_timer, there was a chance that _timer could get + set to a negative value, which is not right. fixed it. + + -fixed the bug where lots of mon processes could accumulate if the + exec of an alert failed. also fixed error handling of failed + alerts. + + -added "show failures only" button to mon.cgi to speed it up. + by Ed Ravin <er...@pa...> + + -small permissions fix to rpm spec file + + -added MON_CFBASEDIR variable to monitor and alert + environment, which is set to the value of "cfbasedir" in the + config file. + + -removed unfinished snmp trap handling stuff. it doesn't work at all, + and it's misleading to people even though the man page says it doesn't + work. 
+ + -added monitor_duration and monitor_running output to opstatus detail + in monshow + Changes between mon-1.0.0pre1 and mon-1.0.0pre3 Mon Jul 12 09:12:29 EDT 2004 Index: TODO =================================================================== RCS file: /cvsroot/mon/mon/TODO,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.2.1 diff -C2 -d -r1.1.1.1 -r1.1.1.1.2.1 *** TODO 9 Jun 2004 05:18:03 -0000 1.1.1.1 --- TODO 3 Aug 2004 15:58:09 -0000 1.1.1.1.2.1 *************** *** 1,31 **** $Id$ ! -fix problem where all members of hostgroup are disabled ! and it spams the log every second: ! ! Feb 24 04:02:36 mon-bd2 mon[11668]: monitor for ATE/fping not called because of no host arguments ! ! ! -pass cfbasedir as env vars to monitors and alerts ! just like logdir ! ! -include snmpvar.monitor in the main dist ! ! -add short "trap howto" and "radius howto" posts to the mon ! list in the doc/ directory. -make traps authenticate via the same scheme used to obscure the password in RADIUS packets - -have an absolute "alertevery" which squelches alerts - regardless of whether or not the service goes down/up/down. - -descriptions defined in mon.cf should be 'quoted' ! -document command section, trap section, snmp trap section in authfile ! -there should be only one routine which handles dealing with failures ! and successes (including calling alerts), probably the routine which tests ! $? in proc_cleanup. -output to client should be buffered and incorporated into the I/O loop. --- 1,14 ---- $Id$ ! -add short a "radius howto" to the doc/ directory. -make traps authenticate via the same scheme used to obscure the password in RADIUS packets -descriptions defined in mon.cf should be 'quoted' ! -document command section and trap section in authfile ! -finish support for receiving snmp traps -output to client should be buffered and incorporated into the I/O loop. 
*************** *** 41,48 **** -document "clear" client command - -Separate the decision-making code about sending alerts into a - separate routine, and make do_alert do nothing but deliver the - actual alert. - -Document trap authentication. --- 24,27 ---- |
From: Jim T. <tr...@us...> - 2004-08-03 15:58:26
|
Update of /cvsroot/mon/mon/doc In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14509/doc Modified Files: Tag: mon-1-0-0pre1 mon.8 Log Message: Index: mon.8 =================================================================== RCS file: /cvsroot/mon/mon/doc/mon.8,v retrieving revision 1.1.1.1.2.2 retrieving revision 1.1.1.1.2.3 diff -C2 -d -r1.1.1.1.2.2 -r1.1.1.1.2.3 *** mon.8 9 Jul 2004 03:18:53 -0000 1.1.1.1.2.2 --- mon.8 3 Aug 2004 15:58:09 -0000 1.1.1.1.2.3 *************** *** 317,320 **** --- 317,327 ---- global configuration variable. + .TP + .B MON_CFBASEDIR + The directory where configuration files should be kept, + as indicated by the + .I cfbasedir + global configuration variable. + .P "fping.monitor" should return an exit status of 0 if it *************** *** 506,509 **** --- 513,523 ---- global configuration variable. + .TP + .B MON_CFBASEDIR + The directory where configuration files should be kept, + as indicated by the + .I cfbasedir + global configuration variable. + .P The first line from standard input must be used as a brief summary *************** *** 670,677 **** .TP - .BI "snmpport = " portnum - Set the SNMP port that the server binds to. - - .TP .BI "serverbind = " addr --- 684,687 ---- *************** *** 688,695 **** .TP - .BI "snmp =" {yes|no} - Turn on/off SNMP support (currently unimplemented). - - .TP .BI "dtlogfile = " file .I file --- 698,701 ---- |
From: Jim T. <tr...@us...> - 2004-08-03 15:56:04
|
Update of /cvsroot/mon/mon/clients In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13898 Modified Files: Tag: mon-1-0-0pre1 monshow Log Message: added "monitor currently running" (boolean) and "monitor execution duration" to opstatus detail Index: monshow =================================================================== RCS file: /cvsroot/mon/mon/clients/monshow,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.2.1 diff -C2 -d -r1.1.1.1 -r1.1.1.1.2.1 *** monshow 9 Jun 2004 05:18:07 -0000 1.1.1.1 --- monshow 3 Aug 2004 15:55:53 -0000 1.1.1.1.2.1 *************** *** 584,593 **** if ($dd == 0) { ! sprintf("%02d:%02d", $hh, $mm); } else { ! sprintf("%d days, %02d:%02d", $dd, $hh, $mm); } } --- 584,593 ---- if ($dd == 0) { ! sprintf("%02d:%02d:%02d", $hh, $mm, $ss); } else { ! sprintf("%d days, %02d:%02d:%02d", $dd, $hh, $mm, $ss); } } *************** *** 1497,1504 **** # ! # 0 = nothing special ! # 1 = do not display if zero ! # 2 = do not display if eq "" # foreach my $k ( ["opstatus", "Operational Status", 0], --- 1497,1516 ---- # ! # VAR: ! # variable name from "show opstatus" ! # ! # DESCR: ! # display name for variable ! # ! # IFZERO: ! # 0 = nothing special ! # 1 = do not display if zero ! # 2 = do not display if eq "" ! # ! # TYPE: ! # s = seconds ! # b = boolean # + my ($VAR, $DESCR, $IFZERO, $TYPE) = (0..3); foreach my $k ( ["opstatus", "Operational Status", 0], *************** *** 1512,1537 **** ["first_failure", "First Failure", 2], ["failure_duration", "Failure Duration", 2], ! ["interval", "Schedule Interval", 0], ["exclude_period", "Exclude Period", 2], ["exclude_hosts", "Exclude Hosts", 2], ! ["randskew", "Random Skew", 1], ["alerts_sent", "Alerts Sent", 1], ! ["last_alert", "Last Alert", 2]) { my $v = undef; ! if ($d->{$k->[0]} ne "") { ! $v = \$d->{$k->[0]}; ! } elsif ($sref->{$k->[0]} ne "") { ! $v = \$sref->{$k->[0]}; } ! next if ($k->[2] == 1 && $$v == 0); ! next if ($k->[2] == 2 && $$v eq ""); $OUT_BUF .= <<EOF; <tr> ! 
<td align=right width="15%"><b>$k->[1]:</b></td> <td> $$v </td> EOF --- 1524,1571 ---- ["first_failure", "First Failure", 2], ["failure_duration", "Failure Duration", 2], ! ["interval", "Schedule Interval", 0, "s"], ["exclude_period", "Exclude Period", 2], ["exclude_hosts", "Exclude Hosts", 2], ! ["randskew", "Random Skew", 1, "s"], ["alerts_sent", "Alerts Sent", 1], ! ["last_alert", "Last Alert", 2], ! ["monitor_duration", "Monitor Execution Duration", 2, "s"], ! ["monitor_running", "Monitor currently running", 0, "b"], ! ) { my $v = undef; ! if ($d->{$k->[$VAR]} ne "") { ! $v = \$d->{$k->[$VAR]}; ! } elsif ($sref->{$k->[$VAR]} ne "") { ! $v = \$sref->{$k->[$VAR]}; } ! # ! # convert types into display form ! # ! if ($k->[$TYPE] eq "s") ! { ! if ($$v >= 0) ! { ! $$v = secs_to_hms ($$v); ! } ! } ! ! elsif ($k->[$TYPE] eq "b") ! { ! $$v = $$v == 0 ? "false" : "true"; ! } ! ! # ! # display if zero? ! # ! next if ($k->[$IFZERO] == 1 && $$v == 0); ! next if ($k->[$IFZERO] == 2 && $$v eq ""); $OUT_BUF .= <<EOF; <tr> ! <td align=right width="15%"><b>$k->[$DESCR]:</b></td> <td> $$v </td> EOF |
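The new "s" type routes values through secs_to_hms, whose output now includes seconds. The function body is not shown in this diff, so the following is only a reconstruction under that assumption; the name secs_to_hms_sketch is deliberately different from monshow's own routine, and only the two sprintf formats are taken from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reconstruction for illustration; not monshow's actual secs_to_hms.
sub secs_to_hms_sketch {
    my ($s) = @_;
    my $dd = int($s / 86400);  $s -= $dd * 86400;
    my $hh = int($s / 3600);   $s -= $hh * 3600;
    my $mm = int($s / 60);
    my $ss = $s - $mm * 60;

    # Formats as in the patched monshow: seconds are now included.
    if ($dd == 0) {
        return sprintf("%02d:%02d:%02d", $hh, $mm, $ss);
    }
    return sprintf("%d days, %02d:%02d:%02d", $dd, $hh, $mm, $ss);
}

print secs_to_hms_sketch(1800),  "\n";   # a 30 minute interval
print secs_to_hms_sketch(90061), "\n";   # 1 day, 1 hour, 1 minute, 1 second
```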
From: Jim T. <tr...@us...> - 2004-08-03 15:44:52
|
Update of /cvsroot/mon/mon In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11885 Modified Files: Tag: mon-1-0-0pre1 mon Log Message: removed snmp stuff which doesn't work and won't work any time soon call_alert should not call waitpid, it only needs to call close, since perl's close will call waitpid for us. calling waitpid after closing a filehandle opened with open(FH, "|-") will always return an error, which is what it was doing. added MON_CFBASEDIR var to monitor and alert environments Index: mon =================================================================== RCS file: /cvsroot/mon/mon/mon,v retrieving revision 1.4.2.10 retrieving revision 1.4.2.11 diff -C2 -d -r1.4.2.10 -r1.4.2.11 *** mon 2 Aug 2004 19:47:31 -0000 1.4.2.10 --- mon 3 Aug 2004 15:44:42 -0000 1.4.2.11 *************** *** 54,59 **** use Time::HiRes qw(gettimeofday tv_interval usleep); use Time::Period; - use Mon::SNMP; - #use SNMP in read_cf() sub auth; --- 54,57 ---- *************** *** 83,87 **** sub gen_scriptdir_hash; sub handle_io; - sub handle_snmp_trap; sub handle_trap; sub handle_trap_timeout; --- 81,84 ---- *************** *** 165,169 **** my %NOAUTHCMDS; my %AUTHTRAPS; - my %AUTHSNMPTRAPS; # --- 162,165 ---- *************** *** 971,986 **** } - } elsif ($1 eq "snmp") { - if ($2 =~ /^1|yes|on|true$/i) { - $new_CF{"SNMP"} = 1; - eval "use SNMP"; - if ($@ ne "") { - close (CFG); - return "cf error: could not use SNMP: $@"; - } - } else { - $new_CF{"SNMP"} = 0; - } - } elsif ($1 eq "monerrfile") { $new_CF{"MONERRFILE"} = $2; --- 967,970 ---- *************** *** 995,1001 **** } - } elsif ($1 eq "snmpport") { - $new_CF{"SNMPPORT"} = $2; - } elsif ($1 eq "dep_recur_limit") { $new_CF{"DEP_RECUR_LIMIT"} = $2; --- 979,982 ---- *************** *** 1769,1784 **** configure_filehandle (*TRAPSERVER) || die_die ("err", "could not configure UDP trap port: $!"); - - return if (!$CF{"SNMP"}); - - # - # SNMP traps - # - socket (SNMPSERVER, PF_INET, SOCK_DGRAM, $udpproto) || - die_die ("err", 
"could not create UDP socket: $!"); - bind (SNMPSERVER, sockaddr_in ($CF{"SNMPPORT"}, INADDR_ANY)) || - die_die ("err", "could not bind UDP server port: $!"); - configure_filehandle (*SNMPSERVER) || - die_die ("err", "could not configure UDP SNMP port: $!"); } --- 1750,1753 ---- *************** *** 3216,3219 **** --- 3185,3189 ---- $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"}; $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"}; + $ENV{"MON_CFBASEDIR"} = $CF{"CFBASEDIR"}; if (!exec @args) *************** *** 3580,3584 **** %NOAUTHCMDS = (); %AUTHTRAPS = (); - %AUTHSNMPTRAPS = (); $sect = "command"; --- 3550,3553 ---- *************** *** 3600,3606 **** $sect = "trap"; next; - } elsif ($l =~ /^snmp trap section/) { - $sect = "snmptrap"; - next; } --- 3569,3572 ---- *************** *** 3655,3668 **** $AUTHTRAPS{$host}{$user} = $password; - } elsif ($sect eq "snmptrap") { - - if ($l !~ /^(\S+)\s+(\S+)$/) { - syslog ('err', "invalid line in $CF{AUTHFILE}, line $."); - next; - } - - ($host, $password) = ($1, $2); - $AUTHSNMPTRAPS{$host}{$password} = 1; - } else { syslog ('err', "unknown section in $CF{AUTHFILE}: $l"); --- 3621,3624 ---- *************** *** 3754,3793 **** # - # handle SNMP trap - # - sub handle_snmp_trap { - my ($buf, $from) = @_; - my ($port, $addr, $fromip); - my (%traphash); - - ($port, $addr) = sockaddr_in ($from); - $fromip = inet_ntoa ($addr); - - if (!defined ($AUTHSNMPTRAPS{$fromip})) { - syslog ('err', "got SNMP trap from unauthorized agent: $fromip"); - return undef; - } - - $TRAP_PDU->buffer ($buf); - %traphash = $TRAP_PDU->decode; - - if (! keys %traphash) { - syslog ('err', "error decoding SNMP trap: " . 
$TRAP_PDU->error); - return undef; - } - - if ($AUTHSNMPTRAPS{$fromip} ne - crypt ($traphash{"community"}, $traphash{"community"})) { - syslog ('err', "unauthorized community from agent: $fromip"); - return undef; - } - - # - # here's the real meat - # - } - - - # # handle a trap # --- 3710,3713 ---- *************** *** 4027,4031 **** vec ($iovec, fileno (TRAPSERVER), 1) = 1; vec ($iovec, fileno (SERVER), 1) = 1; - vec ($iovec, fileno (SNMPSERVER), 1) = 1 if ($CF{"SNMP"}); foreach my $cl (keys %clients) { vec ($iovec, $cl, 1) = 1; --- 3947,3950 ---- *************** *** 4056,4071 **** # - # SNMP trap - # - } elsif ($CF{"SNMP"} && vec ($niovec, fileno (SNMPSERVER), 1)) { - my ($from, $trapbuf); - if (!defined ($from = recv (SNMPSERVER, $trapbuf, 65536, 0))) { - syslog ('err', "error trying to recv an SNMP trap: $!"); - } else { - handle_snmp_trap ($trapbuf, $from); - } - next; - - # # client connections # --- 3975,3978 ---- *************** *** 4336,4341 **** $CF{"CLIENTALLOW"} = '\d+.\d+.\d+.\d+'; $CF{"MAXPROCS"} = 0; - $CF{"SNMP"} = 0; - $CF{"SNMPPORT"} = 34000; $CF{"HISTORICFILE"} = ""; $CF{"HISTORICTIME"} = 0; --- 4243,4246 ---- *************** *** 4412,4417 **** %MONITORHASH = (); %ALERTHASH = (); - - $TRAP_PDU = new Mon::SNMP; } --- 4317,4320 ---- *************** *** 4593,4596 **** --- 4496,4500 ---- $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"}; $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"}; + $ENV{"MON_CFBASEDIR"} = $CF{"CFBASEDIR"}; if( defined($sref->{"_intended"}) ) *************** *** 4633,4642 **** print ALERT $args{"output"}; close (ALERT); - waitpid $pid, 0; - - # - # test alerts don't count - # - return (1) if ($args{"flags"} & $FL_TEST); my $exitval = $? >> 8; --- 4537,4540 ---- *************** *** 4645,4649 **** { syslog ("err", "child alert for " . ! " $args{group}/$args{service} " . "failed, exited with $exitval"); return undef; --- 4543,4547 ---- { syslog ("err", "child alert for " . ! "$args{group}/$args{service} " . 
"failed, exited with $exitval"); return undef; *************** *** 4651,4654 **** --- 4549,4557 ---- # + # test alerts don't count + # + return (1) if ($args{"flags"} & $FL_TEST); + + # # tally this alert # |
From: Jim T. <tr...@us...> - 2004-08-02 19:47:40
|
Update of /cvsroot/mon/mon In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18678 Modified Files: Tag: mon-1-0-0pre1 mon mon.spec Log Message: when allow_empty_group is not set and no host arguments to pass to a monitor, the interval wasn't being reset so it would spam the syslog with lots of "no host arguments" messages. also, in reset_timer, there was a chance when seen the bug where lots of mon processes piling up for some unxplained reason? there's the explanation: -some monitor fails, and it forks a child to call the alert -the child sets up the environment then tries to exec the alert, but the exec fails for one reason or another. the child syslogs the failure, but returns instead of calling exit, so you wind up with another mon process running -it could get quite bad of the unexpected mons register failures and call more alerts, and those alerts fail, etc... this is fixed now, and some better debugging added. Index: mon.spec =================================================================== RCS file: /cvsroot/mon/mon/Attic/mon.spec,v retrieving revision 1.1.2.1 retrieving revision 1.1.2.2 diff -C2 -d -r1.1.2.1 -r1.1.2.2 *** mon.spec 12 Jul 2004 12:46:23 -0000 1.1.2.1 --- mon.spec 2 Aug 2004 19:47:31 -0000 1.1.2.2 *************** *** 12,17 **** Name: mon ! Version: 1.0.0pre3 ! Release: 1 Summary: The mon network monitoring system License: GPL --- 12,17 ---- Name: mon ! Version: 1.0.0pre4jt1 ! Release: 2 Summary: The mon network monitoring system License: GPL *************** *** 92,96 **** find %{buildroot} -name "perllocal.pod" -o -name ".packlist" -o -name "*.bs" |xargs -i rm -f {} # build filelist ! echo "%defattr(0664,root,root)" > %filelist find %{buildroot} -type f -printf "/%%P\n" | grep -v "man/man" >> %filelist --- 92,96 ---- find %{buildroot} -name "perllocal.pod" -o -name ".packlist" -o -name "*.bs" |xargs -i rm -f {} # build filelist ! 
echo "%defattr(-,root,root)" > %filelist find %{buildroot} -type f -printf "/%%P\n" | grep -v "man/man" >> %filelist Index: mon =================================================================== RCS file: /cvsroot/mon/mon/mon,v retrieving revision 1.4.2.9 retrieving revision 1.4.2.10 diff -C2 -d -r1.4.2.9 -r1.4.2.10 *** mon 9 Jul 2004 13:27:33 -0000 1.4.2.9 --- mon 2 Aug 2004 19:47:31 -0000 1.4.2.10 *************** *** 446,452 **** syslog ('info', "throttled at $procs processes"); } - } ! else { --- 446,451 ---- syslog ('info', "throttled at $procs processes"); } } ! else { *************** *** 798,802 **** if (!open (CFG, "m4 $CF |")); } ! else { --- 797,801 ---- if (!open (CFG, "m4 $CF |")); } ! else { *************** *** 1110,1114 **** next; } ! if ($inalias) { --- 1109,1113 ---- next; } ! if ($inalias) { *************** *** 1150,1154 **** next; } ! if ($inwatch) { --- 1149,1153 ---- next; } ! if ($inwatch) { *************** *** 1255,1259 **** $args = $2; } ! else { --- 1254,1258 ---- $args = $2; } ! else { *************** *** 1290,1294 **** push @{$pref->{"alerts"}}, $args; } ! elsif ($var eq "upalert") { --- 1289,1293 ---- push @{$pref->{"alerts"}}, $args; } ! elsif ($var eq "upalert") { *************** *** 1296,1305 **** push @{$pref->{"upalerts"}}, $args; } ! elsif ($var eq "startupalert") { push @{$pref->{"startupalerts"}}, $args; } ! elsif ($var eq "alertevery") { --- 1295,1304 ---- push @{$pref->{"upalerts"}}, $args; } ! elsif ($var eq "startupalert") { push @{$pref->{"startupalerts"}}, $args; } ! elsif ($var eq "alertevery") { *************** *** 1392,1396 **** } } ! elsif ($var eq "upalertafter") { --- 1391,1395 ---- } } ! elsif ($var eq "upalertafter") { *************** *** 1402,1406 **** $pref->{"upalertafter"} = $args; } ! elsif ($var eq "numalerts") { --- 1401,1405 ---- $pref->{"upalertafter"} = $args; } ! elsif ($var eq "numalerts") { *************** *** 1472,1476 **** } } ! elsif ($var eq "randskew") { --- 1471,1475 ---- } } ! 
elsif ($var eq "randskew") { *************** *** 1481,1486 **** } - - elsif ($var eq "dep_behavior") { --- 1480,1483 ---- *************** *** 1772,1776 **** configure_filehandle (*TRAPSERVER) || die_die ("err", "could not configure UDP trap port: $!"); ! return if (!$CF{"SNMP"}); --- 1769,1773 ---- configure_filehandle (*TRAPSERVER) || die_die ("err", "could not configure UDP trap port: $!"); ! return if (!$CF{"SNMP"}); *************** *** 2023,2027 **** } sock_write ($fh, "220 test monitor completed\n"); ! # # test alert --- 2020,2024 ---- } sock_write ($fh, "220 test monitor completed\n"); ! # # test alert *************** *** 2070,2074 **** sock_write ($fh, "220 test alert completed\n"); } ! # # test config file --- 2067,2071 ---- sock_write ($fh, "220 test alert completed\n"); } ! # # test config file *************** *** 2419,2423 **** unless (@listAliasesRequest == 0); sock_write ($fh, "220 list aliasgroups completed\n"); ! # # list deps --- 2416,2420 ---- unless (@listAliasesRequest == 0); sock_write ($fh, "220 list aliasgroups completed\n"); ! # # list deps *************** *** 2470,2475 **** sock_write ($fh, "520 unknown list command\n"); } - - # --- 2467,2470 ---- *************** *** 2487,2491 **** my $sref = \%{$watch{$group}->{$service}}; ! if ($sref->{"_op_status"} == $STAT_OK || $sref->{"_op_status"} == $STAT_UNTESTED) { --- 2482,2486 ---- my $sref = \%{$watch{$group}->{$service}}; ! if ($sref->{"_op_status"} == $STAT_OK || $sref->{"_op_status"} == $STAT_UNTESTED) { *************** *** 2627,2631 **** sock_write ($fh, "220 command authorized\n"); } ! else { --- 2622,2626 ---- sock_write ($fh, "220 command authorized\n"); } ! else { *************** *** 3177,3202 **** } ! if (@ghosts == 0 && !defined ($sref->{"allow_empty_group"})) { syslog ('err', "monitor for $group/$service" . " not called because of no host arguments\n"); ! } else { $fhandles{"$group/$service"} = new FileHandle; ! $pid = open($fhandles{"$group/$service"}, '-|'); ! 
if (!defined $pid) { syslog ('err', "Could not fork: $!"); delete $fhandles{"$group/$service"}; return 0; ! } elsif ($pid == 0) { open(STDERR, '>&STDOUT') or syslog ('err', "Could not dup stderr: $!"); open(STDIN, "</dev/null") or syslog ('err', "Could not connect stdin to /dev/null: $!"); my $v; ! foreach $v (keys %{$sref->{"ENV"}}) { $ENV{$v} = $sref->{"ENV"}->{$v}; } $ENV{"MON_LAST_SUMMARY"} = $sref->{"_last_summary"}; $ENV{"MON_LAST_OUTPUT"} = $sref->{"_last_output"}; --- 3172,3210 ---- } ! if (@ghosts == 0 && !defined ($sref->{"allow_empty_group"})) ! { syslog ('err', "monitor for $group/$service" . " not called because of no host arguments\n"); + reset_timer ($group, $service); + } ! else ! { $fhandles{"$group/$service"} = new FileHandle; ! $pid = open ($fhandles{"$group/$service"}, '-|'); ! ! if (!defined $pid) ! { syslog ('err', "Could not fork: $!"); delete $fhandles{"$group/$service"}; return 0; + } ! elsif ($pid == 0) ! { open(STDERR, '>&STDOUT') or syslog ('err', "Could not dup stderr: $!"); + open(STDIN, "</dev/null") or syslog ('err', "Could not connect stdin to /dev/null: $!"); + my $v; ! ! foreach $v (keys %{$sref->{"ENV"}}) ! { $ENV{$v} = $sref->{"ENV"}->{$v}; } + $ENV{"MON_LAST_SUMMARY"} = $sref->{"_last_summary"}; $ENV{"MON_LAST_OUTPUT"} = $sref->{"_last_output"}; *************** *** 3208,3213 **** $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"}; $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"}; ! exec @args or syslog ('err', "could not exec '@args': $!") ! && exit(1); } --- 3216,3225 ---- $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"}; $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"}; ! ! if (!exec @args) ! { ! syslog ('err', "could not exec '@args': $!"); ! exit (1); ! } } *************** *** 3263,3272 **** int(rand($sref->{"randskew"})+1)); } ! elsif ($sref->{"_next_check"}) { ! $sref->{"_timer"} = $sref->{"_next_check"} - time(); } ! else { --- 3275,3287 ---- int(rand($sref->{"randskew"})+1)); } ! elsif ($sref->{"_next_check"}) { ! 
if (($sref->{"_timer"} = $sref->{"_next_check"} - time()) < 0) ! { ! $sref->{"_timer"} = $sref->{"interval"}; ! } } ! else { *************** *** 3503,3507 **** return undef if (!defined $pass); ! if ((crypt ($plaintext, $pass)) ne $pass) { return undef; --- 3518,3522 ---- return undef if (!defined $pass); ! if ((crypt ($plaintext, $pass)) ne $pass) { return undef; *************** *** 3820,3824 **** $trap{$trap_name} = un_esc_str ($trap_val); } ! else { --- 3835,3839 ---- $trap{$trap_name} = un_esc_str ($trap_val); } ! else { *************** *** 3841,3845 **** $traphost = "*"; } ! else { --- 3856,3860 ---- $traphost = "*"; } ! else { *************** *** 3915,3919 **** return; } ! elsif (!defined $watch{$trap{"grp"}}->{$trap{"svc"}}) { --- 3930,3934 ---- return; } ! elsif (!defined $watch{$trap{"grp"}}->{$trap{"svc"}}) { *************** *** 4039,4043 **** } next; ! # # SNMP trap --- 4054,4058 ---- } next; ! # # SNMP trap *************** *** 4194,4198 **** # sub normalize_paths { ! my ($authtype, @authtypes); --- 4209,4213 ---- # sub normalize_paths { ! my ($authtype, @authtypes); *************** *** 4365,4369 **** ($STAT_FAIL, $STAT_OK, $STAT_COLDSTART, $STAT_WARMSTART, $STAT_LINKDOWN, $STAT_UNKNOWN, $STAT_TIMEOUT, $STAT_UNTESTED, $STAT_DEPEND, $STAT_WARN) = (0..9); ! %FAILURE = ( $STAT_FAIL => 1, --- 4380,4384 ---- ($STAT_FAIL, $STAT_OK, $STAT_COLDSTART, $STAT_WARMSTART, $STAT_LINKDOWN, $STAT_UNKNOWN, $STAT_TIMEOUT, $STAT_UNTESTED, $STAT_DEPEND, $STAT_WARN) = (0..9); ! %FAILURE = ( $STAT_FAIL => 1, *************** *** 4426,4433 **** $pref->{"_last_alert"} = 0 if ($pref->{"alertevery"}); ! $pref->{"_consec_failures"} = 0 if ($pref->{"alertafter_consec"}); ! $pref->{'_1stfailtime'} = 0 if ($pref->{"alertafterival"}); --- 4441,4448 ---- $pref->{"_last_alert"} = 0 if ($pref->{"alertevery"}); ! $pref->{"_consec_failures"} = 0 if ($pref->{"alertafter_consec"}); ! 
$pref->{'_1stfailtime'} = 0 if ($pref->{"alertafterival"}); *************** *** 4606,4612 **** if (!exec @execargs) { ! syslog ('err', "could not exec alert $alert: $!"); ! return undef; } exit; } --- 4621,4628 ---- if (!exec @execargs) { ! syslog ('err', "child could not exec alert $alert (execargs=" . join (",", @execargs) . "): $!"); ! exit (1); } + exit; } *************** *** 4624,4627 **** --- 4640,4653 ---- return (1) if ($args{"flags"} & $FL_TEST); + my $exitval = $? >> 8; + + if ($exitval) + { + syslog ("err", "child alert for " . + " $args{group}/$args{service} " . + "failed, exited with $exitval"); + return undef; + } + # # tally this alert *************** *** 4871,4875 **** my $msg = shift; my $ans = ""; ! $ans = $PAM_username if ($code == Authen::PAM::PAM_PROMPT_ECHO_ON() ); $ans = $PAM_password if ($code == Authen::PAM::PAM_PROMPT_ECHO_OFF() ); --- 4897,4901 ---- my $msg = shift; my $ans = ""; ! $ans = $PAM_username if ($code == Authen::PAM::PAM_PROMPT_ECHO_ON() ); $ans = $PAM_password if ($code == Authen::PAM::PAM_PROMPT_ECHO_OFF() ); |
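The process pile-up described in the log message above comes from a forked child that returns into the caller's code after a failed exec, instead of exiting. A minimal sketch of the corrected pattern follows; run_alert is a hypothetical helper written for illustration and is not a function from mon itself.

```perl
use strict;
use warnings;

# run_alert is a hypothetical helper, not part of mon.
# It forks, tries to exec @cmd in the child, and returns the
# child's exit status to the parent.
sub run_alert {
    my @cmd = @_;

    my $pid = fork ();
    die "could not fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: exec only comes back if it failed.
        {
            no warnings 'exec';
            exec { $cmd[0] } @cmd;
        }
        print STDERR "child could not exec '@cmd': $!\n";

        # Leave with exit, never with return. A return here would
        # let a second copy of the parent program keep running.
        exit (1);
    }

    # Parent: collect the child and report its exit value.
    waitpid ($pid, 0);
    return $? >> 8;
}
```

The parent can then treat a non-zero value as a failed alert, in the same way the diff above checks `$? >> 8` before tallying the alert.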
From: Jim T. <tr...@us...> - 2004-07-29 21:10:28
|
Update of /cvsroot/mon/mon/mon.d
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28404

Modified Files:
      Tag: mon-1-0-0pre1
    fping.monitor
Log Message:
fixed a quirk reported by Daniel Wallace <da...@ak...>
http://bugs.gentoo.org/show_bug.cgi?id=58808

Index: fping.monitor
===================================================================
RCS file: /cvsroot/mon/mon/mon.d/fping.monitor,v
retrieving revision 1.1.1.1
retrieving revision 1.1.1.1.2.1
diff -C2 -d -r1.1.1.1 -r1.1.1.1.2.1
*** fping.monitor 9 Jun 2004 05:18:05 -0000 1.1.1.1
--- fping.monitor 29 Jul 2004 21:10:19 -0000 1.1.1.1.2.1
***************
*** 69,73 ****
  {
  chomp;
! if (/^(\S+).*unreachable/)
  {
  push (@unreachable, $1);
--- 69,73 ----
  {
  chomp;
! if (/^(\S+).*unreachable/i)
  {
  push (@unreachable, $1);
|
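The one-character change in the fping.monitor diff makes the match case-insensitive. The sketch below shows the effect on a few sample lines. The sample lines are made up for illustration and real fping output may differ by version; unreachable_hosts is a hypothetical helper, not code from the monitor.

```perl
use strict;
use warnings;

# Hypothetical sample lines, not captured from a real fping run.
my @lines = (
    "host1 is alive",
    "host2 is unreachable",
    "host3 address Unreachable",
);

# Collect the first word of every line that mentions "unreachable",
# with or without the /i modifier.
sub unreachable_hosts {
    my ($case_insensitive, @output) = @_;

    my $re = $case_insensitive ? qr/^(\S+).*unreachable/i
                               : qr/^(\S+).*unreachable/;
    my @unreachable;

    foreach my $line (@output) {
        push (@unreachable, $1) if $line =~ $re;
    }

    return @unreachable;
}
```

Without the modifier only host2 is collected from these lines; with it, host3 is collected as well.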
From: Jim T. <tr...@us...> - 2004-07-13 17:47:22
|
Update of /cvsroot/mon/mon/mon.d In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28888/mon.d Modified Files: Tag: mon-1-0-0pre1 phttp.monitor trace.monitor Log Message: this is ridiculous. there's a bug in rpm's "perldeps.pl", the thing which groks perl scripts during the rpmbuild process to extract the module dependencies. the thing identifies "use" statements which begin on a line, but it gets fooled when it finds "use somethingorother" within a "print <<EOF" statement. for example, code in a #!/usr/bin/perl file like this: print <<EOF; happy birthday, everybody. this monitor is for use with mon. EOF will result in an rpm dependency on the module "perl(with)"! the bug is fixed in a newer rpm: http://bugzilla.redhat.com/bugzilla/long_list.cgi?buglist=109934 however it's just simpler to work around the bug, so i reformatted some of the text in phttp.monitor and trace.monitor which was causing the problem. Index: phttp.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/phttp.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.2.1 diff -C2 -d -r1.1.1.1 -r1.1.1.1.2.1 *** phttp.monitor 9 Jun 2004 05:18:04 -0000 1.1.1.1 --- phttp.monitor 13 Jul 2004 17:47:13 -0000 1.1.1.1.2.1 *************** *** 407,411 **** CAVEAT: Do not forget to quote the string. ! use -Dopt to see what you really input. You can use \\n to mean newline. --- 407,411 ---- CAVEAT: Do not forget to quote the string. ! enable -Dopt to see what you really input. You can use \\n to mean newline. *************** *** 435,439 **** CAVEAT: Do not forget to quote the string. ! use -Dopt to see what you really input. --Dgen : print general debug information. --- 435,439 ---- CAVEAT: Do not forget to quote the string. ! enable -Dopt to see what you really input. --Dgen : print general debug information. 
Index: trace.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/trace.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.2.1 diff -C2 -d -r1.1.1.1 -r1.1.1.1.2.1 *** trace.monitor 9 Jun 2004 05:18:06 -0000 1.1.1.1 --- trace.monitor 13 Jul 2004 17:47:13 -0000 1.1.1.1.2.1 *************** *** 670,675 **** usage: trace.monitor -h trace.monitor [-L] [-s dir] [-l dir] [-d num] [-t args] [-m {m,n}] host [host...] ! traceroute to a host, compare the route paths between invocations. for ! use with "mon". -L append results to a log file --- 670,675 ---- usage: trace.monitor -h trace.monitor [-L] [-s dir] [-l dir] [-d num] [-t args] [-m {m,n}] host [host...] ! traceroute to a host, compare the route paths between invocations. for use ! with "mon". -L append results to a log file |
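The false dependency described in the log message above can be reproduced with a scanner that looks only at the start of each line. The sketch below is a much simplified stand-in and is not the actual perldeps.pl logic. naive_deps and the sample scripts are hypothetical, and the line break inside the here-document is assumed to fall just before "use".

```perl
use strict;
use warnings;

# naive_deps is a hypothetical line-based scanner. It has no idea
# whether a line is code or here-document text.
sub naive_deps {
    my ($source) = @_;
    my @deps;

    foreach my $line (split (/\n/, $source)) {
        push (@deps, "perl($1)")
            if $line =~ /^\s*use\s+([A-Za-z_][\w:]*)/;
    }

    return @deps;
}

# Help text wrapped so that a line starts with "use".
my $before = join ("\n",
    'use Getopt::Long;',
    'print <<EOF;',
    'this monitor is for',
    'use with mon.',
    'EOF',
    '');

# The same text re-wrapped, as in the workaround.
my $after = join ("\n",
    'use Getopt::Long;',
    'print <<EOF;',
    'this monitor is for use',
    'with mon.',
    'EOF',
    '');
```

With the first wrapping the scanner reports perl(with) next to perl(Getopt::Long). Re-wrapping the text removes the bogus entry without touching any code.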
From: Jim T. <tr...@us...> - 2004-07-12 16:52:18
|
Update of /cvsroot/mon/mon/mon.d In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27622 Added Files: Tag: mon-1-0-0pre1 radius.monitor Log Message: adding radius.monitor --- NEW FILE: radius.monitor --- #!/usr/bin/perl # # Monitor radius processes # # Based upon radius.monitor by Brian Moore, posted to the mon mailing list # # Arguments are: # # --username=user --password=pass --secret=secret # [--port=#] [--attempts=#] [--dictionary=/path/to/dictionary] # hostname [hostname ...] # # Arguments are in standard POSIX format and can be given as the least # significant part (i.e. -p is the same as --password). # # This monitor performs a real RADIUS check, attempting to be as much like a # terminal server as possible. This requires that you include a username, # password, and secret in your mon.cf file. Depending on your unix # implementation, this may allow unscrupulous users to view the command line # arguments, including your RADIUS secret. If you prefer, you can uncomment # three lines below (see comments) to provide defaults for username, # password, and secret. # # This monitor attempts to check a username and password up to n times # (defaults to 9, but can be set via the --attempts=# command line switch). # It only registers a failure to mon after failing to receive a satisfactory # response n times. It returns an immediate failure to mon if it receives a # failed authentication. For this reason, you will need to create a dummy # user on your RADIUS server for authentication testing. # # # Copyright (C) 1998, ACC TelEnterprises # Written by James FitzGibbon <ja...@ic...> # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # use Authen::Radius; use Sys::Hostname; use Getopt::Long; GetOptions( \%options, "port=i", "secret=s", "username=s", "password=s", "attempts=i", "dictionary=s" ); $options{"port"} ||= 1645; $options{"attempts"} ||= 9; # # uncomment these three lines and replace with appropriate info if you'd prefer # not to pass sensitive information on the command line # $options{"username"} = "username"; $options{"password"} = "password"; $options{"secret"} = "radius-secret"; $options{"dictionary"} = "/etc/radius/dictionary"; Authen::Radius->load_dictionary( $options{dictionary} ); undef $diag; @failed_hosts = (); foreach $host (@ARGV) { $auth = new Authen::Radius(Host => "$host:$options{port}", Secret => $options{secret} ); $auth->add_attributes( { Name => "User-Name", Value => $options{username} }, { Name => "Password", Value => $options{password} }, { Name => "NAS-IP-Address", Value => join( ".", unpack ( "C4", (gethostbyname( hostname() ))[4] ) ) }, ); $done = 0; $attempts = 0; while( ! $done ) { $auth->send_packet( ACCESS_REQUEST ); $err = $auth->get_error(); if( $err ne "ENONE" ) { $attempts++; if( $attempts > $options{attempts} ) { push @failed_hosts, $host; push( @failures, "$host failed for user $options{username}: " . $auth->strerror( $err ) ); $done = 1; } next; } $resptype = $auth->recv_packet(); $err = $auth->get_error(); if( $err ne "ENONE" ) { $attempts++; if( $attempts > $options{attempts} ) { push @failed_hosts, $host; push( @failures, "$host failed for user $options{username}: " . 
$auth->strerror( $err ) ); $done = 1; } } elsif( $resptype == ACCESS_REJECT ) { push @failed_hosts, $host; push( @failures, "$host returned bad auth for user $options{username}" ); $done = 1; } else { $done = 1; } } } if (@failed_hosts) { print "@failed_hosts\n\n"; print join (", ", @failures), "\n"; exit 1; }; exit 0; |
From: Jim T. <tr...@us...> - 2004-07-12 15:50:37
|
Update of /cvsroot/mon/mon/mon.d In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12228 Modified Files: Tag: mon-1-0-0pre1 dns.monitor Log Message: added -tcp flag, suggested by Kevin Ivory <Ivory@SerNet.de> Index: dns.monitor =================================================================== RCS file: /cvsroot/mon/mon/mon.d/dns.monitor,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.2.1 diff -C2 -d -r1.1.1.1 -r1.1.1.1.2.1 *** dns.monitor 9 Jun 2004 05:18:05 -0000 1.1.1.1 --- dns.monitor 12 Jul 2004 15:50:25 -0000 1.1.1.1.2.1 *************** *** 26,34 **** =head1 SYNOPSIS ! B<dns.monitor> I<-zone zone [-zone zone ...]> I<-master master> I<[-serial_threshold num]> I<server [server ...]> or ! B<dns.monitor> I<-caching_only> I<-query record[:type] [-query record[:type] ...]> I<server [server ...]> =head1 DESCRIPTION --- 26,34 ---- =head1 SYNOPSIS ! B<dns.monitor> I<-zone zone [-zone zone ...]> I<-master master> I<[-serial_threshold num]> I<[-tcp]> I<server [server ...]> or ! B<dns.monitor> I<-caching_only> I<[-tcp]> I<-query record[:type] [-query record[:type] ...]> I<server [server ...]> =head1 DESCRIPTION *************** *** 48,52 **** propagation, or on Dynamic DNS zones which may be updated hundreds or thousands of times an hour) It is assumed that each I<server> is ! supposed to be authoritative for the I<zone>. In caching mode, specified via the I<-caching_only> switch, B<dns.monitor> --- 48,53 ---- propagation, or on Dynamic DNS zones which may be updated hundreds or thousands of times an hour) It is assumed that each I<server> is ! supposed to be authoritative for the I<zone>. The I<-tcp> option ! will cause lookups to be done via TCP instead of the default UDP. 
In caching mode, specified via the I<-caching_only> switch, B<dns.monitor> *************** *** 87,97 **** my($SerialThreshold) = (0); my($CachingServer) = (0); my(%OptVars) = ("master" => \$Master, "zone" => \@Zones, "serial_threshold" => \$SerialThreshold, "caching_only" => \$CachingServer, ! "query" => \@Queries); ! if (!GetOptions(\%OptVars, "master=s", "zone=s@", "serial_threshold=s", "caching_only", "query=s@")) { print STDERR "Problems with Options, sorry\n"; exit -1; --- 88,100 ---- my($SerialThreshold) = (0); my($CachingServer) = (0); + my($UseTCP) = (0); my(%OptVars) = ("master" => \$Master, "zone" => \@Zones, "serial_threshold" => \$SerialThreshold, "caching_only" => \$CachingServer, ! "query" => \@Queries, ! "tcp" => \$UseTCP); ! if (!GetOptions(\%OptVars, "master=s", "zone=s@", "serial_threshold=s", "caching_only", "tcp", "query=s@")) { print STDERR "Problems with Options, sorry\n"; exit -1; *************** *** 221,224 **** --- 224,228 ---- # Query the $Master for the SOA of $Zone and get the serial number. $res = new Net::DNS::Resolver; + $res->usevc(1) if ($UseTCP); $res->defnames(0); # don't append default zone $res->recurse(0); # no recursion *************** *** 257,260 **** --- 261,265 ---- foreach $server (@Servers) { $res = new Net::DNS::Resolver; + $res->usevc(1) if ($UseTCP); $res->defnames(0); # don't append default zone $res->recurse(0); # no recursion *************** *** 321,324 **** --- 326,330 ---- foreach $server (@Servers) { $res = new Net::DNS::Resolver; + $res->usevc(1) if ($UseTCP); $res->defnames(0); # don't append default zone $res->recurse(0); # no recursion |
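The new -tcp switch is a plain boolean option handled next to the existing ones. A minimal sketch of that parsing with Getopt::Long follows. parse_dns_args is a hypothetical helper, and the option list is cut down to three of the switches from the diff above. In the monitor itself the flag then triggers `$res->usevc(1)` on each Net::DNS::Resolver object; that step is left out here so the sketch does not need Net::DNS.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# parse_dns_args is a hypothetical helper, not code from dns.monitor.
# It returns the TCP flag, the list of zones, and the remaining
# arguments (the servers to check).
sub parse_dns_args {
    my @args = @_;
    my $use_tcp = 0;
    my @zones;
    my $master;

    GetOptionsFromArray (\@args,
        "tcp"      => \$use_tcp,   # boolean switch, no value
        "zone=s@"  => \@zones,     # may be repeated
        "master=s" => \$master,
    ) or die "Problems with Options, sorry\n";

    return ($use_tcp, \@zones, \@args);
}
```

A call such as `parse_dns_args ("-zone", "example.org", "-tcp", "ns1", "ns2")` leaves ns1 and ns2 as the server list, with the flag set.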
From: Jim T. <tr...@us...> - 2004-07-12 15:17:01
|
Update of /cvsroot/mon/mon/cgi-bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4776/cgi-bin

Removed Files:
      Tag: mon-1-0-0pre1
    README
Log Message:
no longer needed

--- README DELETED ---
|
From: Jim T. <tr...@us...> - 2004-07-12 15:14:09
|
Update of /cvsroot/mon/mon/clients In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4213 Modified Files: Tag: mon-1-0-0pre1 mon.cgi Log Message: Date: Mon, 12 Jul 2004 10:39:43 -0400 From: Ed Ravin <er...@pa...> here's a patch that seems to have been overlooked - it adds a "show failures only" button to the menu. If your Mon web server is a little pokey, it really speeds things up, since it takes a while to redraw that "sea of green" and my support staff were having trouble using Mon over slow links. I'd like to update it one day to also show untested services, since when Mon is restarting the "failures only" display is a bit misleading since there's plenty of stuff that might fail once the test is run. I believe the patch below will apply to mon.cgi 1.52. Let me know if there are any problems. We've been using it here for a year or two. Index: mon.cgi =================================================================== RCS file: /cvsroot/mon/mon/clients/Attic/mon.cgi,v retrieving revision 1.1.2.1 retrieving revision 1.1.2.2 diff -C2 -d -r1.1.2.1 -r1.1.2.2 *** mon.cgi 12 Jul 2004 01:33:32 -0000 1.1.2.1 --- mon.cgi 12 Jul 2004 15:13:57 -0000 1.1.2.2 *************** *** 706,711 **** # Set the refresh page to always be the summary page, unless # certain commands are selected. ! if ( $command eq "query_opstatus_full" ) { ! $refresh_url = "$url?${monhost_and_port_args_meta}command=query_opstatus_full"; } elsif ( ($command eq "mon_test_service") || ($command eq "svc_details") ) { $refresh_url = "$url?${monhost_and_port_args_meta}command=svc_details&args=$args"; --- 706,711 ---- # Set the refresh page to always be the summary page, unless # certain commands are selected. ! if ( $command =~ "^query_opstatus_" ) { ! 
$refresh_url = "$url?${monhost_and_port_args_meta}command=$command"; } elsif ( ($command eq "mon_test_service") || ($command eq "svc_details") ) { $refresh_url = "$url?${monhost_and_port_args_meta}command=svc_details&args=$args"; *************** *** 877,881 **** # $webpage->print("<tr>\n"); ! $webpage->print("\t<td $auth_commands{'list'}{'bgcolor'} align=center><font FACE=\"$face\"><a href=\"$url?${monhost_and_port_args}command=query_opstatus\">Show Operational Status (summary)</a></font></td>\n"); $webpage->print("\t<td $auth_commands{'list'}{'bgcolor'} align=center><font FACE=\"$face\"><a href=\"$url?${monhost_and_port_args}command=list_alerthist\">Show Alert History</a></font></td>\n"); $webpage->print("\t<td $auth_commands{'loadstate'}{'bgcolor'} align=center><font FACE=\"$face\"><a href=\"$url?${monhost_and_port_args}command=mon_loadstate&args=\">Load scheduler state</a></font></td>\n"); --- 877,881 ---- # $webpage->print("<tr>\n"); ! $webpage->print("\t<td $auth_commands{'list'}{'bgcolor'} align=center><font FACE=\"$face\"><a href=\"$url?${monhost_and_port_args}command=query_opstatus\">Show Operational Status (summary)</a> <a href=\"$url?${monhost_and_port_args}command=query_opstatus_failures\">(failures only)</a></font></td>\n"); $webpage->print("\t<td $auth_commands{'list'}{'bgcolor'} align=center><font FACE=\"$face\"><a href=\"$url?${monhost_and_port_args}command=list_alerthist\">Show Alert History</a></font></td>\n"); $webpage->print("\t<td $auth_commands{'loadstate'}{'bgcolor'} align=center><font FACE=\"$face\"><a href=\"$url?${monhost_and_port_args}command=mon_loadstate&args=\">Load scheduler state</a></font></td>\n"); *************** *** 965,969 **** &list_status($detail_level, %op_failure) if defined(%op_failure); ! &list_status($detail_level, %op_success) if defined(%op_success); $webpage->print("</table>\n"); --- 965,978 ---- &list_status($detail_level, %op_failure) if defined(%op_failure); ! ! if ($detail_level eq "failures") ! { ! 
$webpage->print ! ("<tr><td bgcolor=$greenlight_color colspan=4><center><font size=+2>No failures found.</font></center></td></tr>\n") ! unless %op_failure; ! } else ! { ! &list_status($detail_level, %op_success) if defined(%op_success); ! } $webpage->print("</table>\n"); *************** *** 3782,3785 **** --- 3791,3799 ---- &setup_page("Operation Status: Full View"); &query_opstatus("full"); + } + elsif ($command eq "query_opstatus_failures") + { + &setup_page("Operation Status: Failures Only"); + &query_opstatus("failures"); # Selection "mon_opstatus" will fall through to else. } |
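The refresh logic in the mon.cgi patch above now keys on a command prefix rather than one exact command name, so the new "failures only" view refreshes to itself. The sketch below restates that selection in isolation. refresh_command is a hypothetical helper, and the fall-through to the summary command is an assumption based on the comment in the diff, since the final branch is not shown there.

```perl
use strict;
use warnings;

# refresh_command is a hypothetical helper, not code from mon.cgi.
# It returns the command that the auto-refresh URL should carry.
sub refresh_command {
    my ($command) = @_;

    # Any query_opstatus_* view (full, failures, ...) refreshes to itself.
    return $command if $command =~ /^query_opstatus_/;

    # Testing a service or viewing its details refreshes to the details page.
    return "svc_details"
        if $command eq "mon_test_service" || $command eq "svc_details";

    # Assumed default: the summary page.
    return "query_opstatus";
}
```

Because the pattern ends in an underscore, the plain query_opstatus summary command does not match it and takes the default branch.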
From: Jim T. <tr...@us...> - 2004-07-12 13:17:52
|
Update of /cvsroot/mon/mon/doc
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13985/doc

Modified Files:
      Tag: mon-1-0-0pre1
    README.cgi-bin
Log Message:
mon.cgi is now back in the main distro

Index: README.cgi-bin
===================================================================
RCS file: /cvsroot/mon/mon/doc/README.cgi-bin,v
retrieving revision 1.1.1.1
retrieving revision 1.1.1.1.2.1
diff -C2 -d -r1.1.1.1 -r1.1.1.1.2.1
*** README.cgi-bin 9 Jun 2004 05:18:06 -0000 1.1.1.1
--- README.cgi-bin 12 Jul 2004 13:17:40 -0000 1.1.1.1.2.1
***************
*** 3,15 ****
  mon.cgi
  -------
! mon.cgi used to be a part of the mon distribution, but it is now
! maintained by Andrew Ryan <an...@na...>. The latest
! release can be found at
!
! http://www.nam-shub.com/files/
!
! or
!
! ftp://ftp.kernel.org/pub/software/admin/mon/contrib/

  minotaur
--- 3,8 ----
  mon.cgi
  -------
! mon.cgi is the more advanced web interface to mon, maintained by
! Andrew Ryan <an...@na...>.

  minotaur
|