From: Sean M. <sm...@co...> - 2008-07-31 15:12:08
Michael Weiner wrote:
> HMMMMMMM now you've peaked my interest. Anything you can share before
> i start building? I like the idea and wouldnt mind implementing a
> similar solution
>
> Michael

I spent a while trying to come up with a quick yet comprehensive explanation, but it's just not possible. The internal design documentation for the system runs to 15+ pages, most of which contains data that would need to be thoroughly sanitized. This is the meat of it, though, and it should give you an idea of what needs to be considered. Feel free to ask how I solved specific problems or to suggest ways to improve it.

Here is the repo layout:

  (root)             -- All non-object configs (nagios.cfg, cgi.cfg, resources.cfg, nsca.cfg, etc.)
  |-- config         -- All object configs (notify_cmds.cfg, templates, etc.)
  |   `-- contacts
  |-- htpasswd
  |-- scripts        -- All shell scripts (event handlers, self-promotion, etc.)
  |   `-- checks     -- Custom checks not found in the FreeBSD nagios-plugins port
  `-- targets        -- All hosts and services
      |-- exemptions -- Removes some "global" checks from individual facilities (see step 2.1 below)
      |   |-- facil0
      |   |-- facil1
      |   `-- facil2
      |-- global     -- Checks to be run from ALL facilities
      |-- facil0     -- Checks for the slave instance at that facility (likewise facil1 and facil2)
      |-- facil1
      `-- facil2

To meet the automation requirements, a handful of DNS entries had to be created at each slave facility:

  nagios-host.[facil].example.com
      A CNAME to the slave instance at that facility. It is used as the destination target when rsyncing configs.

  nagios-master.[facil].example.com
      A CNAME to the master server. Because our setup is distributed and Nagios uses hostnames as unique identifiers, this was required to give each slave server a unique target to monitor for self-promotion purposes.

The svn post-commit script does the following on the master instance:

  1. Checks out the newest version of the repo to /var/tmp/nagios/staging.
  2. Creates directories at /var/tmp/nagios/[master|facil0|facil1|facil2] and rsyncs the repo into each one.
     2.1. During this step, '.svn' is excluded and the --exclude-from option is used to specify per-slave exclusions:
          rsync -avz --delete-before --exclude=.svn --exclude-from=$STAGING_DIR/targets/exemptions/${this_facil} $STAGING_DIR/ ./${this_facil}
  3. Moves nagios-hq.cfg or nagios-slave.cfg (as appropriate) to nagios.cfg.
  4. Uses grep and sed to search and replace "magic" words:
     * FACIL_PLACEHOLDER: maximizes portability and automation of configs. Examples: nagios-slave.cfg references cfg_dir=targets/FACIL_PLACEHOLDER, which eliminates the need for hand-editing; the "from" address in email is set to nagios_FACIL_PLACEHOLDER@; check_snmpagent!FACIL_PLACEHOLDER_[common suffix]; etc.
     * FACIL_ROLE: dynamically adjusts service_templates.cfg. This is necessary to get the master instance to schedule active checks ONLY for its own local checks (the Nagios slaves, its own nsca daemon, its GSM modem). It is set to 0 on the master and 1 on slaves.
  5. Slaves without GSM capability only: rewrites [host|service]_notification_commands from notify-[host|service]-by-sms to notify-[host|service]-by-epager.
  6. Performs a local Nagios config validation for each facility (nagios -v /var/tmp/nagios/{facil}/nagios.cfg).
  7. Rsyncs /var/tmp/nagios/{facil} to nagios-host.[facil].example.com:/usr/local/etc/nagios/
     7.1. $RSYNC -avz --delete-before $STAGING_ROOT/$this_facil/ nagios-host.[facil].example.com:/usr/local/etc/nagios/
  8. Performs a remote Nagios config validation on each system.
  9. Reloads Nagios via the rc script on each server.

Self-promotion is done via an event handler script that echoes ENABLE_NOTIFICATIONS, STOP_OBSESSING_OVER_HOST_CHECKS and STOP_OBSESSING_OVER_SVC_CHECKS into the external command file if the slave loses contact with the master instance. Self-demotion is simply the inverse of that.

Sean McAfee
System Engineer
Collaborative Fusion, Inc.
sm...@co...
412-422-3463 x 4025
5849 Forbes Avenue
Pittsburgh, PA 15217
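The per-facility part of the post-commit hook (steps 2, 3 and 6-9) might look roughly like the POSIX sh sketch below. It is not the script described in the message: the function names, the DRY_RUN switch, DOMAIN, the use of ssh for the remote validation and reload, and the /usr/local/etc/rc.d/nagios path are all assumptions. Only the directory layout, the rsync options and the nagios-host CNAME come from the message.

```shell
#!/bin/sh
# Hypothetical sketch of post-commit steps 2, 3 and 6-9 for one facility.
# Function names, DRY_RUN, DOMAIN, ssh and the rc.d path are assumptions.
STAGING_ROOT=${STAGING_ROOT:-/var/tmp/nagios}
STAGING_DIR=${STAGING_DIR:-$STAGING_ROOT/staging}
REMOTE_DIR=${REMOTE_DIR:-/usr/local/etc/nagios}
DOMAIN=${DOMAIN:-example.com}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Steps 2 and 2.1: copy the checkout into a per-facility tree, dropping
# .svn and, for slaves, everything listed in targets/exemptions/<facil>.
stage_facility() {
    facil=$1
    set -- --exclude=.svn    # reuse the positional parameters as an argument list
    if [ "$facil" != master ]; then
        set -- "$@" "--exclude-from=$STAGING_DIR/targets/exemptions/$facil"
    fi
    run rsync -avz --delete-before "$@" "$STAGING_DIR/" "$STAGING_ROOT/$facil"
}

# Step 3: the master runs nagios-hq.cfg, slaves run nagios-slave.cfg.
select_main_cfg() {
    facil=$1
    if [ "$facil" = master ]; then src=nagios-hq.cfg; else src=nagios-slave.cfg; fi
    run mv "$STAGING_ROOT/$facil/$src" "$STAGING_ROOT/$facil/nagios.cfg"
}

# Steps 6-9 for a slave: validate locally, push through the nagios-host
# CNAME, validate remotely, then reload; stop at the first failure.
deploy_slave() {
    facil=$1
    host=nagios-host.$facil.$DOMAIN
    run nagios -v "$STAGING_ROOT/$facil/nagios.cfg" || return 1
    run rsync -avz --delete-before "$STAGING_ROOT/$facil/" "$host:$REMOTE_DIR/" || return 1
    run ssh "$host" nagios -v "$REMOTE_DIR/nagios.cfg" || return 1
    run ssh "$host" /usr/local/etc/rc.d/nagios reload
}
```

With DRY_RUN=1, a loop such as "for f in facil0 facil1 facil2; do stage_facility $f; select_main_cfg $f; deploy_slave $f; done" only prints the commands it would run, which is a cheap way to review them before pointing the hook at real hosts. The grep/sed rewrites of steps 4 and 5 would slot in between select_main_cfg and deploy_slave.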
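Steps 4 and 5 are plain grep/sed rewrites of each staged tree. A minimal sketch follows, assuming FACIL_PLACEHOLDER is replaced by the facility name (with "master" used on the master, which the message does not specify) and that FACIL_ROLE appears in service_templates.cfg as the value of a directive such as active_checks_enabled. The function names are hypothetical, and deciding which slaves lack a GSM modem is left to the caller.

```shell
#!/bin/sh
# Hypothetical sketch of post-commit steps 4 and 5 (grep & sed rewrites).
# A temp file plus "cat >" replaces "sed -i", whose syntax differs between
# GNU and FreeBSD sed; writing back through "cat >" keeps file modes.

# rewrite_files TREE REGEX SED_ARGS...: run sed SED_ARGS on every file
# under TREE that contains a match for the extended regex REGEX.
rewrite_files() {
    tree=$1 regex=$2
    shift 2
    tmp=$(mktemp "${TMPDIR:-/tmp}/nagios-rw.XXXXXX") || return 1
    grep -rlE "$regex" "$tree" | while IFS= read -r f; do
        sed "$@" "$f" > "$tmp" && cat "$tmp" > "$f"
    done
    rm -f "$tmp"
}

# Step 4: FACIL_PLACEHOLDER -> facility name ("master" is assumed for the
# master); FACIL_ROLE -> 0 on the master, 1 on slaves.
substitute_magic_words() {
    tree=$1 facil=$2
    if [ "$facil" = master ]; then role=0; else role=1; fi
    rewrite_files "$tree" 'FACIL_PLACEHOLDER|FACIL_ROLE' \
        -e "s/FACIL_PLACEHOLDER/$facil/g" -e "s/FACIL_ROLE/$role/g"
}

# Step 5: call only for slaves without a GSM modem (site-specific list).
sms_to_epager() {
    rewrite_files "$1" 'notify-(host|service)-by-sms' \
        -e 's/notify-host-by-sms/notify-host-by-epager/g' \
        -e 's/notify-service-by-sms/notify-service-by-epager/g'
}
```

For example, running substitute_magic_words /var/tmp/nagios/facil1 facil1 turns "cfg_dir=targets/FACIL_PLACEHOLDER" into "cfg_dir=targets/facil1" and "active_checks_enabled FACIL_ROLE" into "active_checks_enabled 1". Only files that actually contain a magic word are rewritten.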
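The self-promotion event handler could be sketched as below. It assumes the handler is attached to the slave's nagios-master.[facil].example.com host and receives $HOSTSTATE$ and $HOSTSTATETYPE$ as its two arguments; the CMDFILE default path and the choice to act only on HARD states are assumptions. Commands are written in Nagios' external command format, "[epoch time] COMMAND_NAME", and demotion sends the DISABLE_/START_ counterparts of the promotion commands.

```shell
#!/bin/sh
# Hypothetical self-promotion/demotion event handler for a slave.
# Assumed invocation: handler.sh $HOSTSTATE$ $HOSTSTATETYPE$
# CMDFILE's default path is an assumption; point it at command_file.
CMDFILE=${CMDFILE:-/var/spool/nagios/rw/nagios.cmd}

# Write one external command as "[epoch] NAME" to the command file.
send_cmd() {
    printf '[%s] %s\n' "$(date +%s)" "$1" >> "$CMDFILE"
}

# Master unreachable: notify locally and stop running the OCHP/OCSP
# commands that forward results to the master.
promote() {
    send_cmd ENABLE_NOTIFICATIONS
    send_cmd STOP_OBSESSING_OVER_HOST_CHECKS
    send_cmd STOP_OBSESSING_OVER_SVC_CHECKS
}

# Master back: the inverse of promote().
demote() {
    send_cmd DISABLE_NOTIFICATIONS
    send_cmd START_OBSESSING_OVER_HOST_CHECKS
    send_cmd START_OBSESSING_OVER_SVC_CHECKS
}

handle_event() {
    state=$1 statetype=$2
    # Ignore SOFT states so a single missed check does not flip roles.
    [ "$statetype" = HARD ] || return 0
    case $state in
        DOWN|UNREACHABLE) promote ;;
        UP) demote ;;
    esac
}

handle_event "${1:-}" "${2:-}"
```

Acting only on HARD states means the slave takes over after the master host has failed its configured number of check attempts, rather than on the first timeout.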