From: Scott G. <sco...@nw...> - 2008-04-30 21:07:13
Sorry for the long subject and post. We're running Nagios 2.10 on CentOS 5. When we acknowledge a service alert that has gone into WARNING, we do not receive an alert when the service later goes CRITICAL.

For example, we're monitoring the E drive on a file server. The drive goes into a warning state, Nagios sends an alert, and an acknowledgement is entered. Later the drive goes critical, but no alert is ever sent. The relevant log entries and config files follow. Thanks for the help!

Log file:

E drive goes into warning:
Apr 29 15:10:38 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;WARNING;notify-by-epager;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%) - free 20.10 Gb (8%)

E drive is acknowledged:
Apr 29 15:11:26 DataCenterMon nagios: EXTERNAL COMMAND: ACKNOWLEDGE_SVC_PROBLEM;X;Disk Usage E Drive;2;1;1;Nagios Admin;jf

Acknowledgement notification is sent:
Apr 29 15:11:26 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;ACKNOWLEDGEMENT (WARNING);notify-by-email;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%) - free 20.10 Gb (8%);Nagios Admin;jf
Apr 29 15:11:27 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;ACKNOWLEDGEMENT (WARNING);notify-by-epager;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%) - free 20.10 Gb (8%);Nagios Admin;jf

E drive goes critical, no alert sent:
Apr 30 10:07:16 DataCenterMon nagios: SERVICE ALERT: X;Disk Usage E Drive;CRITICAL;HARD;3;e:\ - total: 263.99 Gb - used: 251.33 Gb (95%) - free 12.67 Gb (5%)
Apr 30 11:04:16 DataCenterMon nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;X;Disk Usage E Drive;1209578654

Acknowledgement is removed and alert is sent:
Apr 30 11:05:19 DataCenterMon nagios: EXTERNAL COMMAND: REMOVE_SVC_ACKNOWLEDGEMENT;X;Disk Usage E Drive
Apr 30 11:05:49 DataCenterMon nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;X;Disk Usage E Drive;1209578747
Apr 30 11:05:57 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;CRITICAL;notify-by-email;e:\ - total: 263.99 Gb - used: 254.71 Gb (96%) - free 9.29 Gb (4%)

# Host Template for Critical Hosts -- [E]Pager and Email Notification to x 27x7
define host{
        name                            Critical_Host   ; The name of this host template - referenced in other host definitions, used for template recursion/resolution
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        notification_period             24x7            ; Notifies 24x365
        notification_options            d,u,r           ; Down, Unreachable, Recovery
        notification_interval           5               ; Sends page/email every 5 minutes
        check_command                   check_ping!1000.0,20%!30000.0,100%  ; Warns at 20% packet loss or round-trip time > 1000 ms; critical at 100% packet loss or 30000 ms round trip
        max_check_attempts              5               ; Checks host 5 times before generating an alert
        contact_groups                  x
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

# 'NWWEBNAS' host definition
define host{
        use                             Critical_Host   ; Name of host template to use
        host_name                       X
        alias                           Production File Server
        address                         x.x.x.x
        parents                         X
        }

# Critical service definition template
define service{
        name                            Critical_Service        ; The 'name' of this service template, referenced in other service definitions
        active_checks_enabled           1               ; Active service checks are enabled
        passive_checks_enabled          1               ; Passive service checks are enabled/accepted
        parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1               ; We should obsess over this service (if necessary)
        is_volatile                     0
        check_freshness                 0               ; Default is to NOT check service 'freshness'
        notifications_enabled           1               ; Service notifications are enabled
        event_handler_enabled           1               ; Service event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        event_handler_enabled           1               ; Event handler is enabled
        check_period                    24x7_With_Maintenance_Window    ; Checks 24x7x365
        normal_check_interval           10              ; When the service is OK it is checked every 10 minutes
        max_check_attempts              3               ; When the service is not OK it is checked 3 times before an alert is sent
        retry_check_interval            1               ; Retries every 1 minute once the service is not OK. After max_check_attempts has been reached it rechecks at normal_check_interval
        notification_interval           10              ; Sends notifications every 10 minutes
        notification_period             24x7            ; Notifies 24x365
        notification_options            w,u,c,r         ; Sends alerts on Warning, Unknown, Critical and Recovery
        contact_groups                  x               ; Emails ISOpsOnCall and pages ISOnCallCell
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

# Service definition
define service{
        use                             Critical_Service        ; Name of service template to use
        host_name                       X
        service_description             Disk Usage E Drive
        check_command                   check_nt_disk!e!80!95
        }
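For anyone digging into the log above, here is a minimal Python sketch (not part of our setup) that splits the ACKNOWLEDGE_SVC_PROBLEM entry into its fields. The field order (host_name; service_description; sticky; notify; persistent; author; comment) is taken from the Nagios external command reference, which also states that a sticky value of 2 keeps the acknowledgement until the service returns to OK, while any other value clears it on the next state change. If that holds for 2.10, the "2" in our acknowledgement would be consistent with the WARNING-to-CRITICAL change producing no notification, but treat that as an assumption to check against the 2.10 docs. The names SvcAck, parse_svc_ack and build_svc_ack are hypothetical helpers for illustration only.

```python
from typing import NamedTuple


class SvcAck(NamedTuple):
    """ACKNOWLEDGE_SVC_PROBLEM arguments (field order assumed from the Nagios external command reference)."""
    host_name: str
    service_description: str
    sticky: int       # 2 = sticky (kept until OK); 1 = normal (cleared on state change)
    notify: int       # 1 = send acknowledgement notifications
    persistent: int   # 1 = keep the ack comment across restarts
    author: str
    comment: str


def parse_svc_ack(log_line: str) -> SvcAck:
    """Pull the ACKNOWLEDGE_SVC_PROBLEM arguments out of a syslog line."""
    marker = "ACKNOWLEDGE_SVC_PROBLEM;"
    start = log_line.index(marker) + len(marker)
    # maxsplit=6 leaves any ';' inside the free-text comment intact
    host, svc, sticky, notify, persistent, author, comment = log_line[start:].split(";", 6)
    return SvcAck(host, svc, int(sticky), int(notify), int(persistent), author, comment)


def build_svc_ack(ack: SvcAck, timestamp: int) -> str:
    """Format the ack as a '[time] COMMAND;args' line for the external command file."""
    return (f"[{timestamp}] ACKNOWLEDGE_SVC_PROBLEM;{ack.host_name};{ack.service_description};"
            f"{ack.sticky};{ack.notify};{ack.persistent};{ack.author};{ack.comment}")


line = ("Apr 29 15:11:26 DataCenterMon nagios: EXTERNAL COMMAND: "
        "ACKNOWLEDGE_SVC_PROBLEM;X;Disk Usage E Drive;2;1;1;Nagios Admin;jf")
ack = parse_svc_ack(line)
print(ack.sticky)  # 2

# Same acknowledgement, but non-sticky (sticky=1) so it should clear on the next state change
non_sticky = ack._replace(sticky=1)
print(build_svc_ack(non_sticky, 1209578654))
```

The timestamp 1209578654 is reused from the log purely as an example value. Writing the resulting line to the file named by the command_file setting in nagios.cfg would, under the assumptions above, submit a non-sticky acknowledgement instead of a sticky one.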