[Nagios-checkins] SF.net SVN: nagios:[2752] nagioscore/branches/nagios-3-4-x


Revision: 2752
          http://nagios.svn.sourceforge.net/nagios/?rev=2752&view=rev
Author:   estanley375
Date:     2013-05-06 00:38:36 +0000 (Mon, 06 May 2013)
Log Message:
-----------
Fixed bug #445: Adding triggered downtime for child hosts causes a SIGSEGV on restart/reload

This was caused by the triggered downtime being deleted when the
triggering downtime was restarted. It was deleted because it was
still marked as in effect. It is now marked as not in effect in the
register_downtime() function.

A related issue, also resolved, is that after a restart, the
triggered downtime was dropped. The same issue also caused the CGI
not to list the triggered downtime. This was due to the ordering
of the downtimes in the retention.dat and status.dat files.

Previously the triggered downtime always appeared before its
triggering downtime in those files. When the downtimes were read
from those files, either on a core restart or by the CGIs, the
triggered downtime would be discarded because the triggering
downtime did not yet exist.
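
The discard behaviour described above can be illustrated with a small
sketch. This is not the Nagios reader code, which is not part of this
commit. The demo_entry type and the demo_is_loaded() and
demo_load_in_order() helpers are hypothetical names that only assume
entries are read in file order and that a triggered entry is kept only
if its trigger has already been read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for one downtime entry in a file. */
typedef struct {
	unsigned long downtime_id;
	unsigned long triggered_by;  /* 0 means "not triggered" */
} demo_entry;

/* Return 1 if an entry with the given id is among the kept entries. */
static int demo_is_loaded(const demo_entry *kept, size_t loaded,
		unsigned long id) {
	size_t i;
	for(i = 0; i < loaded; i++) {
		if(kept[i].downtime_id == id)
			return 1;
	}
	return 0;
}

/* Read entries in file order. A triggered entry is discarded when its
   triggering entry has not been read yet. Returns the number kept. */
static size_t demo_load_in_order(const demo_entry *in, size_t count,
		demo_entry *kept) {
	size_t i, n = 0;
	assert(in != NULL && kept != NULL);
	for(i = 0; i < count; i++) {
		if(in[i].triggered_by != 0 &&
				!demo_is_loaded(kept, n, in[i].triggered_by))
			continue;  /* trigger not seen yet: entry is dropped */
		kept[n++] = in[i];
	}
	return n;
}
```

With the triggered entry listed first only the triggering entry
survives; with the triggering entry listed first both are kept.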

The most common case for this is when a downtime is created and
the option is selected to create triggered downtimes on all child
objects. A change was made in the way downtimes are sorted so that
triggered downtimes with the same start times as untriggered 
downtimes always appear later in the list.  This change in the 
sort order does NOT resolve the case where a manually created, 
triggered downtime is created with a start time earlier than the 
triggering downtime.
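
As a rough illustration of the sort rule, the sketch below sorts an
array of pointers with qsort(). It is a simplified variant, not the
committed downtime_compar(): the demo_downtime struct carries only the
fields the comparison needs, and demo_downtime_compar() and
demo_sort_downtimes() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for scheduled_downtime; only the fields the
   comparison needs. */
typedef struct {
	unsigned long downtime_id;
	unsigned long triggered_by;  /* 0 means "not triggered" */
	time_t start_time;
} demo_downtime;

/* Order by start time; on equal start times an untriggered downtime
   sorts before a triggered one. */
static int demo_downtime_compar(const void *p1, const void *p2) {
	const demo_downtime *d1 = *(demo_downtime * const *)p1;
	const demo_downtime *d2 = *(demo_downtime * const *)p2;

	if(d1->start_time == d2->start_time) {
		int t1 = (d1->triggered_by != 0);
		int t2 = (d2->triggered_by != 0);
		return t1 - t2;  /* -1, 0 or 1 */
	}
	return (d1->start_time < d2->start_time) ? -1 : 1;
}

/* Sort an array of downtime pointers in place. */
static void demo_sort_downtimes(demo_downtime **list, size_t count) {
	assert(list != NULL);
	qsort(list, count, sizeof(*list), demo_downtime_compar);
}
```

A triggered downtime that starts earlier than its triggering downtime
still sorts first under this rule, which is the unresolved case noted
above.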

This would need to be resolved by comparing the triggered_by value
with the downtime ID regardless of the start time. However, this
should be a relatively rare case and only caused by intentional
scheduling by a human. This change was not implemented because it
would cause the downtime list to be out of time order and the
implications of this were not well understood.

Modified Paths:
--------------
    nagioscore/branches/nagios-3-4-x/Changelog
    nagioscore/branches/nagios-3-4-x/common/downtime.c

Modified: nagioscore/branches/nagios-3-4-x/Changelog
===================================================================

--- nagioscore/branches/nagios-3-4-x/Changelog	2013-04-30 07:45:27 UTC (rev 2751)
+++ nagioscore/branches/nagios-3-4-x/Changelog	2013-05-06 00:38:36 UTC (rev 2752)
@@ -4,6 +4,7 @@
 
 3.5.1 - xx/xx/xxxx
 ------------------
+* Fixed bug #445: Adding triggered downtime for child hosts causes a SIGSEGV on restart/reload (Eric Stanley)
 * Fixed bug #375: Freshness expiration never reached and bug #427: freshness threshold doesn't work if it is set long (Scott Wilkerson, Eric Stanley)
 * Fixed bug #432: Downtime scheduled as "Nagios Process" and not the Users name (Sam Lansing, Eric Stanley)
 

Modified: nagioscore/branches/nagios-3-4-x/common/downtime.c
===================================================================
--- nagioscore/branches/nagios-3-4-x/common/downtime.c	2013-04-30 07:45:27 UTC (rev 2751)
+++ nagioscore/branches/nagios-3-4-x/common/downtime.c	2013-05-06 00:38:36 UTC (rev 2752)
@@ -360,6 +360,14 @@
 			}
 		}
 
+	/* If the downtime is triggered and was in effect, mark it as not in 
+		effect so it gets scheduled correctly */
+	if((temp_downtime->triggered_by != 0) && 
+			(TRUE == temp_downtime->is_in_effect)) {
+		was_in_effect = temp_downtime->is_in_effect;
+		temp_downtime->is_in_effect = FALSE;
+		}
+
 	if((FALSE == temp_downtime->fixed) && (FALSE == was_in_effect)) {
 		/* increment pending flex downtime counter */
 		if(temp_downtime->type == HOST_DOWNTIME)
@@ -1111,6 +1119,39 @@
 static int downtime_compar(const void *p1, const void *p2) {
 	scheduled_downtime *d1 = *(scheduled_downtime **)p1;
 	scheduled_downtime *d2 = *(scheduled_downtime **)p2;
+
+	/*
+ 		If the start times of two downtimes are equal and one is triggered but
+		the other is not, the triggered downtime should be later in the
+		list than the untriggered one. This is so they are written to the
+		retention.dat and status.dat in the correct order.
+
+		Previously the triggered downtime always appeared before its 
+		triggering downtime in those files. When the downtimes were read 
+		from those files, either on a core restart or by the CGIs, the 
+		triggered downtime would be discarded because the triggering 
+		downtime did not yet exist.
+
+		The most common case for this is when a downtime is created and 
+		the option is selected to create triggered downtimes on all child 
+		objects. This change in the sort order does NOT resolve the 
+		case where a manually created, triggered downtime is created with 
+		a start time earlier than the triggering downtime.
+
+		This would need to be resolved by comparing the triggered_by value
+		with the downtime ID regardless of the start time. However, this
+		should be a relatively rare case and only caused by intentional
+		scheduling by a human. This change was not implemented because it
+		would cause the downtime list to be out of time order and the
+		implications of this were not well understood.
+	*/
+
+	if(d1->start_time == d2->start_time) {
+		if(( d1->triggered_by == 0 && d2->triggered_by != 0) ||
+				( d1->triggered_by != 0 && d2->triggered_by == 0)) {
+			return d1->triggered_by == 0 ? -1 : 1;
+			}
+		}
 	return (d1->start_time < d2->start_time) ? -1 : (d1->start_time - d2->start_time);
 	}
 
