From: <est...@us...> - 2013-05-06 00:38:45
Revision: 2752
          http://nagios.svn.sourceforge.net/nagios/?rev=2752&view=rev
Author:   estanley375
Date:     2013-05-06 00:38:36 +0000 (Mon, 06 May 2013)
Log Message:
-----------
Fixed bug #445: Adding triggered downtime for child hosts causes a SIGSEGV on restart/reload

This was caused by the triggered downtime being deleted when the triggering downtime was restarted. It was deleted because it was still marked as in effect. It is now marked as not in effect in the register_downtime() function.

A related issue, also resolved, is that after a restart the triggered downtime was dropped. The same issue also caused the CGIs not to list the triggered downtime. This was due to the ordering of the downtimes in the retention.dat and status.dat files. Previously the triggered downtime always appeared before its triggering downtime in those files. When the downtimes were read from those files, either on a core restart or by the CGIs, the triggered downtime was discarded because the triggering downtime did not yet exist. The most common case for this is when a downtime is created and the option is selected to create triggered downtimes on all child objects.

The way downtimes are sorted was changed so that a triggered downtime with the same start time as an untriggered downtime always appears later in the list.

This change in the sort order does NOT resolve the case where a manually created, triggered downtime has a start time earlier than its triggering downtime. That would need to be resolved by comparing the triggered_by value with the downtime ID regardless of the start time. However, this should be a relatively rare case, caused only by intentional scheduling by a human. That change was not implemented because it would put the downtime list out of time order, and the implications of this were not well understood.

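The ordering problem described in the log message can be illustrated with a small sketch. It is not the Nagios reader code: sketch_entry, sketch_id_known() and sketch_load_in_order() are hypothetical names, and the loader only models the stated behavior, namely that a triggered downtime is discarded when its triggering downtime has not been read yet.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one downtime record in a file. */
typedef struct {
	unsigned long downtime_id;
	unsigned long triggered_by;   /* 0 means "not triggered" */
} sketch_entry;

/* Returns 1 if id is among the first `count` accepted IDs, else 0. */
static int sketch_id_known(const unsigned long *ids, size_t count,
                           unsigned long id) {
	for(size_t i = 0; i < count; i++) {
		if(ids[i] == id)
			return 1;
	}
	return 0;
}

/* Walks the entries in file order and stores the IDs of the accepted
   ones in `accepted` (which must have room for `count` values).
   A triggered entry is dropped when its trigger has not been accepted
   yet. Returns the number of accepted entries. */
size_t sketch_load_in_order(const sketch_entry *entries, size_t count,
                            unsigned long *accepted) {
	size_t kept = 0;

	for(size_t i = 0; i < count; i++) {
		if(entries[i].triggered_by != 0 &&
		        !sketch_id_known(accepted, kept, entries[i].triggered_by))
			continue;   /* trigger not seen yet: entry is discarded */
		accepted[kept++] = entries[i].downtime_id;
	}
	return kept;
}
```

With downtime 2 triggered by downtime 1, a file that lists 2 before 1 yields only downtime 1 in this model, while a file that lists 1 before 2 yields both.
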
Modified Paths:
--------------
    nagioscore/branches/nagios-3-4-x/Changelog
    nagioscore/branches/nagios-3-4-x/common/downtime.c

Modified: nagioscore/branches/nagios-3-4-x/Changelog
===================================================================
--- nagioscore/branches/nagios-3-4-x/Changelog	2013-04-30 07:45:27 UTC (rev 2751)
+++ nagioscore/branches/nagios-3-4-x/Changelog	2013-05-06 00:38:36 UTC (rev 2752)
@@ -4,6 +4,7 @@
 3.5.1 - xx/xx/xxxx
 ------------------
+* Fixed bug #445: Adding triggered downtime for child hosts causes a SIGSEGV on restart/reload (Eric Stanley)
 * Fixed bug #375: Freshness expiration never reached and bug #427: freshness threshold doesn't work if it is set long (Scott Wilkerson, Eric Stanley)
 * Fixed bug #432: Downtime scheduled as "Nagios Process" and not the Users name (Sam Lansing, Eric Stanley)

Modified: nagioscore/branches/nagios-3-4-x/common/downtime.c
===================================================================
--- nagioscore/branches/nagios-3-4-x/common/downtime.c	2013-04-30 07:45:27 UTC (rev 2751)
+++ nagioscore/branches/nagios-3-4-x/common/downtime.c	2013-05-06 00:38:36 UTC (rev 2752)
@@ -360,6 +360,14 @@
 			}
 		}
+	/* If the downtime is triggered and was in effect, mark it as not in
+		effect so it gets scheduled correctly */
+	if((temp_downtime->triggered_by != 0) &&
+			(TRUE == temp_downtime->is_in_effect)) {
+		was_in_effect = temp_downtime->is_in_effect;
+		temp_downtime->is_in_effect = FALSE;
+		}
+
 	if((FALSE == temp_downtime->fixed) && (FALSE == was_in_effect)) {
 		/* increment pending flex downtime counter */
 		if(temp_downtime->type == HOST_DOWNTIME)
@@ -1111,6 +1119,39 @@
 static int downtime_compar(const void *p1, const void *p2) {
 	scheduled_downtime *d1 = *(scheduled_downtime **)p1;
 	scheduled_downtime *d2 = *(scheduled_downtime **)p2;
+
+	/*
+		If the start times of two downtimes are equal and one is triggered but
+		but the other is not, the triggered downtime should be later in the
+		list than the untriggered one. This is so they are written to the
+		retention.dat and status.dat in the correct order.
+
+		Previously the triggered downtime always appeared before its
+		triggering downtime in those files. When the downtimes were read
+		from those files, either on a core restart or by the CGIs, the
+		triggered downtime would be discarded because the triggering
+		downtime did not yet exist.
+
+		The most common case for this is when a downtime is created and
+		the option is selected to create triggered downtimes on all child
+		objects. This change in the sort order does NOT resolve the
+		case where a manually created, triggered downtime is created with
+		a start time earlier than the triggering downtime.
+
+		This would need to be resolved by comparing the triggered_by value
+		with the downtime ID regardless of the start time. However, this
+		should be a relatively rare case and only caused by intentional
+		scheduling by a human. This change was not implemented because it
+		would cause the downtime list to be out of time order and the
+		implications of this were not well understood.
+	*/
+
+	if(d1->start_time == d2->start_time) {
+		if(( d1->triggered_by == 0 && d2->triggered_by != 0) ||
+				( d1->triggered_by != 0 && d2->triggered_by == 0)) {
+			return d1->triggered_by == 0 ? -1 : 1;
+			}
+		}
 	return (d1->start_time < d2->start_time) ? -1 : (d1->start_time - d2->start_time);
 	}
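
The tie-break added to downtime_compar() can be tried in isolation with a sketch like the one below. It assumes a reduced struct (sketch_downtime) in place of scheduled_downtime, and the helper names sketch_downtime_compar() and sketch_sort_downtimes() are hypothetical. For unequal start times it uses a plain three-way comparison rather than the subtraction in the committed code; that substitution is a choice made for this sketch only.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical, reduced stand-in for scheduled_downtime: only the
   fields the comparator reads are kept. */
typedef struct {
	unsigned long downtime_id;
	unsigned long triggered_by;   /* 0 means "not triggered" */
	time_t start_time;
} sketch_downtime;

/* qsort() comparator over an array of sketch_downtime pointers.
   Orders by start_time; on equal start times an untriggered downtime
   sorts before a triggered one. */
static int sketch_downtime_compar(const void *p1, const void *p2) {
	const sketch_downtime *d1 = *(sketch_downtime * const *)p1;
	const sketch_downtime *d2 = *(sketch_downtime * const *)p2;

	if(d1->start_time == d2->start_time) {
		int t1 = (d1->triggered_by != 0);
		int t2 = (d2->triggered_by != 0);
		/* -1: only d2 is triggered, 1: only d1 is, 0: same kind */
		return t1 - t2;
	}
	/* Three-way comparison (an assumption of this sketch, not the
	   expression used in the commit). */
	return (d1->start_time < d2->start_time) ? -1 : 1;
}

/* Sorts an array of downtime pointers in place. */
void sketch_sort_downtimes(sketch_downtime **list, size_t count) {
	qsort(list, count, sizeof(*list), sketch_downtime_compar);
}
```

Sorting a parent downtime and a child downtime it triggers, both with the same start time, places the parent first. As the log message notes, a triggered downtime with an earlier start time than its trigger would still sort ahead of it.
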