Thread: [Nagios-db-devel] neb startup revisited
From: Matthew K. <mk...@ma...> - 2005-01-27 06:04:35
Going over the changes for pending hosts and I've noticed a few things in the postgres module.

In restart.sql (during nagios startup) I'm not quite clear on why an existing host and service should be reset back to a pending state by setting has_been_checked = FALSE. If you look at nagios and the standard cgis during startup, the data is read in from the retention file and the previous states for hosts/services is assumed to be correct.

Also in restart.sql, I'm not sure about inserting the empty host/service when one isn't found in the database. For example if you clear out the database and start nagios up the tac display will show X hosts with flap detection/notifications etc disabled which will slowly count backwards as all the checks complete. Kinda funky :)

As I see it the solution for both issues would be to

- set configured = false for all hosts/services
- do the 'select into thisHostID id FROM host WHERE name = hostName;'

if the host/service is NOT found, send the object to processStatus like

    /* update this host */
    nebstruct_host_status_data ds;
    ds.object_ptr=(void *)hl;

    processStatus(NEBCALLBACK_HOST_STATUS_DATA, (void*)&ds);

which will set configured = TRUE, update the has_been_checked field etc.

else we set host/service configured = true and assume the rest of the data in the db is correct and leave it alone (save the extra resources of running the stored proc)

Come to think of it at this point you could actually

    delete from host,service where configured = false

to prune any hosts that have been removed from the config.

This should give a more immediate overview of nagios's status right after startup.

Apologies for the long email.
-- 
Matthew Kent <mk...@ma...>
http://magoazul.com
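[Editor's sketch] The startup pass proposed in the message above could look roughly like the following. This is only an illustration under stated assumptions, not code from the module: the host table is replaced by an in-memory array, and db_host, cfg_host, process_status_stub and reconcile_startup are hypothetical names. The stub stands in for the real processStatus() call, which is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one row of the "host" table. */
struct db_host {
    const char *name;
    bool configured;
    bool has_been_checked;
};

/* Hypothetical stand-in for a host object read from the nagios config. */
struct cfg_host {
    const char *name;
    bool has_been_checked;
};

/* Counts how often the (stubbed) full status update ran. */
static int full_updates = 0;

/* Stub for the processStatus() path: fills in a new row from the object. */
static void process_status_stub(struct db_host *row, const struct cfg_host *h)
{
    row->name = h->name;
    row->configured = true;
    row->has_been_checked = h->has_been_checked;
    full_updates++;
}

/* Runs the proposed startup pass and returns the new row count. */
static size_t reconcile_startup(struct db_host *rows, size_t nrows, size_t cap,
                                const struct cfg_host *cfg, size_t ncfg)
{
    /* Step 1: set configured = false for all rows. */
    for (size_t i = 0; i < nrows; i++)
        rows[i].configured = false;

    for (size_t c = 0; c < ncfg; c++) {
        /* Step 2: look the host up by name (the "select into" step). */
        struct db_host *found = NULL;
        for (size_t i = 0; i < nrows; i++) {
            if (strcmp(rows[i].name, cfg[c].name) == 0) {
                found = &rows[i];
                break;
            }
        }
        if (found != NULL) {
            /* Known host: mark configured, leave the rest of the row alone. */
            found->configured = true;
        } else if (nrows < cap) {
            /* Unknown host: run the full status update for a new row. */
            process_status_stub(&rows[nrows], &cfg[c]);
            nrows++;
        }
    }
    return nrows;
}
```

Rows that are still configured = false after the pass belong to hosts no longer in the config; whether to delete them is the separate question raised in the message.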
From: Ben <be...@si...> - 2005-01-27 17:55:25
On Wed, 26 Jan 2005, Matthew Kent wrote:

> In restart.sql (during nagios startup) I'm not quite clear on why an
> existing host and service should be reset back to a pending state by
> setting has_been_checked = FALSE. If you look at nagios and the standard
> cgis during startup, the data is read in from the retention file and the
> previous states for hosts/services is assumed to be correct.

Well, technically, the check *is* pending. I understand that the retention file says the host or service has a certain state, but that hasn't been verified. If nagios was simply restarted, then the retention data is likely accurate. But after a restart, it's entirely possible that nagios might have been down for days, and in that case the retention data is much more questionable.

Frankly, I think it makes a lot more sense to label everything as pending until it's been checked. Weren't you the one that convinced me of that? :)

> Also in restart.sql, I'm not sure about inserting the empty host/service
> when one isn't found in the database. For example if you clear out the
> database and start nagios up the tac display will show X hosts with flap
> detection/notifications etc disabled which will slowly count backwards
> as all the checks complete. Kinda funky :)

Yeah, that's a serious hack. However, I'm not sure how else to record services for a new host that has yet to be checked, because if there isn't a placeholder record, then the service cannot be entered into the database. Perhaps I should set the host options to null, or some other "unknown" state.

> As I see it the solution for both issues would be to
>
> - set configured = false for all hosts/services
> - do the 'select into thisHostID id FROM host WHERE name = hostName;'
>
> if
> the host/service is NOT found, send the object to processStatus like
>
> /* update this host */
> nebstruct_host_status_data ds;
> ds.object_ptr=(void *)hl;
>
> processStatus(NEBCALLBACK_HOST_STATUS_DATA, (void*)&ds);
>
> which will set configured = TRUE, update the has_been_checked field etc.
> else
> we set host/service configured = true and assume the rest of the data in
> the db is correct and leave it alone (save the extra resources of
> running the stored proc)

I think a better idea would be to change configure_host() and configure_service() to take in all the data we have on the host/service before it gets checked, so that we can make our placeholder records more accurate.

> Come to think of it at this point you could actually
> delete from host,service where configured = false
> to prune any hosts that have been removed from the config.

I can't support deleting unconfigured hosts, because one of the requirements my company has is to be able to report on historical availability, even if the host isn't used anymore.

> This should give a more immediate overview of nagios's status right
> after startup.

Like I said, I think showing most things in a pending state shows the most accurate status. Well, actually, I suppose marking things as "Pending (assumed up)" or "Pending (assumed down)" and such would be the most accurate, but that could get messy.
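[Editor's sketch] One way to picture the two ideas in the reply above, an "unknown" option state for placeholder records and keeping unconfigured hosts for history, is the following. It is a rough illustration only: host_row, enum tri, make_placeholder, make_placeholder_from_config, retire_host and count_notifications_disabled are hypothetical names, not the module's schema or the real configure_host() signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tri-state option: a placeholder can say "unknown" instead of "disabled". */
enum tri { OPT_UNKNOWN = 0, OPT_DISABLED, OPT_ENABLED };

/* Hypothetical stand-in for one row of the "host" table. */
struct host_row {
    const char *name;
    bool configured;
    bool has_been_checked;
    enum tri notifications;
    enum tri flap_detection;
};

/* Placeholder with no option data at all: everything stays unknown. */
static struct host_row make_placeholder(const char *name)
{
    struct host_row r = { name, true, false, OPT_UNKNOWN, OPT_UNKNOWN };
    return r;
}

/* Placeholder built from the data already known before the first check. */
static struct host_row make_placeholder_from_config(const char *name,
                                                    bool notifications_on,
                                                    bool flap_detection_on)
{
    struct host_row r = make_placeholder(name);
    r.notifications = notifications_on ? OPT_ENABLED : OPT_DISABLED;
    r.flap_detection = flap_detection_on ? OPT_ENABLED : OPT_DISABLED;
    return r;
}

/* Host removed from the config: keep the row for history, just flag it. */
static void retire_host(struct host_row *r)
{
    r->configured = false;
}

/* A tac-style count: only configured hosts with notifications known off. */
static size_t count_notifications_disabled(const struct host_row *rows, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].configured && rows[i].notifications == OPT_DISABLED)
            count++;
    }
    return count;
}
```

With a distinct unknown state, a freshly created placeholder no longer inflates the "disabled" counters on the tac display, and retired hosts drop out of the counts without losing their rows.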
From: Matthew K. <mk...@ma...> - 2005-01-27 21:19:47
On Thu, 2005-27-01 at 09:55 -0800, Ben wrote:
> On Wed, 26 Jan 2005, Matthew Kent wrote:
>
> > In restart.sql (during nagios startup) I'm not quite clear on why an
> > existing host and service should be reset back to a pending state by
> > setting has_been_checked = FALSE. If you look at nagios and the standard
> > cgis during startup, the data is read in from the retention file and the
> > previous states for hosts/services is assumed to be correct.
>
> Well, technically, the check *is* pending. I understand that the retention
> file says the host or service has a certain state, but that hasn't been
> verified. If nagios was simply restarted, then the retention data is
> likely accurate. But after a restart, it's entirely possible that nagios
> might have been down for days, and in that case the retention data is much
> more questionable.
>
> Frankly, I think it makes a lot more sense to label everything as pending
> until it's been checked. Weren't you the one that convinced me of that? :)
>

My apologies, I wasn't thinking clearly as to what the effect would be on the tac display. The current implementation is the most accurate.

> > Also in restart.sql, I'm not sure about inserting the empty host/service
> > when one isn't found in the database. For example if you clear out the
> > database and start nagios up the tac display will show X hosts with flap
> > detection/notifications etc disabled which will slowly count backwards
> > as all the checks complete. Kinda funky :)
>
> Yeah, that's a serious hack. However, I'm not sure how else to record
> services for a new host that has yet to be checked, because if there isn't
> a placeholder record, then the service cannot be entered into the
> database. Perhaps I should set the host options to null, or some other
> "unknown" state.
>
> > As I see it the solution for both issues would be to
> >
> > - set configured = false for all hosts/services
> > - do the 'select into thisHostID id FROM host WHERE name = hostName;'
> >
> > if
> > the host/service is NOT found, send the object to processStatus like
> >
> > /* update this host */
> > nebstruct_host_status_data ds;
> > ds.object_ptr=(void *)hl;
> >
> > processStatus(NEBCALLBACK_HOST_STATUS_DATA, (void*)&ds);
> >
> > which will set configured = TRUE, update the has_been_checked field etc.
> > else
> > we set host/service configured = true and assume the rest of the data in
> > the db is correct and leave it alone (save the extra resources of
> > running the stored proc)
>
> I think a better idea would be to change configure_host() and
> configure_service() to take in all the data we have on the host/service
> before it gets checked, so that we can make our placeholder records more
> accurate.
>

Sounds good. Is passing everything to configure_host/configure_service instead of just throwing it at processStatus to save processing time or just a logical separation?

> > Come to think of it at this point you could actually
> > delete from host,service where configured = false
> > to prune any hosts that have been removed from the config.
>
> I can't support deleting unconfigured hosts, because one of the
> requirements my company has is to be able to report on historical
> availability, even if the host isn't used anymore.
>

Was thinking about that too, if you removed a host (and maybe went to add it back later) you might be annoyed to find all the history had disappeared. I'll remove this from the mysql module and put a note about adding a db_cleanup.php down the line so users can do it themselves.

> > This should give a more immediate overview of nagios's status right
> > after startup.
>
> Like I said, I think showing most things in a pending state shows the most
> accurate status. Well, actually, I suppose marking things as "Pending
> (assumed up)" or "Pending (assumed down)" and such would be the most
> accurate, but that could get messy.
>

Yeah, hardly worth the effort.

Oh and did you get that other email about use of current_notification_number (it being defined in the schema but not referenced by the stored procs)? I'm not getting anything from the mailing list today.

Thanks,
-- 
Matthew Kent \ SA \ bravenet.com
From: Ben <be...@si...> - 2005-01-27 21:46:13
On Thu, 27 Jan 2005, Matthew Kent wrote:

> > Frankly, I think it makes a lot more sense to label everything as pending
> > until it's been checked. Weren't you the one that convinced me of that? :)
> >
>
> My apologies, I wasn't thinking clearly as to what the effect would be
> on the tac display. The current implementation is the most accurate.

Eh, no worries. I'm just glad somebody agrees with me. :)

> > I think a better idea would be to change configure_host() and
> > configure_service() to take in all the data we have on the host/service
> > before it gets checked, so that we can make our placeholder records more
> > accurate.
> >
>
> Sounds good. Is passing everything to configure_host/configure_service
> instead of just throwing it at processStatus to save processing time or
> just a logical separation?

Logical separation. Basically, processStatus expects to see a nebstruct_host_status_data, which is currently just a wrapper for a host struct, but in the future may have additional data in it. Or maybe data in the host struct will be moved out of the host struct and into the message.... it would make sense to me for that to happen. Anyway, I think the separation makes sense. It keeps it clearer, in my head at least.

> Was thinking about that too, if you removed a host (and maybe went to
> add it back later) you might be annoyed to find all the history had
> disappeared. I'll remove this from the mysql module and put a note about
> adding a db_cleanup.php down the line so users can do it themselves.

Yeah, it wouldn't be a bad idea to give a way to remove historical data for unconfigured hosts and services. I expect it would fit nicely into the UI you're making. :)