From: Michael F. <mic...@un...> - 2009-09-18 11:16:33
Hi there,

Frassinelli, Marco wrote the following on 18.09.2009 12:25:
> Hi,
> I think is not a correct behavior because visualization software as
> nagvis and others uses the table programstatus to check if nagios is
> running.

Starting ndo2db first and then Nagios makes sure that there is current data within the DB. If there is outdated data within the DB, it needs to be removed before Nagios even sends new data. So trimming those table entries at the beginning is truly intentional (the so-called pre-launch state, where the if condition matches). If ndo2db fails for some reason, that data will remain in the database and is then removed during the next start.

>
> I saw that often this table is empty.

Depending on your startup routine, I would guess that you started Nagios first and then ndo2db. But the table shouldn't be empty, because ndomod as an event broker keeps data not yet written to ndo2db in a defined cache. Depending on your configuration this cache may be too small, so the oldest entry could be lost (in this case the programstatus of Nagios). But that's really a guess; you'll have to give more information on where and when this error occurs, mentioning all circumstances you can find in the logs (set debug_level to log everything in full detail, just in case).

> Code calculates the difference between now() and status_update_time.
> If the record is null this difference is far more than the
> configurable interval, tipical 180 sec.

Which code and which configuration? The only thing I can see here is tstamp.tv_sec, which is a converted timestamp received from the event broker module. This is a kind of now(), but really a now() from Nagios itself. You may check ndo2db.c:ndo2db_convert_standard_data_elements.

The other compared value is dbinfo.latest_realtime_data_time, which is initialized in db.c:ndo2db_db_init and then updated if dbinfo.latest_program_status_time is newer (db.c:374; it is set directly to that value).
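
To make the comparison easier to follow, here is a minimal C sketch of how I read it. It is not the NDOUtils source: the struct and both function names are hypothetical, only the field name latest_realtime_data_time comes from the code referenced in this mail, and the ">=" reflects my reading that a difference of 0 or more is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the DB bookkeeping struct; only the field
 * name latest_realtime_data_time is taken from the mail. */
struct dbinfo_sketch {
    time_t latest_realtime_data_time;
};

/* Remember the newest realtime timestamp seen so far (for example when a
 * newer program status time shows up). Older values are ignored. */
static void note_realtime_data(struct dbinfo_sketch *db, time_t seen)
{
    assert(db != NULL);
    if (seen > db->latest_realtime_data_time)
        db->latest_realtime_data_time = seen;
}

/* Assumed condition: trim stale realtime rows only when the process event
 * is a pre-launch AND the timestamp sent by Nagios is not older than the
 * newest realtime data recorded so far. */
static bool should_clear_realtime_tables(const struct dbinfo_sketch *db,
                                         time_t event_time,
                                         bool is_prelaunch)
{
    assert(db != NULL);
    return is_prelaunch && event_time >= db->latest_realtime_data_time;
}
```

With this reading, a daemon that keeps running only sees the pre-launch event once, while one that is restarted over and over would hit the cleanup on every start.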
Several other realtime data values may also update dbinfo.latest_realtime_data_time. So the point of this data is: if the current Nagios NDO_DATA_TIMESTAMP is newer than the latest realtime data received some time before, it is time for a cleanup at the very beginning of ndo2db (check the sequence in ido2db.c:main).

>
> The problem is that this difference suddenly vary from near 0 to
> infinity.

The conditional statement requires not only that the difference is 0 or more, but also that it is a process pre-launch (see above). But besides that, a question: how did you get these values? Current NDOUtils code doesn't give any debug information at this stage.

>
> Perhaps this is a problem in my ndo setup, and those deletes normally
> occurs rarely. But I saw them every 60 seconds.
> Here ndo2db log:
>
> As you can see the ndo2db pid varies, I think that when it has no more
> data the child exits, an a new one is forked. The new child then
> deletes records in db.

Seeing your ndo2db die and refork explains why the pre_launch_state and timestamp condition matches each time, so a database cleanup is performed on every restart. It would be interesting to know why ndo2db is dying. Depending on your configuration this may vary: TCP or Unix socket, for example? What about more detailed debug logs, or are there messages like "error writing to datasink" in the logs?

Kind regards,
Michael

-- 
DI (FH) Michael Friedrich
mic...@un...
Tel: +43 1 4277 14359

Vienna University Computer Center
Universitaetsstrasse 7
A-1010 Vienna, Austria