From: Andreas E. <ae...@op...> - 2009-06-17 09:00:58
Mark Eisenblaetter wrote:
> Hi Toon,
>
> I have not tested it, but I think it could have a conceptual problem.
>
> If the interval is smaller than the window, it's possible that some checks
> will never be checked.
>
> With the default values, Nagios could pick the same check every 30 seconds
> and move it 180 seconds away.
>
> So I haven't tested it yet.
>

The randomization will almost certainly make sure that doesn't happen. To be 100% certain it doesn't happen, one would have to inspect the elements queued for running and weight them by how long ago each was actually checked before deciding what to check next.

Alternatively, one could simply iterate linearly over the event list. That would cause latency to increase slowly if max_concurrent_checks is set too low, but it would ensure that checks are run in the order they are scheduled. To make sure checks run at approximately the time they're scheduled to run, this is probably the best bet.

I'll need to investigate that, though, and I'm sorely short on time right now, both because of Merlin/Ninja and because of recreational activities during the summer. Any other takers?

The questions that need answering are:

1. How does this affect the high/low priority lists?
2. How much does latency increase? (Latency has to be recalculated for each rescheduled check, as well as for all checks that run off-schedule.)
3. How does it affect the load on the system? Are we currently spending so much time rescheduling checks that we'd actually gain "checks-per-second" performance by just iterating linearly over the list?

Once we know the above, writing the code will probably be quite trivial.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
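The "iterate linearly over the event list" alternative can be illustrated with a small discrete-time simulation. This is a minimal sketch, not Nagios code: it assumes a tick-based clock, uses a hypothetical per-tick cap `max_per_tick` as a stand-in for max_concurrent_checks, and reschedules each check relative to its nominal slot (an assumption chosen so that a backlog shows up as growing latency). `Event` and `simulate_linear` are hypothetical names. Due checks are taken strictly from the head of a time-ordered heap, so nothing is moved away and no check can be starved; when capacity is too low, latency creeps up instead.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Heap order: earliest nominal time first, ties broken by check name.
    scheduled: int
    name: str
    interval: int = field(compare=False)


def simulate_linear(checks, max_per_tick, ticks):
    """Run checks strictly in scheduled order, at most max_per_tick per tick.

    checks: dict mapping a hypothetical check name -> interval in ticks.
    Returns (run_log, latencies): run_log is a list of (tick, name) and
    latencies maps each name to the latencies observed for its runs.
    """
    queue = [Event(0, name, interval) for name, interval in checks.items()]
    heapq.heapify(queue)
    run_log = []
    latencies = {name: [] for name in checks}
    for now in range(ticks):
        started = 0
        rescheduled = []
        # Take due events off the head in order; whatever cannot start this
        # tick simply stays queued (it is never pushed further away).
        while queue and queue[0].scheduled <= now and started < max_per_tick:
            ev = heapq.heappop(queue)
            run_log.append((now, ev.name))
            latencies[ev.name].append(now - ev.scheduled)
            started += 1
            # Assumption: the next run is due one interval after the nominal
            # slot, not after the actual run time.
            rescheduled.append(Event(ev.scheduled + ev.interval, ev.name, ev.interval))
        # Push after the loop so a check runs at most once per tick.
        for ev in rescheduled:
            heapq.heappush(queue, ev)
    return run_log, latencies
```

With three checks due every 2 ticks and room for only one check per tick, every check still runs in round-robin order, but its latency grows by one tick per round; with `max_per_tick=3` all latencies stay at zero.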