From: Antoine R. <ar...@lo...> - 2004-05-28 19:25:25
--On Friday, May 28, 2004 9:46 AM +0200 jan gregor <pa...@ra...> wrote:

>> For what it's worth, I'm having similar issues myself too. My setup is a
>> bit different so I'll post it below. What happens here is that I have
>> two Nagios processes running on two different hosts, in different
>> subnets. The one doing the actual checks is obsessing over services and
>> sends the results through nsca to the main nagios host. The main host
>> seems to decide my services results aren't fresh enough, then runs the
>> check_command, which is a dummy script returning WARNING (originally
>> CRITICAL but it generated too many notifications..), then, a couple
>> seconds or minutes later, a new passive check comes in, which brings
>> the service(s) back to OK, then a couple minutes later, it switches
>> back to WARNING and so on..
>
> Why are you doing freshness checking on master host? Is that of any use?
> Please, correct me, if I'm wrong, but freshness checking is mainly for
> active checking. Only idea when this is usable with passive is in
> passive+active checks, when one services are configured to accept
> passive check and doing active checks over some time (to check if we
> have not missed something). Again, maybe I overlooked something
> important, please correct me, if I'm terribly wrong.

Actually, the idea is that when active_checks are disabled, the
check_command is never run as long as the passive checks come in
frequently enough. According to the docs (the part about distributed
monitoring and/or freshness checking), IF the results are not fresh
enough, then the check_command will be executed. In a failover/redundancy
situation, that would be ideal, as your main machine does not usually
perform the tests but will if the results are getting stale.

In my situation though, the main machine *cannot* access the services
that the second host is monitoring.
What is configured instead is a check_command that will always return an
error (right now, I return WARNING but I would like it to be "CRITICAL")
stating that the results are stale. This would indicate that the nagios
process on the 2nd machine is no longer sending passive checks OR that
the checks somehow don't make it through to the main machine. In any
case, I would get a notification and would start investigating.

This is exactly what I am trying to achieve. Now, my problem is the
following: the second nagios process is doing active checks, and the
service(s) checked never or rarely go down (eg: fping on an otherwise
working machine). I can see on the MAIN host that the passive checks are
being received AND processed by nagios, yet it decides for some reason
that the results are not fresh and runs the check_command defined (which
returns WARNING).

The net result is that, according to the second machine, my services are
up 100% of the time. According to the MAIN machine, those services go
OK - WARNING - OK - WARNING - OK - WARNING every couple of minutes..

Would anyone know which timeout or setting to tweak so that it HAS to
wait much, much longer without having received the passive checks before
it actually decides to take matters into its own hands and run the
check_command defined? (Please see my previous post for my configuration
details, service definitions, etc.)

> Best regards
>
> Jan Gregor

thank you!

Antoine

--
Antoine Reid
Administrateur Système - System Administrator
__________________________________________________
Logient Inc.
Solutions de logiciels Internet - Internet Software Solutions
417 St-Pierre, Suite #700
Montréal (Qc) Canada H2Y 2M4
T. 514-282-4118 ext.32
F. 514-288-0033
www.logient.com
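
For illustration only, here is a minimal Python sketch of the kind of
dummy "results are stale" check_command described in the message above,
together with a deliberately simplified model of the freshness decision.
It is not the poster's actual script, which is not shown in the thread.
The function names stale_result and is_stale are hypothetical, and the
rule "a result is stale once its age exceeds the threshold" is an
assumption that ignores whatever extra allowances Nagios itself applies.
The only part taken as fact is the standard plugin exit-code convention
(0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Sketch of a dummy 'stale results' check plus a simplified freshness model."""

# Standard Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def stale_result(state="CRITICAL"):
    """Return (exit_code, message) reporting that passive results are stale.

    Hypothetical helper: the state is fixed by the caller, because this
    check never looks at the real service.
    """
    code = EXIT_CODES[state]
    message = f"{state}: no recent passive check result received (results are stale)"
    return code, message


def is_stale(seconds_since_last_result, freshness_threshold):
    """Simplified assumption: stale once the result's age exceeds the threshold.

    Both arguments are in seconds.
    """
    return seconds_since_last_result > freshness_threshold


# Example: the last passive result arrived 400 seconds ago and the
# threshold is 300 seconds, so the dummy check would be run.
if is_stale(400, 300):
    code, message = stale_result("CRITICAL")
    print(message)
    # A real plugin would end with sys.exit(code) so that Nagios sees the state.
```

Under this model, a threshold shorter than the interval at which passive
results actually arrive would produce the OK - WARNING - OK pattern
described above, so the value to raise would be the per-service
freshness threshold (the Nagios documentation describes a
freshness_threshold directive, in seconds, on the service definition).
Whether that is the cause in this particular setup cannot be determined
from the message alone.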