From: Sand P. <Phi...@sy...> - 2005-06-28 07:07:47
I'm not using the perfparse pipe method, but I think it may be the
reason for your latency. As far as I understand, the pipe method is not
suitable when you have a lot of performance data, because it starts a new
process for every piece of data unless you have configured Nagios with
embedded Perl and perlcache.

Maybe you should try another method. I'm running perfparse with the
"Periodic Nagios Log Parse" method with about 2000 service checks...

_____________________________

Philipp Sand
OC-CC-TEC-SYS

SYCOR GmbH
Heinrich-von-Stephan-Straße 1-5
D - 37073 Göttingen

Phone +49 (0) 551 - 490 - 0
Fax +49 (0) 551 - 490 - 232468

phi...@sy...
www.sycor.de
------------------------------------------------

> -----Original Message-----
> From: nag...@li... [mailto:nagios-users-ad...@li...] On behalf of Mieden, Rick van der
> Sent: Tuesday, 28 June 2005 08:50
> To: nag...@li...; perfparse-us...@li...
> Subject: RE: [Nagios-users] huge performance problems, nagios perparse
>
> All,
>
> I've solved my performance problems. They were caused by the performance
> data in combination with perfparse. When I stopped the performance data,
> I reached a latency of 0.4 sec.
> Of course the question is why......
>
> Below is my configuration related to perfparse and performance data. Any
> remarks on anything I did wrong that could cause the heavy
> performance load would be welcome. I also added the perfparse-users list;
> perhaps somebody from that list can have a look at it?
>
> Regards
>
> Rick
>
> cfg_file=/usr/local/nagios/etc/nagios_perfparse.cfg
> perfdata_timeout=5
> process_performance_data=1
> host_perfdata_command=process-host-perfdata
> service_perfdata_command=process-service-perfdata
> host_perfdata_file=/usr/local/nagios/var/hostperf.log
> service_perfdata_file=/usr/local/nagios/var/serviceperf.log
> host_perfdata_file_mode=w
> service_perfdata_file_mode=w
>
> My /usr/local/nagios/etc/nagios_perfparse.cfg looks like:
>
> define command {
>     command_name process-service-perfdata
>     command_line /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-service.log "$TIMET$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICEOUTPUT$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
> }
>
> define command {
>     command_name process-host-perfdata
>     command_line /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-host.log "$TIMET$" "$HOSTNAME$" "$HOSTOUTPUT$" "$HOSTPERFDATA$"
> }
>
> -----Original Message-----
> From: Hendrik Baecker [mailto:b0...@gm...]
> Sent: Monday, June 27, 2005 15:31
> To: Mieden, Rick van der
> Cc: nag...@li...; mar...@sa...
> Subject: Re: [Nagios-users] huge performance problems
>
> Mieden, Rick van der wrote:
>
> > Thanks for the responses. I tweaked it a bit, but still have bad
> > latency with 174 hosts and 2360 services. (I tuned it down from 540
> > sec to 224 seconds.) My plugins are fine; they are really fast on the
> > command line. I have also noticed that the latency drops to 4 secs if I
> > have around 1700 services running. So it looks like Nagios has some
> > problems when the number of services goes over 2000 or something like
> > that.
> >
> > I've read something about USE_MEMORY_PERFORMANCE_TWEAKS. But even
> > that option does not improve the latency.
> > I have also
> > read that there are many people who have far more hosts and service
> > checks than I have without any performance problems. So I'd love to
> > see their nagios.cfg, or would like to know what the trick is.
> >
> > Regards,
> >
> > Rick
>
> Hi,
>
> nearly the same on our side. Nagios with 1900 services runs with max.
> 2-4 seconds latency. But beware if you want more...
>
> I have also heard from people who have more than 2000 services, but most
> of them are doing a kind of distributed monitoring, I think.
>
> Regards,
> Hendrik
>
> > -----Original Message-----
> > *From:* Hendrik Baecker [mailto:b0...@gm...]
> > *Sent:* Thursday, June 23, 2005 15:50
> > *To:* Mieden, Rick van der
> > *Cc:* nag...@li...
> > *Subject:* Re: [Nagios-users] huge performance problems
> >
> > Hi,
> >
> > one year ago we had nearly the same performance problems too.
> >
> > It seems that the Nagios scheduler rolls over itself if the number
> > of services is too big. Therefore we decided to install another Nagios
> > process with different configs in a different directory. So we
> > split our Nagios along the lines of our networks: one Nagios (nagios-1)
> > for Network A and another one (nagios-2) for Network B.
> >
> > So the number of services per Nagios instance decreased, and it has
> > run well so far.
> >
> > All this was under version 1.2.
> >
> > In the past I posted some questions about our problem, but there was
> > no good answer, so today I only know that it works for us.
> >
> > So much for that.
> > I hope nobody minds if I use your post to describe some
> > problems we now have when testing the above setup with different
> > instances on the same host with nagios 2.02b.
> >
> > When I fire up my instance "nagios-1" with around 1600 service checks,
> > it runs very well with nearly no latency.
> > But when I fire up "nagios-2" with around 1850 services, this
> > instance very quickly reaches latencies of around 100 seconds.
> > When I now stop the first instance, the latencies on the second one
> > decrease to < 5 seconds.
> >
> > Perhaps one of the developers can tell me whether I am right in theory
> > that (one of) the worker thread(s) with the scheduling queue can see the
> > other scheduling queue? Are they possibly the same?
> >
> > I am not a programmer, but I imagine the following: starting
> > nagios-1 creates the scheduling queue and puts it in RAM. So far
> > so good. There it is, and the worker runs through it and executes the
> > checks.
> > I am now afraid that when I start my second Nagios process, it will
> > also create a scheduling queue in system RAM, but that the two
> > processes don't have their own queues... I hope somebody understands
> > what I mean.
> >
> > Best regards
> > Hendrik
> >
> > Mieden, Rick van der wrote:
> >
> > We have heavy performance problems with Nagios. We monitor 174 hosts
> > with 2255 services and an average latency of 400 seconds!!!! Of
> > course that's not acceptable.
> >
> > I use Perl plugins with ssh, and SNMP plugins. I've compiled Nagios with
> > perlcache and embedded-perl enabled. The server is a SPARC server with
> > 2 x 1.1 GHz CPU and 1024 RAM. (Solaris 8, latest patch level)
> >
> > I played around with all kinds of parameters and read the tuning docs
> > for Nagios.
> >
> > Below is the output of "nagios -s nagios.cfg":
> >
> > Nagios 2.0b3
> > Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> > Last Modified: 04-03-2005
> > License: GPL
> >
> > Projected scheduling information for host and service
> > checks is listed below. This information assumes that
> > you are going to start running Nagios with your current
> > config files.
> >
> > HOST SCHEDULING INFORMATION
> > ---------------------------
> > Total hosts:                     174
> > Total scheduled hosts:           0
> > Host inter-check delay method:   SMART
> > Average host check interval:     0.00 sec
> > Host inter-check delay:          0.00 sec
> > Max host check spread:           30 min
> > First scheduled check:           N/A
> > Last scheduled check:            N/A
> >
> > SERVICE SCHEDULING INFORMATION
> > -------------------------------
> > Total services:                     2255
> > Total scheduled services:           2255
> > Service inter-check delay method:   SMART
> > Average service check interval:     222.47 sec
> > Inter-check delay:                  0.10 sec
> > Interleave factor method:           SMART
> > Average services per host:          12.96
> > Service interleave factor:          13
> > Max service check spread:           30 min
> > First scheduled check:              Wed Jun 22 15:05:08 2005
> > Last scheduled check:               Wed Jun 22 15:08:50 2005
> >
> > CHECK PROCESSING INFORMATION
> > ----------------------------
> > Service check reaper interval:      5 sec
> > Max concurrent service checks:      200
> >
> > PERFORMANCE SUGGESTIONS
> > -----------------------
> > I have no suggestions - things look okay.
> >
> > And a nagiostat output:
> >
> > CURRENT STATUS DATA
> > ----------------------------------------------------
> > Status File:                          /usr/local/nagios/var/status.dat
> > Status File Age:                      0d 0h 0m 13s
> > Status File Version:                  2.0b3
> >
> > Program Running Time:                 0d 32h 0m 13s
> >
> > Total Services:                       2255
> > Services Checked:                     2255
> > Services Scheduled:                   2255
> > Active Service Checks:                2255
> > Passive Service Checks:               0
> > Total Service State Change:           0.000 / 5.860 / 0.003 %
> > *Active Service Latency:              386.526 / 414.446 / 394.100 %*
> > Active Service Execution Time:        0.062 / 60.349 / 1.428 sec
> > Active Service State Change:          0.000 / 5.860 / 0.003 %
> > *Active Services Last 1/5/15/60 min:  155 / 1044 / 2255 / 2255*
> > Passive Service State Change:         0.000 / 0.000 / 0.000 %
> > Passive Services Last 1/5/15/60 min:  0 / 0 / 0 / 0
> > Services Ok/Warn/Unk/Crit:            2242 / 0 / 0 / 13
> > Services Flapping:                    0
> > Services In Downtime:                 0
> >
> > Total Hosts:                          174
> > Hosts Checked:                        174
> > Hosts Scheduled:                      0
> > Active Host Checks:                   174
> > Passive Host Checks:                  0
> > Total Host State Change:              0.000 / 0.000 / 0.000 %
> > Active Host Latency:                  0.000 / 0.000 / 0.000 %
> > Active Host Execution Time:           0.137 / 1.109 / 0.582 sec
> > Active Host State Change:             0.000 / 0.000 / 0.000 %
> > Active Hosts Last 1/5/15/60 min:      1 / 2 / 2 / 9
> > Passive Host State Change:            0.000 / 0.000 / 0.000 %
> > Passive Hosts Last 1/5/15/60 min:     0 / 0 / 0 / 0
> > Hosts Up/Down/Unreach:                174 / 0 / 0
> > Hosts Flapping:                       0
> > Hosts In Downtime:                    0
> >
> > Does anybody have an idea what went wrong here? There must be something......
> >
> > Regards,
> >
> > Rick
> >
> > ===========================================================
> >
> > The information contained in this message may be confidential and is
> > intended to be only for the addressee. Should you receive this message
> > unintentionally, please do not use the contents herein and notify the
> > sender immediately by return e-mail. Although Orange has taken steps
> > to ensure that this email and attachments are free from any virus, you
> > do need to verify the possibility of their existence as Orange can
> > take no responsibility for any computer virus which might be
> > transferred by way of this email.
> >
> > ===========================================================
> _______________________________________________
> Nagios-users mailing list
> Nag...@li...
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
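
The scheduling figures in the quoted "nagios -s" output can be cross-checked with a little arithmetic. The sketch below assumes the SMART method spaces initial service checks by the average check interval divided by the number of services, which matches the reported 0.10 sec. The function names are made up for illustration and are not part of Nagios or perfparse. The second figure is only an inference: if the pipe method really starts one process per check result, as suggested in the reply above, it gives a rough idea of how many extra process launches per second that would mean on this setup.

```python
# Figures copied from the quoted "nagios -s" output.
TOTAL_SERVICES = 2255
AVG_CHECK_INTERVAL_SEC = 222.47


def inter_check_delay(avg_interval_sec: float, total_services: int) -> float:
    """Spacing between consecutive initial service checks.

    Assumption: SMART delay = average check interval / number of services.
    """
    return avg_interval_sec / total_services


def results_per_second(avg_interval_sec: float, total_services: int) -> float:
    """Average number of service check results produced per second."""
    return total_services / avg_interval_sec


if __name__ == "__main__":
    delay = inter_check_delay(AVG_CHECK_INTERVAL_SEC, TOTAL_SERVICES)
    rate = results_per_second(AVG_CHECK_INTERVAL_SEC, TOTAL_SERVICES)
    print(f"inter-check delay: {delay:.2f} sec")      # reported value: 0.10 sec
    print(f"check results per second: {rate:.1f}")    # about 10 per second
```

At roughly ten results per second, a performance data command that launches a fresh Perl process each time could plausibly add noticeable load next to the checks themselves, but nothing in the thread measures this directly.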