From: Mieden, R. v. d. <ric...@or...> - 2005-06-28 06:50:32
All,

I've solved my performance problems. They were caused by the performance data processing in combination with perfparse. When I disabled the performance data, latency dropped to 0.4 sec.

Of course the question is why. Below is my configuration related to perfparse and performance data. Any remarks on what I did wrong and what could cause the heavy load would be welcome.

I have also added the perfparse-users list; perhaps somebody from that list can have a look at it?

Regards,

Rick

cfg_file=/usr/local/nagios/etc/nagios_perfparse.cfg
perfdata_timeout=5
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
host_perfdata_file=/usr/local/nagios/var/hostperf.log
service_perfdata_file=/usr/local/nagios/var/serviceperf.log
host_perfdata_file_mode=w
service_perfdata_file_mode=w

My /usr/local/nagios/etc/nagios_perfparse.cfg looks like this:

define command {
    command_name    process-service-perfdata
    command_line    /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-service.log "$TIMET$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICEOUTPUT$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
}

define command {
    command_name    process-host-perfdata
    command_line    /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-host.log "$TIMET$" "$HOSTNAME$" "$HOSTOUTPUT$" "$HOSTPERFDATA$"
}

-----Original Message-----
From: Hendrik Baecker [mailto:b0...@gm...]
Sent: Monday, June 27, 2005 15:31
To: Mieden, Rick van der
Cc: nag...@li...; mar...@sa...
Subject: Re: [Nagios-users] huge performance problems

Mieden, Rick van der wrote:

> Thanks for the responses. I tweaked it a bit, but still have bad
> latency with 174 hosts and 2360 services. (I tuned it down from 540
> sec to 224 seconds.) My plugins are fine; they are really fast on
> the command line. I have also noticed that the latency drops to 4 secs if I
> have around 1700 services running.
> So it looks like Nagios has
> problems when the number of services goes above 2000 or something like
> that.
>
> I've read something about USE_MEMORY_PERFORMANCE_TWEAKS, but even
> that option does not improve the latency. I have also
> read that many people have far more hosts and service
> checks than I have without any performance problems. So I'd love to
> see their nagios.cfg, or to know what the trick is.
>
> Regards,
>
> Rick

Hi,

nearly the same on our side. Nagios with 1900 services runs with a maximum latency of 2-4 seconds. But beware if you want more...

I have also heard from people who have more than 2000 services, but most of them are doing some kind of distributed monitoring, I think.

Regards,
Hendrik

> -----Original Message-----
> *From:* Hendrik Baecker [mailto:b0...@gm...]
> *Sent:* Thursday, June 23, 2005 15:50
> *To:* Mieden, Rick van der
> *Cc:* nag...@li...
> *Subject:* Re: [Nagios-users] huge performance problems
>
> Hi,
>
> one year ago we had nearly the same performance problems too.
>
> It seems that the Nagios scheduler rolls over itself if the number
> of services is too big. Therefore we decided to install another Nagios
> process with different configs in a different directory. So we
> split our Nagios along the lines of our networks: one Nagios (nagios-1) for
> Network A and another one (nagios-2) for Network B.
>
> That reduced the number of services per Nagios instance, and it has run
> well so far.
>
> All this was under version 1.2.
>
> In the past I posted some questions about our problem, but there was
> no good answer, so today I only know that it works for us.
>
> So much for that.
> I hope nobody will object if I use your post to describe some
> problems we now have while testing the above setup with different instances
> on the same host with Nagios 2.02b.
>
> When I fire up my instance "nagios-1" with around 1600 service checks,
> it runs very well with nearly no latency.
> But when I fire up "nagios-2" with around 1850 services, this
> instance very quickly reaches latencies of around 100 seconds.
> When I then stop the first instance, the latencies on the second one
> drop to < 5 seconds.
>
> Perhaps one of the developers can tell me whether I am right in theory that
> (one of) the worker thread(s) with the scheduling queue can see the
> other scheduling queue? Are they possibly the same?
>
> I am not a programmer, but I can imagine the following: starting
> nagios-1 creates the scheduling queue and puts it in RAM. So far
> so good. There it is, and the worker runs through it and executes the
> checks.
> I am afraid that when I start my second Nagios process, it will
> also create a scheduling queue in system RAM, but that the two
> processes don't have their own queues... I hope somebody understands
> what I mean.
>
> Best regards
> Hendrik
>
> Mieden, Rick van der wrote:
>
> We have heavy performance problems with Nagios. We monitor 174 hosts
> with 2255 services and an average latency of 400 seconds! Of
> course that's not acceptable.
>
> I use Perl plugins with ssh and SNMP plugins. I've compiled Nagios with
> perlcache and embedded Perl enabled. The server is a SPARC server with
> 2 x 1.1 GHz CPUs and 1024 MB RAM (Solaris 8, latest patch level).
>
> I played around with all kinds of parameters and read the tuning docs
> for Nagios.
>
> Below is the output of "nagios -s nagios.cfg":
>
> Nagios 2.0b3
> Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> Last Modified: 04-03-2005
> License: GPL
>
> Projected scheduling information for host and service
> checks is listed below. This information assumes that
> you are going to start running Nagios with your current
> config files.
>
> HOST SCHEDULING INFORMATION
> ---------------------------
> Total hosts: 174
> Total scheduled hosts: 0
> Host inter-check delay method: SMART
> Average host check interval: 0.00 sec
> Host inter-check delay: 0.00 sec
> Max host check spread: 30 min
> First scheduled check: N/A
> Last scheduled check: N/A
>
> SERVICE SCHEDULING INFORMATION
> -------------------------------
> Total services: 2255
> Total scheduled services: 2255
> Service inter-check delay method: SMART
> Average service check interval: 222.47 sec
> Inter-check delay: 0.10 sec
> Interleave factor method: SMART
> Average services per host: 12.96
> Service interleave factor: 13
> Max service check spread: 30 min
> First scheduled check: Wed Jun 22 15:05:08 2005
> Last scheduled check: Wed Jun 22 15:08:50 2005
>
> CHECK PROCESSING INFORMATION
> ----------------------------
> Service check reaper interval: 5 sec
> Max concurrent service checks: 200
>
> PERFORMANCE SUGGESTIONS
> -----------------------
> I have no suggestions - things look okay.
>
> And the nagiostat output:
>
> CURRENT STATUS DATA
> ----------------------------------------------------
> Status File: /usr/local/nagios/var/status.dat
> Status File Age: 0d 0h 0m 13s
> Status File Version: 2.0b3
>
> Program Running Time: 0d 32h 0m 13s
>
> Total Services: 2255
> Services Checked: 2255
> Services Scheduled: 2255
> Active Service Checks: 2255
> Passive Service Checks: 0
> Total Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Service Latency: 386.526 / 414.446 / 394.100 %*
> Active Service Execution Time: 0.062 / 60.349 / 1.428 sec
> Active Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Services Last 1/5/15/60 min: 155 / 1044 / 2255 / 2255*
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 2242 / 0 / 0 / 13
> Services Flapping: 0
> Services In Downtime: 0
>
> Total Hosts: 174
> Hosts Checked: 174
> Hosts Scheduled: 0
> Active Host Checks: 174
> Passive Host Checks: 0
> Total Host State Change: 0.000 / 0.000 / 0.000 %
> Active Host Latency: 0.000 / 0.000 / 0.000 %
> Active Host Execution Time: 0.137 / 1.109 / 0.582 sec
> Active Host State Change: 0.000 / 0.000 / 0.000 %
> Active Hosts Last 1/5/15/60 min: 1 / 2 / 2 / 9
> Passive Host State Change: 0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Hosts Up/Down/Unreach: 174 / 0 / 0
> Hosts Flapping: 0
> Hosts In Downtime: 0
>
> Does anybody have an idea what went wrong here? There must be something.
>
> Regards,
>
> Rick
>
> ===========================================================
> The information contained in this message may be confidential and is
> intended to be only for the addressee. Should you receive this message
> unintentionally, please do not use the contents herein and notify the
> sender immediately by return e-mail. Although Orange has taken steps
> to ensure that this email and attachments are free from any virus, you
> do need to verify the possibility of their existence as Orange can
> take no responsibility for any computer virus which might be
> transferred by way of this email.
> ===========================================================
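
As a rough cross-check of the figures quoted in this thread, the scheduling numbers from the "nagios -s" output can be recomputed in a few lines of Python. This is only a sketch under stated assumptions: it assumes the SMART inter-check delay is the average check interval divided by the number of services, and that the SMART interleave factor is the average number of services per host rounded up. The function and variable names are illustrative and are not part of Nagios.

```python
import math

# Figures copied from the "nagios -s" and nagiostat output quoted in the thread.
TOTAL_HOSTS = 174
TOTAL_SERVICES = 2255
AVG_CHECK_INTERVAL_SEC = 222.47
CHECKED_LAST_MINUTE = 155  # first value of "Active Services Last 1/5/15/60 min"


def schedule_summary(hosts, services, avg_interval_sec):
    """Recompute the SMART scheduling figures (formulas are assumptions)."""
    services_per_host = services / hosts
    return {
        "services_per_host": round(services_per_host, 2),
        # Assumed: interleave factor = services per host, rounded up.
        "interleave_factor": math.ceil(services_per_host),
        # Assumed: inter-check delay = average interval / number of services.
        "inter_check_delay_sec": round(avg_interval_sec / services, 2),
        # Checks per minute needed to visit every service once per interval.
        "required_checks_per_min": round(services / avg_interval_sec * 60),
    }


summary = schedule_summary(TOTAL_HOSTS, TOTAL_SERVICES, AVG_CHECK_INTERVAL_SEC)
for name, value in summary.items():
    print(f"{name}: {value}")
print(f"observed checks in the last minute: {CHECKED_LAST_MINUTE}")
```

Under these assumptions the recomputed values agree with the quoted output (12.96 services per host, interleave factor 13, inter-check delay 0.10 sec). The schedule would need roughly 608 checks per minute, while nagiostat reported 155 services checked in the last minute, which is at least consistent with the high latency reported. It does not by itself explain why performance data processing caused it.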