From: Mieden, R. v. d. <ric...@or...> - 2005-06-28 06:50:32
All,
I've solved my performance problems. They were caused by performance
data processing in combination with perfparse. When I switched off
performance data processing, latency dropped to 0.4 sec.
Of course, the question is why...
Below is my configuration related to perfparse and performance data. Any
remarks on what I did wrong and what could cause the heavy performance
load would be welcome. I have also added the perfparse-users list;
perhaps somebody from that list can have a look at it?
Regards,
Rick
cfg_file=/usr/local/nagios/etc/nagios_perfparse.cfg
perfdata_timeout=5
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
host_perfdata_file=/usr/local/nagios/var/hostperf.log
service_perfdata_file=/usr/local/nagios/var/serviceperf.log
host_perfdata_file_mode=w
service_perfdata_file_mode=w
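One thing that stands out in these settings is that both perfdata mechanisms appear to be switched on at once: the per-check commands and the perfdata files. Below is a minimal Python sketch for spotting that in a main config file. The function names are made up for illustration, and the sample text is just the settings from this mail. Whether running both mechanisms actually contributes to the latency is an assumption, not something I have measured.

```python
def parse_main_config(text):
    """Parse 'key=value' lines; repeated keys (e.g. cfg_file) keep all values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and anything that is not a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def perfdata_methods(settings):
    """List the perfdata mechanisms that the settings switch on."""
    # If perfdata processing is off, nothing else matters.
    if settings.get("process_performance_data", ["0"])[-1] != "1":
        return []
    methods = []
    for kind in ("host", "service"):
        if settings.get(f"{kind}_perfdata_command"):
            methods.append(f"{kind}:command")
        if settings.get(f"{kind}_perfdata_file"):
            methods.append(f"{kind}:file")
    return methods


# The settings from this mail, used as sample input.
SAMPLE = """\
cfg_file=/usr/local/nagios/etc/nagios_perfparse.cfg
perfdata_timeout=5
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
host_perfdata_file=/usr/local/nagios/var/hostperf.log
service_perfdata_file=/usr/local/nagios/var/serviceperf.log
host_perfdata_file_mode=w
service_perfdata_file_mode=w
"""

if __name__ == "__main__":
    for method in perfdata_methods(parse_main_config(SAMPLE)):
        print(method)
```

If both a ":command" and a ":file" entry show up for the same kind, it may be worth testing with only one of them enabled.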
My /usr/local/nagios/etc/nagios_perfparse.cfg looks like this:
define command {
    command_name process-service-perfdata
    command_line /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-service.log "$TIMET$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICEOUTPUT$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
}
define command {
    command_name process-host-perfdata
    command_line /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-host.log "$TIMET$" "$HOSTNAME$" "$HOSTOUTPUT$" "$HOSTPERFDATA$"
}
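To put rough numbers on this: the scheduling output quoted further down reports 2255 services with an average check interval of 222.47 sec. The Python sketch below (hypothetical function names) turns that into a check rate and estimates how much wall-clock time the perfdata commands would take if they ran one at a time. The 0.1 sec per command is a pure assumption for the start-up cost of a Perl script, not a measurement, and so is the idea that Nagios waits for each command to finish before it carries on.

```python
def checks_per_second(total_services, avg_check_interval_s):
    """Average number of service checks that complete per second."""
    return total_services / avg_check_interval_s


def inter_check_delay(total_services, avg_check_interval_s):
    """Delay between consecutive checks when they are spread evenly."""
    return avg_check_interval_s / total_services


def blocking_fraction(rate_per_s, command_runtime_s):
    """Share of each wall-clock second spent inside perfdata commands,
    assuming one command per check result and strictly serial execution."""
    return rate_per_s * command_runtime_s


if __name__ == "__main__":
    services = 2255          # from the quoted "nagios -s" output
    avg_interval = 222.47    # seconds, from the same output
    assumed_runtime = 0.1    # seconds per perfdata command (assumption)

    rate = checks_per_second(services, avg_interval)
    print(f"check rate:        {rate:.2f} per sec")
    print(f"inter-check delay: {inter_check_delay(services, avg_interval):.2f} sec")
    print(f"blocking fraction: {blocking_fraction(rate, assumed_runtime):.2f}")
```

A blocking fraction at or above 1 would mean the perfdata commands alone need more time than is available, so the scheduler could never catch up and latency would keep growing. With a lower service count the same calculation drops below 1, which would fit the observation that latency is fine at around 1700 services.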
-----Original Message-----
From: Hendrik Baecker [mailto:b0...@gm...]
Sent: Monday, June 27, 2005 15:31
To: Mieden, Rick van der
Cc: nag...@li...; mar...@sa...
Subject: Re: [Nagios-users] huge performance problems
Mieden, Rick van der wrote:
> Thanks for the responses. I tweaked it a bit but still have bad
> latency with 174 hosts and 2360 services. (I tuned it down from 540
> sec to 224 seconds.) My plugins are fine; they are really fast on the
> command line. I have also noticed that the latency drops to 4 secs if I
> have around 1700 services running. So it looks like Nagios has some
> problems when the number of services goes over 2000 or something like
> that.
>
> I've read something about USE_MEMORY_PERFORMANCE_TWEAKS, but even
> that option does not improve the latency. I have also read that many
> people have far more hosts and service checks than I have without any
> performance problems. So I'd love to see their nagios.cfg, or would
> like to know what the trick is.
>
> Regards,
>
> Rick
>
Hi,
it is nearly the same on our side. Nagios with 1900 services runs with a
maximum latency of 2-4 seconds. But beware if you want more...
I have also heard of people who run more than 2000 services, but I think
most of them use some kind of distributed monitoring.
Regards,
Hendrik
> -----Original Message-----
> *From:* Hendrik Baecker [mailto:b0...@gm...]
> *Sent:* Thursday, June 23, 2005 15:50
> *To:* Mieden, Rick van der
> *Cc:* nag...@li...
> *Subject:* Re: [Nagios-users] huge performance problems
>
> Hi,
>
> One year ago we had nearly the same performance problems.
>
> It seems that the Nagios scheduler trips over itself if the number
> of services is too big. Therefore we decided to install another Nagios
> process with different configs in a different directory, splitting
> our Nagios setup along the lines of our networks: one Nagios (nagios-1)
> for Network A and another one (nagios-2) for Network B.
>
> That reduced the number of services per Nagios instance, and it has
> run well so far.
>
> All this was under version 1.2.
>
> I posted some questions about our problem in the past but got no
> good answers, so today I only know that it works for us.
>
> So much for that.
> I hope nobody minds if I use your post to describe some problems we
> now have while testing the setup above, with different instances
> on the same host, under Nagios 2.02b.
>
> When I fire up my instance "nagios-1" with around 1600 service checks,
> it runs fine with nearly no latency.
> But when I fire up "nagios-2" with around 1850 services, that
> instance very quickly reaches latencies of around 100 seconds.
> When I then stop the first instance, the latencies on the second one
> drop to < 5 seconds.
>
> Perhaps one of the developers can tell me whether I am right in theory
> that (one of) the worker thread(s) handling the scheduling queue can
> see the other instance's scheduling queue? Are they possibly the same?
>
> I am not a programmer, but I can imagine the following: starting
> nagios-1 creates the scheduling queue and puts it in RAM. So far
> so good. There it is, and the worker runs through it and executes the
> checks.
> I am now afraid that when I start my second Nagios process, it will
> also create its scheduling queue in system RAM, but that the two
> processes don't have their own queues... I hope somebody understands
> what I mean.
>
> Best regards
> Hendrik
>
> Mieden, Rick van der wrote:
>
> We have heavy performance problems with Nagios. We monitor 174 hosts
> with 2255 services and an average latency of 400 seconds! Of
> course that's not acceptable.
>
> I use Perl plugins with ssh and SNMP plugins. I've compiled Nagios with
> perlcache and embedded Perl enabled. The server is a SPARC server with
> 2 x 1.1 GHz CPUs and 1024 MB RAM (Solaris 8, latest patch level).
>
> I have played around with all kinds of parameters and read the tuning
> docs for Nagios.
>
> Below is the output of "nagios -s nagios.cfg":
>
> Nagios 2.0b3
> Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> Last Modified: 04-03-2005
> License: GPL
>
> Projected scheduling information for host and service
> checks is listed below. This information assumes that
> you are going to start running Nagios with your current
> config files.
>
> HOST SCHEDULING INFORMATION
> ---------------------------
> Total hosts: 174
> Total scheduled hosts: 0
> Host inter-check delay method: SMART
> Average host check interval: 0.00 sec
> Host inter-check delay: 0.00 sec
> Max host check spread: 30 min
> First scheduled check: N/A
> Last scheduled check: N/A
>
> SERVICE SCHEDULING INFORMATION
> -------------------------------
> Total services: 2255
> Total scheduled services: 2255
> Service inter-check delay method: SMART
> Average service check interval: 222.47 sec
> Inter-check delay: 0.10 sec
> Interleave factor method: SMART
> Average services per host: 12.96
> Service interleave factor: 13
> Max service check spread: 30 min
> First scheduled check: Wed Jun 22 15:05:08 2005
> Last scheduled check: Wed Jun 22 15:08:50 2005
>
> CHECK PROCESSING INFORMATION
> ----------------------------
> Service check reaper interval: 5 sec
> Max concurrent service checks: 200
>
> PERFORMANCE SUGGESTIONS
> -----------------------
> I have no suggestions - things look okay.
>
> And a nagiostat output:
>
> CURRENT STATUS DATA
> ----------------------------------------------------
> Status File: /usr/local/nagios/var/status.dat
> Status File Age: 0d 0h 0m 13s
> Status File Version: 2.0b3
>
> Program Running Time: 0d 32h 0m 13s
>
> Total Services: 2255
> Services Checked: 2255
> Services Scheduled: 2255
> Active Service Checks: 2255
> Passive Service Checks: 0
> Total Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Service Latency: 386.526 / 414.446 / 394.100 %*
> Active Service Execution Time: 0.062 / 60.349 / 1.428 sec
> Active Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Services Last 1/5/15/60 min: 155 / 1044 / 2255 / 2255*
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 2242 / 0 / 0 / 13
> Services Flapping: 0
> Services In Downtime: 0
>
> Total Hosts: 174
> Hosts Checked: 174
> Hosts Scheduled: 0
> Active Host Checks: 174
> Passive Host Checks: 0
> Total Host State Change: 0.000 / 0.000 / 0.000 %
> Active Host Latency: 0.000 / 0.000 / 0.000 %
> Active Host Execution Time: 0.137 / 1.109 / 0.582 sec
> Active Host State Change: 0.000 / 0.000 / 0.000 %
> Active Hosts Last 1/5/15/60 min: 1 / 2 / 2 / 9
> Passive Host State Change: 0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Hosts Up/Down/Unreach: 174 / 0 / 0
> Hosts Flapping: 0
> Hosts In Downtime: 0
>
> Does anybody have an idea what is going wrong here? There must be something...
>
> Regards,
>
> Rick
>
> ===========================================================
>
> The information contained in this message may be confidential and is
> intended to be only for the addressee. Should you receive this message
> unintentionally, please do not use the contents herein and notify the
> sender immediately by return e-mail. Although Orange has taken steps
> to ensure that this email and attachments are free from any virus, you
> do need to verify the possibility of their existence as Orange can
> take no responsibility for any computer virus which might be
> transferred by way of this email.
>
> ===========================================================