Re: [Nagios-devel] Nagios and Gearman - huge environment performance problem
From: Rodney R. <rod...@gm...> - 2011-08-24 14:26:15
Hi Sven. Thank you again.

I'm pretty sure that my check interval is 15 min for both hosts and services; I've set this in the templates.cfg file (see below). I'm also sending the nagiostats output.

I agree with you that 100k checks / 15 min ~ 111 checks/sec, but the problem is that Nagios does not spread these checks smoothly over time.

========== templates.cfg ==========

define host{
    name                    generic-host
    ...
    check_interval          15
    ....
}

define service{
    name                    generic-service
    ...
    normal_check_interval   15
    ....
}

============== nagiostats output ==============

Nagios Stats 3.2.3
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 10-03-2010
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /usr/local/nagios/var/status.dat
Status File Age:                        0d 0h 0m 17s
Status File Version:                    3.2.3

Program Running Time:                   0d 17h 43m 2s
Nagios PID:                             18854
Used/High/Total Command Buffers:        0 / 0 / 4096

Total Services:                         68206
Services Checked:                       68206
Services Scheduled:                     68206
Services Actively Checked:              68206
Services Passively Checked:             0
Total Service State Change:             0.000 / 43.880 / 2.774 %
Active Service Latency:                 40.671 / 503.137 / 234.919 sec
Active Service Execution Time:          0.003 / 24.737 / 2.527 sec
Active Service State Change:            0.000 / 43.880 / 2.774 %
Active Services Last 1/5/15/60 min:     0 / 2897 / 35932 / 68206
Passive Service Latency:                0.000 / 0.000 / 0.000 sec
Passive Service State Change:           0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              46943 / 56 / 7660 / 13547
Services Flapping:                      980
Services In Downtime:                   0

Total Hosts:                            34103
Hosts Checked:                          34103
Hosts Scheduled:                        34103
Hosts Actively Checked:                 34103
Host Passively Checked:                 0
Total Host State Change:                0.000 / 63.820 / 2.598 %
Active Host Latency:                    0.000 / 474.337 / 247.944 sec
Active Host Execution Time:             0.000 / 20.354 / 2.033 sec
Active Host State Change:               0.000 / 63.820 / 2.598 %
Active Hosts Last 1/5/15/60 min:        0 / 5936 / 29437 / 34103
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  23591 / 10512 / 0
Hosts Flapping:                         597
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     3 / 89 / 209
   Scheduled:                           0 / 0 / 0
   On-demand:                           3 / 89 / 209
   Parallel:                            0 / 0 / 0
   Serial:                              0 / 0 / 0
   Cached:                              3 / 89 / 209
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  0 / 0 / 0
   Scheduled:                           0 / 0 / 0
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:      0 / 0 / 0

On Tue, Aug 23, 2011 at 6:14 PM, Sven Nierlein <Sve...@co...> wrote:

> On 8/23/11 22:21, Rodney Ramos wrote:
> > When I changed max_concurrent_checks from "0" to "200", the nagios
> > process fell to 30/50%. However, the latency increased a lot, going to
> > more than 1000 sec!!
>
> Which means you usually have more than 200 concurrent checks, maybe
> 400-500. When I compare that to your initial mail, which mentions about
> 60k services + 30k hosts on a 15 min interval, I get only 100 checks /
> second. Are you sure about the 15 min interval? How many checks do you
> have per second? Did you change your interval_length?
>
> Sven
>
> _______________________________________________
> Nagios-devel mailing list
> Nag...@li...
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
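Sven's question (how many checks per second?) can be answered with simple arithmetic on the nagiostats figures. The following is a minimal sketch, not a Nagios tool. It assumes interval_length is the Nagios default of 60 seconds (the thread does not confirm this), that checks should be spread evenly over the interval, and it estimates concurrency with Little's law (checks in flight = check rate x average execution time), which is an added assumption rather than anything stated in the thread. The helper names checks_per_second and expected_concurrency are hypothetical.

```python
"""Back-of-the-envelope check load for the setup described in this thread.

Assumptions (not confirmed in the original mails):
- interval_length is the Nagios default of 60 s, so an interval of 15 = 900 s.
- Checks are meant to be spread evenly over the interval.
- Little's law (L = lambda * W) approximates the number of checks in flight.
"""

# Assumed: check_interval 15 * interval_length 60 (default).
INTERVAL_S = 15 * 60

# Figures copied from the nagiostats output in this message.
SERVICES = 68206
HOSTS = 34103
AVG_SERVICE_EXEC_S = 2.527   # "Active Service Execution Time" average
AVG_HOST_EXEC_S = 2.033      # "Active Host Execution Time" average
SERVICES_CHECKED_LAST_15_MIN = 35932


def checks_per_second(objects: int, interval_s: float) -> float:
    """Average rate needed to check every object once per interval (hypothetical helper)."""
    return objects / interval_s


def expected_concurrency(rate: float, avg_exec_s: float) -> float:
    """Little's law: average checks in flight = check rate * average execution time."""
    return rate * avg_exec_s


service_rate = checks_per_second(SERVICES, INTERVAL_S)
host_rate = checks_per_second(HOSTS, INTERVAL_S)
total_rate = service_rate + host_rate

in_flight = (expected_concurrency(service_rate, AVG_SERVICE_EXEC_S)
             + expected_concurrency(host_rate, AVG_HOST_EXEC_S))

# With a real 15-minute interval, nearly every service should fall into the
# "last 15 min" bucket; the observed fraction shows how far behind scheduling is.
fraction_on_time = SERVICES_CHECKED_LAST_15_MIN / SERVICES

if __name__ == "__main__":
    print(f"required rate: {total_rate:.1f} checks/s")
    print(f"estimated checks in flight: {in_flight:.0f}")
    print(f"services checked in last 15 min: {fraction_on_time:.0%}")
```

Under these assumptions the script prints a required rate of 113.7 checks/s, about 269 checks in flight, and 53% of services checked within the last 15 minutes. The last figure is consistent with checks falling behind their 15-minute schedule, which matches the high average latency in the stats.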