RE: [Nagios-db-devel] performance query
From: Dan H. <dan...@uk...> - 2005-04-28 16:16:24
> -----Original Message-----
> From: Ben [mailto:be...@si...]
> Sent: 28 April 2005 16:09
> To: Dan Hopkins
> Cc: 'nag...@li...'
> Subject: Re: [Nagios-db-devel] performance query

<snip>

> There are two potential gotchas. First, the postgres nagios-db uses
> snapshot materialized views, meaning they refresh periodically, not in
> realtime. I have pretty beefy hardware, so I can set mine to refresh
> every minute, but for a tool that's supposed to tell you when things
> are down, I think updating every minute is pretty overkill. Once every
> 5 minutes would probably be just fine.

This is my next step too, although I'm using MySQL, so in place of
pgsql's nice materialized views I'll either use hand-rolled snapshot
tables :) or try replication to a reader db. I'd intended to refresh the
snapshots fairly frequently, a couple of times a minute (we're only
talking a few thousand rows at the moment, so it's no big deal to the
db), but I've not done any testing yet, so this could be optimistic
considering there are multiple nagios instances running off the same db
backend hosts. We shall see...

> The other potential gotcha is that nagios has to write the value of
> each check into the db to complete a check, and if the db is slammed,
> this will take an increasing amount of time. This was causing me very
> bad latency problems (thousands of seconds) until recently, when I
> implemented some thread pool action in the NEB and now my latency times
> tend to hover around 15 seconds.

This is the sort of issue that's hitting us: nagios is so slow writing
its updates in aggregated mode (we're talking 10-20 seconds for fewer
than 4000 services in the worst case) that some of our scripts (in-house
customised PHP replacements) just hang on queries, waiting for the locks
to free. And too many users accessing the scripts causes nagios to lag
obtaining locks to dump its updates. Enter non-aggregated updates ... and
the massive jump in load on the nagios host.
Still, the user-facing scripts appear quicker at least ;) But it did
make me wonder how the NEBs fare with thousands of status updates. Now I
see from another thread that you've got over 8k services on the go?
That's promising stuff. Are you distributing this over multiple nagios
hosts or a single centralised one?

> That thread pool-enabled version of nagios still lives in CVS only;
> once I write some documentation, I'll be making another release.

I look forward to having a play with this.

Thanks for the info,
Dan
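
For illustration only, the hand-rolled snapshot table idea mentioned
above could look roughly like the sketch below. It is not taken from
nagios-db: the table and column names (service_status,
service_status_snapshot) and the refresh_snapshot helper are
hypothetical, and it uses Python's built-in sqlite3 module purely so
that it runs standalone. On MySQL the same effect is often achieved by
filling a new table and swapping it in with RENAME TABLE, which can
rename several tables in one atomic statement.

```python
import sqlite3

# Hypothetical schema: "service_status" is the table the monitoring
# process writes to; "service_status_snapshot" is what readers query.
SCHEMA = "(service_id INTEGER PRIMARY KEY, state TEXT, checked_at INTEGER)"


def refresh_snapshot(conn):
    """Rebuild the reader-facing snapshot from the live table.

    Both statements run inside one transaction, so the snapshot is
    replaced as a unit: it is either fully refreshed or left as it was.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM service_status_snapshot")
        conn.execute(
            "INSERT INTO service_status_snapshot "
            "SELECT service_id, state, checked_at FROM service_status"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_status " + SCHEMA)
conn.execute("CREATE TABLE service_status_snapshot " + SCHEMA)
conn.executemany(
    "INSERT INTO service_status VALUES (?, ?, ?)",
    [(1, "OK", 100), (2, "CRITICAL", 101)],
)
conn.commit()

# Take a snapshot, then let the live table move on without it.
refresh_snapshot(conn)
conn.execute(
    "UPDATE service_status SET state = 'OK', checked_at = 160 "
    "WHERE service_id = 2"
)
conn.commit()
```

Readers querying service_status_snapshot keep seeing the older state
until refresh_snapshot is called again (for example from a cron job or
a timer), which is the trade-off discussed in the thread: reads stay off
the busy live table at the cost of data that is up to one refresh
interval old.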