RE: [Nagios-db-devel] performance query
From: Ben <be...@si...> - 2005-04-28 17:27:57
On Thu, 28 Apr 2005, Dan Hopkins wrote:

> > The other potential gotcha is that nagios has to write the value of
> > each check into the db to complete a check, and if the db is slammed,
> > this will take an increasing amount of time. This was causing me very
> > bad latency problems (thousands of seconds) until recently, when I
> > implemented some thread pool action in the NEB and now my latency times
> > tend to hover around 15 seconds.
>
> This is the sort of issue that's hitting us: nagios is so slow writing its
> updates in aggregated mode (we're talking 10-20 seconds for less than 4000
> services in worst case) that some of our scripts (in-house php customised
> replacements) just hang on queries waiting for the locks to free. And too
> many users accessing the scripts cause nagios to lag obtaining locks to dump
> its updates. Enter non-aggregated updates... and the massive jump in load
> on the nagios host. Still, the user facing scripts appear quicker at least
> ;) But it did make me wonder how the NEBs fare with thousands of status
> updates. Now I see from another thread you've got over 8k services on the
> go? That's promising stuff. Are you distributing this over multiple nagios
> hosts or a single centralised one?

I've got a single dual Xeon with 2.5GB of RAM running nagios and making all the active checks (I only have active checks). It does aggregated writes to a RAM disk for the nagios logs. I've got another dual Xeon with 4GB of RAM and a fast SCSI RAID holding the db. And I've got a third wimpy box running the PHP UI.

My problem is the way nagios' scheduler works. I don't know if you're familiar with it, so let me briefly explain. Everything nagios does gets scheduled and piped through an event queue. There are low priority events (like kicking off checks at a specified time) and high priority events (like reaping outstanding check results). All the high priority events get handled before any low priority ones.
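As a rough illustration of that priority rule, here is a minimal Python sketch. It is not nagios' actual scheduler (which is written in C and also tracks when each event is due); the `EventQueue` class, the `HIGH`/`LOW` constants and the per-event costs are assumptions made up for this example, with result reaping deliberately made expensive and every event treated as already due.

```python
import heapq
import itertools

# Hypothetical priority levels for this sketch; a lower number is served first.
HIGH, LOW = 0, 1


class EventQueue:
    """Two-level queue: every pending HIGH event is served before any LOW one."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within a level

    def push(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def pop(self):
        priority, _, name = heapq.heappop(self._heap)
        return priority, name

    def __len__(self):
        return len(self._heap)


def run(queue, cost):
    """Drain the queue on a simulated clock.

    cost maps a priority level to the seconds one event of that level takes;
    returns a list of (start_time, event_name) pairs.
    """
    clock = 0
    log = []
    while queue:
        priority, name = queue.pop()
        log.append((clock, name))
        clock += cost[priority]
    return log


if __name__ == "__main__":
    q = EventQueue()
    q.push(LOW, "launch check A")
    q.push(HIGH, "reap result 1")
    q.push(LOW, "launch check B")
    q.push(HIGH, "reap result 2")
    for start, name in run(q, {HIGH: 10, LOW: 1}):
        print(f"t={start:>2}s  {name}")
```

With those costs, both check launches were due at t=0 but only start at t=20s and t=21s, because the two reaping events are always served first.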
So what ends up happening is that nagios keeps kicking off low priority checks until the results come back, and then it spends a lot of time (at least 10 seconds) processing those results. Nagios-DB only makes this worse, since every result also has to be written to the db. During those 10 seconds no new checks are kicked off, so things get progressively worse.

> > That thread pool-enabled version of nagios still lives in CVS only;
> > once I write some documentation, I'll be making another release.
>
> I look forward to having a play with this.

Just to clarify: I meant that a thread pool-enabled version of the postgres nagios-db NEB is in CVS. :)
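The thread pool idea can be illustrated with a minimal Python sketch, assuming the NEB's job is simply to persist each check result: the result-handling path only enqueues the write, and a fixed number of worker threads drain the queue and talk to the db. This is not the nagios-db NEB code (which is a C module against postgres); `WriterPool`, `slow_db_write` and the 50 ms write time are hypothetical.

```python
import queue
import threading
import time


class WriterPool:
    """Fixed pool of worker threads that performs slow db writes off the caller's thread."""

    def __init__(self, write_fn, num_workers=4):
        self._jobs = queue.Queue()  # unbounded FIFO of pending writes
        self._write_fn = write_fn
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_workers)
        ]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            item = self._jobs.get()
            if item is None:  # sentinel: stop this worker
                self._jobs.task_done()
                return
            try:
                self._write_fn(item)
            finally:
                self._jobs.task_done()

    def submit(self, item):
        """Called from the check-result path; only enqueues, never waits on the db."""
        self._jobs.put(item)

    def close(self):
        """Wait until every queued write has finished, then stop the workers."""
        self._jobs.join()
        for _ in self._threads:
            self._jobs.put(None)
        for t in self._threads:
            t.join()


if __name__ == "__main__":
    written = []
    written_lock = threading.Lock()

    def slow_db_write(result):
        # Hypothetical stand-in for an INSERT/UPDATE of one check result.
        time.sleep(0.05)
        with written_lock:
            written.append(result)

    pool = WriterPool(slow_db_write, num_workers=4)
    start = time.monotonic()
    for i in range(20):
        pool.submit(("service", i, "OK"))
    enqueue_secs = time.monotonic() - start
    pool.close()
    print(f"enqueued 20 results in {enqueue_secs:.4f}s, {len(written)} rows written")
```

The submitting loop returns almost immediately, while the 20 writes take roughly 0.25 s to drain across 4 workers. Two caveats with this design: the queue is unbounded, so if the db stays slammed the backlog (and memory use) keeps growing rather than nagios slowing down; and with several workers, two results for the same service can reach the db out of order, so a real implementation may want to route a given service to a fixed worker or store a timestamp with each row.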