From: Andreas E. <ae...@op...> - 2011-12-07 09:49:54
On 12/06/2011 11:41 PM, Michel Belleau wrote:
> Hi.
>
> I recently had the chance to look at version 2.9 of NSCA. We were
> originally running it in "--daemon" mode at our installation, but it
> looks like the daemon mode is not bounded by any means; NSCA can
> fork() as much as it wants (up to the socket listener limit, but that
> is still quite a few processes) to process the incoming check
> results, and I didn't like that. I had a look at the "--single" mode
> of operation, and from our test results it doesn't scale as much as
> we need.
>
> I went in and modified the code a bit to implement a PREFORK mode
> where the NSCA daemon forks a number of processes at startup and
> respawns them if they exit on errors. In my opinion, this should
> scale better than the single-threaded mode and use resources more
> predictably when handling many messages per second. It imitates the
> mpm_prefork worker of "httpd" a bit (although much more simplistic).
> This adds a new "--prefork" command-line option to NSCA.
>
> I also think that the new "check_result_path" configuration directive
> is a good performance shortcut, so that is what I tested with, and it
> has given good results so far.
>
> Any comments are welcome. If you want to include the patch upstream,
> feel free; I would be glad to have contributed to the project. The
> included patch applies cleanly to nsca-trunk, revision 1846.

Why not use multiplexing? A single process can easily handle 20k
simultaneous connections that way, and it would make it easier to
rewrite nsca to use the up-and-coming Unix-socket input method to
Nagios (a persistent connection) instead of the current pipe method
(which has to be set up over and over again). Or, as Daniel says, use
xinetd to limit connections.

I for one am quite curious what happens with this patch when the
connection queue fills up and inbound connectors can no longer
connect.
The reason you're getting bazillions of procs is that that many events
are coming in, so if you're "fixing" the problem by speeding up
handling a little and limiting the process count a lot, you're going
about it the wrong way.

In short: have you tested this with some really serious connection
spamming, like 100 servers trying to connect and submit check results
as quickly as they possibly can? That's the sort of load you have to
handle for this to work for users with very large networks. The
small-network users don't have this problem, so unless it works in the
super-large scenario, I'm afraid this is just code churn with no real
benefit.

--
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.