Re: [Nagios-devel] [PATCH] circular dependency check speedup
Nagios network monitoring software is enterprise server monitoring
Brought to you by:
egalstad,
sawolf-nagios
From: Andreas E. <ae...@op...> - 2010-12-16 11:23:46
|
I'd appreciate if you could send new threads as new messages instead of as replies to current threads. Especially when you're sending patches. If I hadn't been interested in the topic listed in this thread, I'd have missed this patch and it had been dropped on the floor (unless Ton had applied it, in which case I would have reverted parts of it quite promptly). I've done some organoleptic analysis of the patch (looked at it). Analysis included below. On 12/15/2010 06:38 PM, Glenn Herteg wrote: > Nagios has historically used a very inefficient algorithm to check > for circular host and service dependencies, with respect to both > execution and notification dependencies. This area has been the > subject of previous patches to improve performance. For example, > in early 2008, a depth-first-search algorithm was proposed for > part of this checking, and it was eventually incorporated into > the standard code base. However, that only affected part of the > circular-dependency-check algorithm, and performance against tens > of thousands of dependencies has still been extremely slow (taking > tens of minutes for a very large configuration). > Neat. So far so good :) > GroundWork Open Source has looked at this problem with an eye > toward full optimization, and hereby provides a patch to fix > the remaining sources of poor performance. The changes here are > not just theoretical; they have been tested against a very large > customer configuration, both with and without seeding a number of > actual circular dependencies, to demonstrate that the same errors > are caught as with the original code. If you could make the test configs available online, that'd be great. I'd like to test this myself. > The replacement algorithm in > this patch provides a speedup of around 20,000x for that portion of > a pre-flight or reload operation, for a configuration with a very > large number of host or service dependencies. This translates to > an overall speedup of about 275x with our large test dataset, for > a complete preflight operation. In simpler terms, many minutes of > computation dissolve away into just a few seconds. > I'd like numbers, along with architectures this was tested on. Also the test-configs that were used to get these numbers. Take a look at a6e06d8de24ffcb4a8c341a60098042dc3284756 to get an idea of how I like to see performance numbers. The repo where a6e06d8de24ff is available can be cloned from git://git.op5.org/nagios.git > Given the timing improvements just noted, it is GroundWork's belief > that application of this patch will completely obsolete the nagios > -x and --dont-verify-paths ("Don't check for circular object paths") > options. > If what you're claiming is true, they'll be deprecated and will do nothing, but will remain valid options until Nagios 4.x hits the market. > The replacement algorithm works in three ways: by using substitute > data structures which can be operated on much more efficiently, by > sidestepping huge amounts of pointless work, and by optimizing the > placement of fields within certain existing data structures so that > more efficient assembly code is generated for the most commonly used > field accesses. All of the modifications have been carefully tested > using actual measurements to show that they do improve execution > time, so the field-placement changes (for instance) are not based > on capricious guesswork. > Hmm. Having taken a look at the field restructures, I'll have to say that your claim is completely bogus for any architecture where sizeof(int) == sizeof(void *) which holds true for most architectures not running in compatibility mode. This is ofcourse only true if the compiler is clever enough to calculate the real address of the desired variable rather than doing the memory fetch in two steps, which has been the case for gcc since 1.x sometime. If sizeof(int) != sizeof(void *) and the architecture brings penalties when doing unaligned memory access, the structure reordering makes sense, but it would make *more* sense to utilize a padding scheme for the int things to make sure they're always the same size as a pointer, and to use a sensible compiler that always does direct offset access. Either way, I've declared anathema on structure reordering patches until 3.3 since they break the NEB ABI, so you'll have to reroll the patch anyways. > To avoid impacting the rest of the Nagios code outside of the > circular dependency checks, the original dependency list structures > are restored after the replacement algorithm has done its checking. > We have not investigated whether keeping the replacement data > structures around for later use instead of the original dependency > lists might afford additional operating speedup, though that is > perhaps a possibility. > > As provided, the attached patch file is suitable for direct > application to the Nagios 3.2.3 source code. > Perhaps, but since some changes are not directly related and of dubious value, I'd like this patch split into several parts which can be applied and tested separately. First the patch that alters the algorithm for searching through dependencies, but using a generic binary search algorithm instead of the "specialty" ones which, afaiu after a quick glance understand it are special only because they lack checking for NULL being passed as an argument. That's a microscopic gain in the larger scheme of things, but adds enormously to the maintenance burden, so it needs to be tested separately. If this is what provides the bulk of the speedup, your algorithm is poorly implemented. Then the patch that re-orders the object structures. I suspect this patch has microscopic impact as well and, like I said, I can't take it now anyways. This should be testable for performance both with and without patch nr 1. The third patch should be the one creating all the specialty binary search algorithm functions (code duplication galore). With all the patches you're serious about getting into the Nagios core, I want performance numbers measuring hot and cold cache cases before and after. Do echo 3 > /proc/sys/vm/drop_caches as root and then run /usr/bin/time nagios -p /path/to/nagios.cfg twice to measure the real difference in startup time. The first one will be with cold cache, the second with hot cache. I expect the algorithm change to provide a vast improvement (as it did for host->parent relationships), with the other patches to be relatively minor, so the only patch I expect to apply is actually the first one. Thanks :) -- Andreas Ericsson and...@op... OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. |