[Geoip-c-discuss] Performance patches for the geoip-c library
Brought to you by:
tjmather
|
From: Patrick M. <mc...@du...> - 2008-03-24 20:19:14
|
Hello - I love the MaxMind GeoIP databases! I have made a number of performance changes to the C GPL library that I am attaching to this mail for anybody that may want to use them. I'll describe the changes below, but first let me get your attention with the fabulous benchmarks ;). I did them on Linux. The mods might need a little love for non-Linux environments. The first set of benchmarks are taken using the standard benchmark program in the 1.4.4 geoip distribution doing the country lookups. The measurements are taken on a core2-duo - the benchmark runs single threaded. Line A is the standard 1.4.4 distribution for comparison, line b has my patch applied. All numbers are transactions per second (more is better). Country Lookup NO_FLAGS CHECK_CACHE MEMORY_CACHE (MEMORY_CACHE | CHECK_CACHE) A: 225,140 168,302 618,552 552,760 B: 226,918 +0% 233,735 +39% 2,112,676 +242% 5,412,988 +879% The next set is the same benchmark, but is the city db lookups instead of the country ones. Most of my testing was with the city database. NO_FLAGS INDEX_CACHE (INDEX_CACHE | CHECK_CACHE) MEMORY_CACHE A: 124,481 687,534 282,352 2,526,315 B: 128,369 +3% 724,688 +5% 653,975 +132% 6,168,270 +144% I have another benchmark I conjured up - it is derived from the standard one. Mine starts by choosing 3 million random IP addresses (still represented as text dotted decimal) and measures how long it takes to call the city api on each of them. The standard one just looks up the same 4 addresses over and over again - my test has more realistic cache behavior and fairly exercises the database. I measure the test 3 ways: (A) With the 1.4.4 GeoIP distribution, (B) against my patches using the standard city api, and finally (C) with an API I added that lets you provide the memory to put the city record into - this takes some pressure off the allocator. Measurements in TPS (more is better). % improvemts are vs (A). NO_FLAGS CHECK_CACHE MEMORY_CACHE (MEMORY_CACHE | CHECK_CACHE) A: 208,281 148,201 2,830,474 500,002 B: 210,451 +1% 203,957 +38% 4,910,321 +73% 2,944,836 +489% C: 211,968 +2% 206,008 +39% 5,732,488 +103% 3,208,800 +542% The changes come from 3 different areas. The most prominent change is the addition of a new in memory index that is created only when MEMORY_CACHE is provided. This index costs about 37 MB of extra memory - a classic time vs space tradeoff. If the maintainers want to integrate these patches into the base distribution, perhaps a new flag should be used instead of piggybacking on MEMORY_CACHE - though 37 is not an especially large number if you're willing to set MEMORY_CACHE. The new index, instead of using the radix tree layout that is included in the data file, creates a hash table of the first 17 bits of the netmask. masks shorter than 17 bits are simply hashed as their set of /17 subnets. Masks longer than /17 (which is 99.7% of the database) contain pointers to a "super index node" that works a lot like the radix tree except each node represents 3 bits. This arrangement means that you get the first 17 bits in one memory lookup (instead of 17), for a pretty reasonable memory cost - especially when you consider the low levels of the tree are pretty dense to start with. The super-index-nodes cover up to three levels of tree using 8 32 bit ints that are directly indexed from the 3 bits of interest - addresses longer than 20 bits are chained to the next super index node using a flag bit in the int. Since one node covers 3 levels of tree, the most accesses any one address should require is 6 (the hash lookup, and 5 super-index-nodes). Almost 1/3 of addresses in the DB can be looked up in 4 or less, and 88% are done in 5 or less. The radix tree required on average 27 accesses per lookup. going with an initial /16, /4, /4, etc.. approach instead of /17, /3, /3.. improves the all cache case 5-10% TPS at the cost of almost doubling the ram overhead.. so /17-3 seems to be the sweet spot. This is all transparent to the API and does not require any changes to the disk format. It is directly used by _GeoIP_seek_record() and thus optimizes the lookup of any DB.. The next largest change is to the handling of the CHECK_CACHE flag. The existing code was mtime and stat() based. If CHECK_CACHE was set the stat() was repeated on each lookup and the resulting mtime value was compared to a cached value. However, mtime has only a 1 second granularity so it doesn't make sense to do this thousands and thousands of times each second. The easy optimization here is to substitute gettimeofday() for stat() just to make sure we are on a new second before calling stat(). This is kind of a silly optimization - if you're serious about performance and want the CHECK_CACHE functionality the right thing to do would be implement inotify() elsewhere in the application and use that in an event driven style to determine when to reload the database, instead of polling with system calls. Nonetheless - this technique has a good impact on the library. The final changes are to the City-DB extract_record code. There were some double traversals of strings in there that could be optimized, and a number of string allocations could be condensed into the initial record allocation. Avoiding malloc/free at very high transaction rates is pretty important. In that vein, I also added another city lookup function that allows the caller to supply an address to put the data into - which also saves an allocation. Best Regards, Patrick |