From: Tom J. <tom...@gm...> - 2010-02-16 18:31:04
|
Gustaf, Try out my Tcl hash functions in my previous email. The problem is not the hash function, it is generally the method of transforming the hash value into a bucket index. The better performing hashes shown in your post are the result of a lot of randomness in the low order bits. This is to be expected of a hash which mixes all the bits in with the new bits. If the entire hash value was used, it could make a lot of sense to increase the sophistication of the hash function, but it would be much easier to just take a prime modulus of hash value just once. This would take into account all the bits. This would help all simple hash functions. For instance, here are the results with a patched tcl8.6 which shows how badly the bucket choose can sometimes be compared to using a modulus reduction: note collide is as follows: proc collide {level {accum ""}} { global a if {$level} { incr level -1 collide $level "aj6$accum" collide $level "bap$accum" } else { set a($accum) 1 } } Note to Donal: I know the collide proc doesn't collide, but it generates equal length keys useful for testing. % source hash2.tcl 595596 microseconds per iteration 32768 entries in table, 16384 buckets number of buckets with 0 entries: 12289 number of buckets with 1 entries: 11 number of buckets with 2 entries: 36 number of buckets with 3 entries: 124 number of buckets with 4 entries: 242 number of buckets with 5 entries: 385 number of buckets with 6 entries: 473 number of buckets with 7 entries: 586 number of buckets with 8 entries: 580 number of buckets with 9 entries: 504 number of buckets with 10 entries: 408 number of buckets with 11 entries: 288 number of buckets with 12 entries: 204 number of buckets with 13 entries: 107 number of buckets with 14 entries: 70 number of buckets with 15 entries: 34 number of buckets with 16 entries: 19 number of buckets with 17 entries: 13 number of buckets with 18 entries: 8 number of buckets with 19 entries: 2 number of buckets with 20 entries: 1 number of buckets with 10000 or more entries: 0 average search distance for entry: 5.0 (obvious the average distance is more than 5.0, more like 10) Using modulus 16567 to choose a bucket: Modulus 16567 (hash%modulus = bucket) Bucket Fill hashmod1 DJB hash from cdb buckets with 1 = 4612 total 4612 buckets with 2 = 4489 total 13590 buckets with 3 = 2925 total 22365 buckets with 4 = 1439 total 28121 buckets with 5 = 582 total 31031 buckets with 6 = 206 total 32267 buckets with 7 = 60 total 32687 buckets with 8 = 9 total 32759 buckets with 9 = 1 total 32768 buckets with 0 = 2244 Bucket Fill hashmod2 Current Tcl hash buckets with 1 = 4514 total 4514 buckets with 2 = 4432 total 13378 buckets with 3 = 2964 total 22270 buckets with 4 = 1469 total 28146 buckets with 5 = 570 total 30996 buckets with 6 = 214 total 32280 buckets with 7 = 52 total 32644 buckets with 8 = 12 total 32740 buckets with 9 = 2 total 32758 buckets with 10 = 1 total 32768 buckets with 0 = 2337 Bucket Fill hashmod3 FNV Hash buckets with 1 = 4558 total 4558 buckets with 2 = 4504 total 13566 buckets with 3 = 2955 total 22431 buckets with 4 = 1418 total 28103 buckets with 5 = 610 total 31153 buckets with 6 = 181 total 32239 buckets with 7 = 55 total 32624 buckets with 8 = 9 total 32696 buckets with 9 = 8 total 32768 buckets with 0 = 2269 The current Tcl hash does better (internally) than FNV: 520529 microseconds per iteration 327520529 microseconds per iteration 32768 entries in table, 16384 buckets number of buckets with 0 entries: 8448 number of buckets with 1 entries: 928 number of buckets with 2 entries: 1296 number of buckets with 3 entries: 1348 number of buckets with 4 entries: 1176 number of buckets with 5 entries: 1024 number of buckets with 6 entries: 910 number of buckets with 7 entries: 576 number of buckets with 8 entries: 380 number of buckets with 9 entries: 196 number of buckets with 10 or more entries: 102 average search distance for entry: 3.2 tom jackson |