From: Donal K. F. <don...@ma...> - 2010-02-07 09:36:27
|
Hi everyone This is just a short note to let everyone know that I've just upgraded Tcl 8.6's string (and Tcl_Obj) hash algorithm. Specifically, I've changed from using JO's old algorithm that was approximately[*]: hash = ∑ str[i] * 9**i to the FNV algorithm <URL:http://isthe.com/chongo/tech/comp/fnv/> which is similar in many ways but uses a large prime as the power (instead of 9) and XOR as the combiner instead of addition. The result is a bit faster when I test it, and a little bit harder for code to push into situations where it performs badly. I don't know whether this is from faster computation or better distribution of the resulting hash values. How much better is the new function? I'm sorry to say that the state of research in this area is remarkably poor; very few people are looking at what makes for a good *non-cryptographic* string hashing function. About the only thing that seems to be known for sure is that the power needs to be odd so that low order bits don't get lost. But the FNV algorithm seems to be the most thoroughly understood, and there was a Public Domain implementation (which helped). The big difference is that the amount of mixing of bits is much larger than before. The main reason for this note is that it affects anyone who had code that assumed anything about hash iteration order. If you've got that assumption in your code, it's now been broken. This is more likely to impact on tests rather than production code - production code is more likely to have to deal with an expansion of the hash table which changes the order anyway, and so won't make the assumption - and the fix is to sort the values extracted from the hash before testing for equality. :-) Donal. [* The equation is true if you index from the end of the string... ] |