[TCLCORE] Upgrade of Tcl's String Hash Algorithm

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi everyone

This is just a short note to let everyone know that I've just upgraded
Tcl 8.6's string (and Tcl_Obj) hash algorithm. Specifically, I've
changed from using JO's old algorithm that was approximately[*]:

   hash = ∑ str[i] * 9**i

to the FNV algorithm <URL:http://isthe.com/chongo/tech/comp/fnv/> which
is similar in many ways but uses a large prime as the power (instead of
9) and XOR as the combiner instead of addition. The result is a bit
faster when I test it, and a little bit harder for code to push into
situations where it performs badly. I don't know whether this is from
faster computation or better distribution of the resulting hash values.

How much better is the new function? I'm sorry to say that the state of
research in this area is remarkably poor; very few people are looking at
what makes for a good *non-cryptographic* string hashing function. About
the only thing that seems to be known for sure is that the power needs
to be odd so that low order bits don't get lost. But the FNV algorithm
seems to be the most thoroughly understood, and there was a Public
Domain implementation (which helped). The big difference is that the
amount of mixing of bits is much larger than before.

The main reason for this note is that it affects anyone who had code
that assumed anything about hash iteration order. If you've got that
assumption in your code, it's now been broken. This is more likely to
impact on tests rather than production code - production code is more
likely to have to deal with an expansion of the hash table which changes
the order anyway, and so won't make the assumption - and the fix is to
sort the values extracted from the hash before testing for equality. :-)

Donal.
[* The equation is true if you index from the end of the string... ]

[TCLCORE] Upgrade of Tcl's String Hash Algorithm

The Tool Command Language implementation

[TCLCORE] Upgrade of Tcl's String Hash Algorithm