From: Donal K. F. <don...@ma...> - 2010-02-08 21:56:18
|
On 08/02/2010 19:56, Joe English wrote: > Eric Melski wrote: >> data to support that conclusion? What are the specific scenarios in >> which the existing hash is inadequate? > > One area that I'm aware of: Tcl's traditional hash function > is more prone to collisions (and it's trivially easy to generate > an arbitrary number of them). That was what prompted me to act. It's been nagging at the back of my mind for a good few years now too, ever since the thread that started with http://groups.google.co.uk/group/comp.lang.tcl/msg/5d6888cd479cf583 way back in 2003. It doesn't *always* take me 6.5 years to act on something. (NB: goal is to make sure that it is unlikely that we end up in an pathological case, which isn't quite the same thing as ensuring a minimal chain length is obtained in the normal case.) > The FNV hash is one of three (that I'm aware of) that AFAIK > are considered among the state-of-the-art. (The other two > I know of are Jenkins' hash, which you mentioned, and > MurmurHash.) Also, of the three you've just mentioned, only FNV is a simple slot-in replacement for the old code. In particular, it's easy to compute over C strings because it is strictly byte-at-a-time. Both Jenkins's and MurmurHash really want to know the length of the data being hashed first (which lets them be a bit faster, which in turn compensates for the fact that they are much more complex; byte-at-a-time forms for these would be excessively complex). The other key point is that I've explicitly rejected taking any steps that involved a change to *any* public structure (i.e., Tcl_HashTable and Tcl_HashEntry; there are macros that point into these so we can't change things around radically). The only thing that changed was the hash algorithm. (We could do better if we had more freedom to change things, but we don't. Breaking either binary compatibility or source compatibility is not an option.) Donal. |