|
From: Eric M. <er...@el...> - 2010-02-08 18:45:28
|
Donal K. Fellows wrote: > When I've measured the speed of the hashing, I've got these results: > > ... > > Old: 1.9507125463 microseconds per iteration > New: 1.5804703108 microseconds per iteration > > That's a consistent ~19% percent faster. > That's very impressive, but with only three hash entries you can hardly say that's a representative test case. Where's the demonstration of the hash for, eg, all of the standard commands in Tcl? How about all the words in /usr/dict/words? And of course, you've *said* that the new hash is better at avoiding collisions, but saying it don't make it so: where's your data to back up that claim? Again, it would be nice to see timing information for computing the hash (and *only* computing the hash, not also creating Tcl variables and allocating memory and running Tcl byte codes, etc, etc); and information on the distribution of the resulting hash entries (something like the Tcl hash stats data, for example), on several large real-world sample sets. I'm sure you will agree that for a change as fundamental to the core operation of Tcl, it is worth the effort to do this properly. > By comparison (the builds > I'm testing with aren't quite synchronized in all aspects) with this > code: > Wait a second: are you saying that the impressive results you just showed us are comparing two tclsh builds that have more than just the hash function changed? I think that invalidates any data you have shown. There's too many variables changed for you to convincingly state that the new hash function is performing any better than the original hash, especially when you're talking about differences of less that half a microsecond. > If anything, that would lead me to conclude that the improvement in > hashing is better than the raw figures would suggest, but I can't > prove that. > If you can't prove that the new hash is better than the old hash, then you shouldn't be making changes to it. > All the above were tested with the default optimizing build on OSX. > It would be nice to see comparisons on other platforms. According to Wikipedia, OSX represents only about 5% of the total market share of computers (http://en.wikipedia.org/wiki/Usage_share_of_operating_systems). By comparison, XP represents about 60%, and Vista about 25%. Do you have comparison data on either of those platforms? How about on any of the modern Linux distributions (Ubuntu, SUSE, etc)? > I did read that article prior to selecting the FNV algorithm. [...] the FNV hash has one huge advantage over Bob's lookup3 hash: it's actually simple to implement (and to do so correctly). Please don't interpret my comments as advocacy for Bob's lookup3 hash. I have no preference for any particular hash function. I do have concerns about changing a hash function that has worked quite well for 20 years, without sufficient consideration and experimentation to justify that change. > The changes I've made are purely internal to Tcl. No public interface is > impacted at all. > I don't believe that fact is at all relevant to the question of whether the change requires a TIP. Best regards, Eric Melski |