From: Frédéric B. <fre...@fr...> - 2010-04-02 20:06:04
|
Tom Jackson wrote: > If anyone really wants to replace the hash function, investigate > moving to crit-bit tries. For Tcl it works great: you have to store > the key anyway. Crit-bit compresses leading bits, such as namespaces > into a single node. No hashing, no slow operations, perfect for tcl. Interesting. I've studied several alternative data structures for my ongoing work on Colibri, and to implement map/dict like structures I hesitate between hash tables and crit-bit trees AKA radix trees AKA Patricia tries. The latter are especially suited for maps with (relatively) short string keys (the typical Tcl usage pattern), as they allow for fast lookup, lexical ordering, prefix search (options, namespace...), etc. However in the more general usage such as numerical keys, hash tables may still be a better choice for the simple reason that Patricia tries were designed for variable-length "string" keys (in the wide sense, this includes bit strings and bigints). The question remains open for maps that expect very large string keys. Patricia tries are very good at insertion or "negated" lookup (telling whether a key is NOT present). But if the key is already present, looking up the correct node is O(logn), however the terminal strings must still be compared in their entirety to rule out false positives, so you get O(n) in worst cases. OTOH hash tables can lookup strings of arbitrary lengths (except when there are collisions) very quicky in O(logn) but one still has to compute the sting key's hash value upfront, which gives O(n) anyway unless the hash value is cached (typically using Tcl_Obj's internal rep in but subject to shimmering). I'm afraid only thorough benchmarking could sort them out in the above case. But I believe that the typical usage pattern is short string keys anyway, as the multi-megabyte-key situation is very unlikely and would require custom approaches anyway. The best design may lie in between, i.e. a hash/Patricia hybrid design with a high level interface to abstract things away. |