[Linuxtuples-devel] Hashtable on the tuple server?

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Does a hashtable on the server make sense? The idea would be to
generate hashes for the first three elments of the tuple, which would
index into a cubical table of linked lists of stored tuples. When
reading or getting, depending on the number and placement of
wildcards, the cube would be searched at a point, or a line segment,
or a square. There would need to be separate hashtables for tuples
shorter than three elements. I chose three only because I can easily
envision a cube, and envisioning a tesseract takes more work.

For very large numbers of tuples, and searches with few wildcards,
this would make reading and getting much more efficient. It would be a
fair amount of work to implement, but the payoff might justify it.

The question of whether the hashtable should be linear, square,
cubical, etc. would depend on the average or expected number of
wildcards in the first few elements of a template, and there would be
a tradeoff with hashtable memory size on the server. If the first two
elements are rarely wildcards, but the third is usually a wildcard,
then a cube offers no advantage over a square.

Bram, AFAIK you are currently the heaviest user of LinuxTuples on the
planet. Do you have a sense for the distribution of wildcards in the
first few elements? Would that be an easy statistic to collect while
your system is running? Any thoughts on how many bits a hash value
should have?

Will