Re: [Linuxtuples-devel] Hashtable on the tuple server?

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Will Ware wrote:
> Does a hashtable on the server make sense? The idea would be to
> generate hashes for the first three elments of the tuple, which would
> index into a cubical table of linked lists of stored tuples. When
> reading or getting, depending on the number and placement of
> wildcards, the cube would be searched at a point, or a line segment,
> or a square. There would need to be separate hashtables for tuples
> shorter than three elements. I chose three only because I can easily
> envision a cube, and envisioning a tesseract takes more work.
> 
> For very large numbers of tuples, and searches with few wildcards,
> this would make reading and getting much more efficient. It would be a
> fair amount of work to implement, but the payoff might justify it.

In theory, yes, hashtables give you O(1) lookup.
We now have O(n) lookup.

I don't think that a 3D map would be req'd.
Just hash all elts to a single hash value, and use STL's
std::hash_map<>

In practice, you have to be very careful...
Only for very large N, does hashing make sense.

std::map<> takes e.g. O(logN), and is likely to be just as fast.

For me, N is low enough not to notice perf gains with binary
or hash lookups.

Before you undertake a major effort, I suggest you do a
small test on std::list, std::map and std::hash_map, to see
at what N you will reap performance benefits.

STL has built in hashing funcs for strings, and probably
floats and ints as well, so they match your linuxtuples primitives.
You would need to do the hashing recursively for nested
tuples, else your hashing distribution goes wrong.

My gut feeling is that it's not worth the effort. But then again,
I don't know what you use the tspace for. If you have 100M
tuples in it, I would pursue it.

   Bram

> 
> The question of whether the hashtable should be linear, square,
> cubical, etc. would depend on the average or expected number of
> wildcards in the first few elements of a template, and there would be
> a tradeoff with hashtable memory size on the server. If the first two
> elements are rarely wildcards, but the third is usually a wildcard,
> then a cube offers no advantage over a square.
> 
> Bram, AFAIK you are currently the heaviest user of LinuxTuples on the
> planet. Do you have a sense for the distribution of wildcards in the
> first few elements? Would that be an easy statistic to collect while
> your system is running? Any thoughts on how many bits a hash value
> should have?

> 
> Will
> 
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Linuxtuples-devel mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxtuples-devel