Re: [Linuxtuples-devel] Hashtable on the tuple server?
Brought to you by:
wware
From: Bram S. <br...@sa...> - 2006-09-15 13:55:52
|
Will Ware wrote: > Does a hashtable on the server make sense? The idea would be to > generate hashes for the first three elments of the tuple, which would > index into a cubical table of linked lists of stored tuples. When > reading or getting, depending on the number and placement of > wildcards, the cube would be searched at a point, or a line segment, > or a square. There would need to be separate hashtables for tuples > shorter than three elements. I chose three only because I can easily > envision a cube, and envisioning a tesseract takes more work. > > For very large numbers of tuples, and searches with few wildcards, > this would make reading and getting much more efficient. It would be a > fair amount of work to implement, but the payoff might justify it. In theory, yes, hashtables give you O(1) lookup. We now have O(n) lookup. I don't think that a 3D map would be req'd. Just hash all elts to a single hash value, and use STL's std::hash_map<> In practice, you have to be very careful... Only for very large N, does hashing make sense. std::map<> takes e.g. O(logN), and is likely to be just as fast. For me, N is low enough not to notice perf gains with binary or hash lookups. Before you undertake a major effort, I suggest you do a small test on std::list, std::map and std::hash_map, to see at what N you will reap performance benefits. STL has built in hashing funcs for strings, and probably floats and ints as well, so they match your linuxtuples primitives. You would need to do the hashing recursively for nested tuples, else your hashing distribution goes wrong. My gut feeling is that it's not worth the effort. But then again, I don't know what you use the tspace for. If you have 100M tuples in it, I would pursue it. Bram > > The question of whether the hashtable should be linear, square, > cubical, etc. would depend on the average or expected number of > wildcards in the first few elements of a template, and there would be > a tradeoff with hashtable memory size on the server. If the first two > elements are rarely wildcards, but the third is usually a wildcard, > then a cube offers no advantage over a square. > > Bram, AFAIK you are currently the heaviest user of LinuxTuples on the > planet. Do you have a sense for the distribution of wildcards in the > first few elements? Would that be an easy statistic to collect while > your system is running? Any thoughts on how many bits a hash value > should have? > > Will > > ------------------------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Linuxtuples-devel mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxtuples-devel |