[Assorted-commits] SF.net SVN: assorted: [379] hash-join/trunk/README
From: <yan...@us...> - 2008-02-11 23:19:43
Revision: 379
          http://assorted.svn.sourceforge.net/assorted/?rev=379&view=rev
Author:   yangzhang
Date:     2008-02-11 15:19:42 -0800 (Mon, 11 Feb 2008)

Log Message:
-----------
more informative readme

Modified Paths:
--------------
    hash-join/trunk/README

Modified: hash-join/trunk/README
===================================================================
--- hash-join/trunk/README	2008-02-11 23:11:38 UTC (rev 378)
+++ hash-join/trunk/README	2008-02-11 23:19:42 UTC (rev 379)
@@ -1,3 +1,9 @@
+% Parallel Hash Join
+% Yang Zhang
+
+Overview
+--------
+
 This is a simple implementation of parallel hash joins. I'm using this as a
 first step in studying the performance problems in multicore systems
 programming. This implementation is tailored for a particular dataset, the IMDB
@@ -3,11 +9,31 @@
 `movies.list` and `actresses.list` files, which may be found [here].
 
-The `tools/` directory contains `DbPrep.scala`, which is a filter for the
-`.list` files to prepare them to be more easily parsed by the hash join
-application.
+Requirements
+------------
 
-The `tools/` directory also contains `LogProc.scala`, which processes stdout
-concatenated from multiple runs of the program. This will produce the time and
-speedup plots illustrating the scalability of the system.
+- [C++ Commons] svn r370+
+- [libstdc++] v4.1
 
+Supporting Tools
+----------------
+
+`DbPrep` filters the `.list` files to prepare them to be parsed by the hash
+join.
+
+`LogProc` processes stdout concatenated from multiple runs of the program. This
+will produce the time and speedup plots illustrating the scalability of the
+system. This has actually been made into a generic tool and will be moved to
+its own project directory later.
+
+`Titles` extracts the titles from the output of `DbPrep` on `movies.list`.
+
+Related
+-------
+
+I used [HashDist] to experiment with the chaining of various hash functions on
+this dataset and observe the distribution.
+
 [here]: http://us.imdb.com/interfaces#plain
+[libstdc++]: http://gcc.gnu.org/libstdc++/
+[C++ Commons]: http://assorted.sf.net/
+[HashDist]: http://assorted.sf.net/