[Assorted-commits] SF.net SVN: assorted: [413] hash-join/trunk
From: <yan...@us...> - 2008-02-14 20:33:13
Revision: 413
          http://assorted.svn.sourceforge.net/assorted/?rev=413&view=rev
Author:   yangzhang
Date:     2008-02-14 12:33:16 -0800 (Thu, 14 Feb 2008)

Log Message:
-----------
added more doc and web publishing

Added Paths:
-----------
    hash-join/trunk/doc/
    hash-join/trunk/doc/Makefile
    hash-join/trunk/doc/analysis.txt

Added: hash-join/trunk/doc/Makefile
===================================================================
--- hash-join/trunk/doc/Makefile	(rev 0)
+++ hash-join/trunk/doc/Makefile	2008-02-14 20:33:16 UTC (rev 413)
@@ -0,0 +1,23 @@
+PROJECT := hash-join
+WEBDIR := assorted/htdocs/$(PROJECT)
+PANDOC = pandoc -s -S --tab-stop=2 -c ../main.css -o $@ $^
+
+all: index.html analysis.html
+
+index.html: ../README
+	$(PANDOC)
+
+analysis.html: analysis.txt
+	$(PANDOC)
+
+publish: analysis.html index.html
+	ssh shell-sf mkdir -p $(WEBDIR)/
+	scp $^ shell-sf:$(WEBDIR)/
+
+publish-data: times.pdf speedups.pdf
+	scp $^ shell-sf:$(WEBDIR)/
+
+clean:
+	rm -f index.html analysis.html
+
+.PHONY: clean publish

Added: hash-join/trunk/doc/analysis.txt
===================================================================
--- hash-join/trunk/doc/analysis.txt	(rev 0)
+++ hash-join/trunk/doc/analysis.txt	2008-02-14 20:33:16 UTC (rev 413)
@@ -0,0 +1,37 @@
+% Hash-Join Benchmarks
+% Yang Zhang
+
+Here are the graphs from the latest experiments and implementation:
+
+- [times](times.pdf)
+- [speedups](speedups.pdf)
+
+This implementation was originally not scalable in the hashtable-building
+stage, which performed frequent allocations. The hashtable is the stock
+SGI/libstdc++ implementation. I removed this bottleneck by providing a custom
+allocator that allocates from a non-freeing local memory arena.
+
+Profiling reveals that most of the time is spent in the hash functions and in
+the function that performs the memcpy during hash-partitioning.
+`actdb::partition1` is the hash-partitioning function for actresses, and it
+calls `push_bucket` to copy tuples into buckets.
+`scan` is just a function that touches all the data from the file.
+
+      %   cumulative     self               self    total
+   time     seconds   seconds     calls   s/call   s/call  name
+  16.40        0.82      0.82   4547797     0.00     0.00  commons::hash_djb2(char const*)
+  14.80        1.56      0.74   4547797     0.00     0.00  __gnu_cxx::__stl_hash_string(char const*)
+  13.20        2.22      0.66   4547797     0.00     0.00  db::push_bucket(char**, bucket*, char const*, char const*, unsigned long)
+  12.80        2.86      0.64         2     0.32     0.32  commons::scan(void const*, unsigned long)
+  10.80        3.40      0.54         1     0.54     1.78  actdb::partition1(unsigned int, bucket*)
+  ...
+
+Now the hashtable-construction phase is the most scalable part of the
+algorithm. The remaining bottlenecks appear to be due to memory stalls.
+
+The program does not scale much beyond 16 threads, though performance does
+improve slightly. This is due to contention for cache capacity among the
+multiple hardware threads on each core.
+
+This implementation is straightforward, with no fanciness in terms of custom
+scheduling or control over allocation, leaving many things up to the OS.