[Assorted-commits] SF.net SVN: assorted: [415] hash-join/trunk
From: <yan...@us...> - 2008-02-15 01:40:11
Revision: 415
          http://assorted.svn.sourceforge.net/assorted/?rev=415&view=rev
Author:   yangzhang
Date:     2008-02-14 17:40:15 -0800 (Thu, 14 Feb 2008)

Log Message:
-----------
updated analysis, readme, doc publishing

Modified Paths:
--------------
    hash-join/trunk/README
    hash-join/trunk/doc/Makefile
    hash-join/trunk/doc/analysis.txt

Modified: hash-join/trunk/README
===================================================================
--- hash-join/trunk/README	2008-02-14 20:33:35 UTC (rev 414)
+++ hash-join/trunk/README	2008-02-15 01:40:15 UTC (rev 415)
@@ -29,6 +29,9 @@
 there is a match, then emit the resulting joined tuple (movie title, movie
 release year, actress name).
 
+Results
+-------
+
 Here are some [results].
 
 Requirements
@@ -75,7 +78,7 @@
 this dataset and to observe the resulting distributions.
 
 [C++ Commons]: http://assorted.sf.net/cpp-commons/
-[HashDist]: http://assorted.sf.net/
+[HashDist]: http://assorted.svn.sourceforge.net/viewvc/assorted/hash-dist/trunk/
 [Multiprocessor Hash-Based Join Algorithms]: http://citeseer.ist.psu.edu/50143.html
 [Scala Commons]: http://assorted.sf.net/scala-commons/
 [g++]: http://gcc.gnu.org/

Modified: hash-join/trunk/doc/Makefile
===================================================================
--- hash-join/trunk/doc/Makefile	2008-02-14 20:33:35 UTC (rev 414)
+++ hash-join/trunk/doc/Makefile	2008-02-15 01:40:15 UTC (rev 415)
@@ -1,6 +1,7 @@
-PROJECT := hash-join
-WEBDIR := assorted/htdocs/$(PROJECT)
-PANDOC = pandoc -s -S --tab-stop=2 -c ../main.css -o $@ $^
+PROJECT := hash-join
+WEBDIR := assorted/htdocs/$(PROJECT)
+HTMLFRAG := ../../../assorted-site/trunk
+PANDOC = pandoc -s -S --tab-stop=2 -c ../main.css -H $(HTMLFRAG)/header.html -A $(HTMLFRAG)/google-footer.html -o $@ $^
 
 all: index.html analysis.html
 
@@ -14,10 +15,10 @@
 	ssh shell-sf mkdir -p $(WEBDIR)/
 	scp $^ shell-sf:$(WEBDIR)/
 
-publish-data: times.pdf speedups.pdf
+publish-data: ../tools/data/*.pdf
 	scp $^ shell-sf:$(WEBDIR)/
 
 clean:
 	rm -f index.html analysis.html
 
-.PHONY: clean publish
+.PHONY: clean publish publish-data

Modified: hash-join/trunk/doc/analysis.txt
===================================================================
--- hash-join/trunk/doc/analysis.txt	2008-02-14 20:33:35 UTC (rev 414)
+++ hash-join/trunk/doc/analysis.txt	2008-02-15 01:40:15 UTC (rev 415)
@@ -1,4 +1,4 @@
-% Hash-Join Benchmarks
+% Hash-Join Analysis
 % Yang Zhang
 
 Here are the graphs from the latest experiments and implementation:
@@ -9,7 +9,7 @@
 This implementation was originally not scalable in the hashtable-building
 stage, which performed frequent allocations. The hashtable is stock from the
 SGI/libstdc++ implementation. I removed this bottleneck by providing a custom
-allocator that allocated from a non-freeing local memory arena.
+allocator that allocates from a non-freeing local memory arena.
 
 Profiling reveals that most of the time is spent in the hash functions and the
 function that performs the memcpy during hash-partitioning. `actdb::partition1`
@@ -27,11 +27,14 @@
 ...
 
 Now the hashtable construction phase is the most scalable part of the
-algorithm. The remaining bottlenecks appear to be due to the memory stalls.
+algorithm (despite its random access nature). The remaining bottlenecks appear
+to be due to memory stalls, but these are mostly masked by hardware
+prefetching.
 
-The program does not scale much beyond the 16 threads, though performance does
-improve slightly. This is due to the contention for cache capacity among
-multiple hardware threads per core.
+The program does not scale much beyond 16 threads, though performance does
+improve slightly. The inability to scale beyond 16 is most likely due to the
+contention for cache capacity among multiple hardware threads per core.
 
-This implementation is straightforward, with no fanciness in terms of custom
-scheduling and control over allocation, leaving many things up to the OS.
+I've tried to keep the implementation simple, with no fanciness in terms of
+custom task scheduling or control over allocation, leaving many things up to
+the OS.
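
The analysis.txt text in this commit mentions a custom allocator that allocates
from a non-freeing local memory arena. The following is only a minimal sketch of
that general idea, not the code from this repository: the names Arena,
ArenaAllocator and build_table are hypothetical, and std::unordered_map is used
as a stand-in for the SGI/libstdc++ hashtable. It assumes each thread would own
its own Arena, so allocation needs no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical bump arena: carves allocations out of large chunks and never
// frees them individually; all chunks are released when the arena is destroyed.
// Assumes requested alignments do not exceed the default alignment of new[].
class Arena {
public:
  explicit Arena(std::size_t chunk_size = 1 << 20) : chunk_size_(chunk_size) {}
  Arena(const Arena&) = delete;
  Arena& operator=(const Arena&) = delete;

  void* allocate(std::size_t bytes, std::size_t align) {
    // Round the current offset up to the requested alignment.
    std::size_t start = (used_ + align - 1) / align * align;
    if (chunks_.empty() || start + bytes > capacity_) {
      // Current chunk is full (or absent): start a new one.
      capacity_ = bytes > chunk_size_ ? bytes : chunk_size_;
      chunks_.push_back(std::unique_ptr<char[]>(new char[capacity_]));
      start = 0;
    }
    used_ = start + bytes;
    return chunks_.back().get() + start;
  }

  std::size_t chunk_count() const { return chunks_.size(); }

private:
  std::size_t chunk_size_;
  std::size_t capacity_ = 0;  // size of the current chunk
  std::size_t used_ = 0;      // bytes consumed in the current chunk
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// Minimal C++11 allocator that forwards to an Arena owned by the caller.
template <class T>
struct ArenaAllocator {
  using value_type = T;

  explicit ArenaAllocator(Arena& a) noexcept : arena(&a) {}
  template <class U>
  ArenaAllocator(const ArenaAllocator<U>& o) noexcept : arena(o.arena) {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
  }
  void deallocate(T*, std::size_t) noexcept {}  // intentionally a no-op

  Arena* arena;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
  return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
  return !(a == b);
}

using Alloc = ArenaAllocator<std::pair<const int, int>>;
using Table =
    std::unordered_map<int, int, std::hash<int>, std::equal_to<int>, Alloc>;

// Builds a table of n entries (key i, value i*i) whose nodes and bucket
// arrays all live in the given arena. The arena must outlive the table.
Table build_table(Arena& arena, int n) {
  Table t(16, std::hash<int>(), std::equal_to<int>(), Alloc(arena));
  for (int i = 0; i < n; ++i) t.emplace(i, i * i);
  return t;
}
```

A caller would create one Arena per worker thread, pass it to build_table, and
let the arena go out of scope after the table does. The trade-off is that
memory from erased entries or rehashed bucket arrays is not reused until the
whole arena is destroyed.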