[Assorted-commits] SF.net SVN: assorted: [413] hash-join/trunk
From: <yan...@us...> - 2008-02-14 20:33:13
Revision: 413
          http://assorted.svn.sourceforge.net/assorted/?rev=413&view=rev
Author:   yangzhang
Date:     2008-02-14 12:33:16 -0800 (Thu, 14 Feb 2008)

Log Message:
-----------
added more doc and web publishing

Added Paths:
-----------
    hash-join/trunk/doc/
    hash-join/trunk/doc/Makefile
    hash-join/trunk/doc/analysis.txt

Added: hash-join/trunk/doc/Makefile
===================================================================
--- hash-join/trunk/doc/Makefile	(rev 0)
+++ hash-join/trunk/doc/Makefile	2008-02-14 20:33:16 UTC (rev 413)
@@ -0,0 +1,23 @@
+PROJECT := hash-join
+WEBDIR := assorted/htdocs/$(PROJECT)
+PANDOC = pandoc -s -S --tab-stop=2 -c ../main.css -o $@ $^
+
+all: index.html analysis.html
+
+index.html: ../README
+	$(PANDOC)
+
+analysis.html: analysis.txt
+	$(PANDOC)
+
+publish: analysis.html index.html
+	ssh shell-sf mkdir -p $(WEBDIR)/
+	scp $^ shell-sf:$(WEBDIR)/
+
+publish-data: times.pdf speedups.pdf
+	scp $^ shell-sf:$(WEBDIR)/
+
+clean:
+	rm -f index.html analysis.html
+
+.PHONY: clean publish

Added: hash-join/trunk/doc/analysis.txt
===================================================================
--- hash-join/trunk/doc/analysis.txt	(rev 0)
+++ hash-join/trunk/doc/analysis.txt	2008-02-14 20:33:16 UTC (rev 413)
@@ -0,0 +1,37 @@
+% Hash-Join Benchmarks
+% Yang Zhang
+
+Here are the graphs from the latest experiments and implementation:
+
+- [times](times.pdf)
+- [speedups](speedups.pdf)
+
+This implementation was originally not scalable in the hashtable-building
+stage, which performed frequent allocations. The hashtable is the stock
+SGI/libstdc++ implementation. I removed this bottleneck by providing a custom
+allocator that allocates from a non-freeing local memory arena.
+
+Profiling reveals that most of the time is spent in the hash functions and in
+the function that performs the memcpy during hash-partitioning.
+`actdb::partition1` is the hash-partitioning function for actresses, and it
+calls `push_bucket` to copy tuples into buckets.
+`scan` is just a function that touches all the data from the file.
+
+      %   cumulative     self               self    total
+   time     seconds   seconds     calls   s/call   s/call  name
+  16.40        0.82      0.82   4547797     0.00     0.00  commons::hash_djb2(char const*)
+  14.80        1.56      0.74   4547797     0.00     0.00  __gnu_cxx::__stl_hash_string(char const*)
+  13.20        2.22      0.66   4547797     0.00     0.00  db::push_bucket(char**, bucket*, char const*, char const*, unsigned long)
+  12.80        2.86      0.64         2     0.32     0.32  commons::scan(void const*, unsigned long)
+  10.80        3.40      0.54         1     0.54     1.78  actdb::partition1(unsigned int, bucket*)
+  ...
+
+Now the hashtable-construction phase is the most scalable part of the
+algorithm. The remaining bottlenecks appear to be due to memory stalls.
+
+The program does not scale much beyond 16 threads, though performance does
+improve slightly. This is due to contention for cache capacity among the
+multiple hardware threads on each core.
+
+This implementation is straightforward, with no fanciness in terms of custom
+scheduling or control over allocation, leaving many things up to the OS.