File Date Author Commit
 benchmarks 2009-10-11 apapag [r2]
 data 2009-10-10 apapag [r1]
 doc 2009-10-10 apapag [r1]
 include 2009-10-10 apapag [r1]
 lib 2009-10-10 apapag [r1]
 src 2009-10-10 apapag [r1]
 tests 2009-10-10 apapag [r1]
 Makefile 2009-10-10 apapag [r1]
 README 2009-10-10 apapag [r1]

Read Me

MapReduce for the Cell B.E. Architecture
by Marc de Kruijf
dekruijf@cs.wisc.edu

-------------------------------------------------------------------------------

Contents:
---------

0. Documentation
1. Building MapReduce
2. MapReduce Build Configurations
3. MapReduce Testing
4. MapReduce Performance Data
5. Developing MapReduce Applications

-------------------------------------------------------------------------------

0. Documentation
----------------------

The MapReduce API is documented in the files include/mapReduce.h for PPE code
and include/mapReduce_spu.h for SPE code.  A rigorous design discussion and
evaluation of the runtime can be found in a published technical report included
in this distribution under the doc/ directory.

If you use this work in your own work, we would appreciate hearing about it.  If
you want to cite MapReduce for Cell in your research writings, please refer to
the paper “MapReduce for the Cell B.E. Architecture” by M. de Kruijf and K.
Sankaralingam, University of Wisconsin Computer Sciences Technical Report
CS-TR-2007-1625, October 2007.

-------------------------------------------------------------------------------

1. Building MapReduce:
----------------------

MapReduce on Cell is built by typing 'make' at the command prompt.  The
following two environment variables must be set for the build to complete
successfully:

CELL_TOP -
  The SDK root directory -- typically /opt/ibm/cell-sdk/prototype

MAP_REDUCE_TOP -
  The MapReduce root directory.  It is the directory in which this
  README file is located.

Also make sure that the Cell SDK binaries are in your path.  MapReduce builds
SPE code using the SPU XLC compiler by default.  To use the GCC compiler, modify
the makefile in the root directory accordingly.

MapReduce on Cell has been successfully built using Cell SDK version 2.1 running
on a Pentium 4 with Fedora Core 5.  Although the runtime has been verified to
run successfully on the Cell Simulator, virtually all of the runtime testing and
development was performed on a 3.2 GHz Cell Blade scheduled via IBM
PartnerWorld's Virtual Loaner Program.

-------------------------------------------------------------------------------

2. MapReduce Build Configurations:
----------------------------------

There are four additional, optional environment variables that can be set to
influence the type of build that will result:

ASSERT -
  If this variable is set, mapReduce will build with asserts enabled.  If an
  assert fails during execution of the runtime, a message will be displayed on
  stderr.

LOG -
  If this variable is set, mapReduce will build with event logging enabled.
  During execution the runtime will output a trace of the execution flow
  to stdout.

DEBUG -
  If this variable is set, mapReduce will build with optimizations disabled and
  debugging information included.  This configuration is intended for use with
  gdb.

CLOCK - 
  If this variable is set, mapReduce will gather performance statistics during
  execution and output verbose timing information upon completion.  The output
  will appear on stdout.
  
Note that it is not strictly necessary to set these environment variables in
advance.  For example, the parameters can be set when invoking make as follows:

'$ make DEBUG=1 LOG=1 ASSERT=1'

IMPORTANT:  Building a special configuration using these variables only works
if make is invoked in the MapReduce root directory; the makefiles in
subdirectories do not look for these environment variables.  Also, to build
mapReduce using a different configuration after having already built it, first
type 'make clean' to remove the previous build.  If this is not done, there is
a strong possibility that the build will not complete as desired.

-------------------------------------------------------------------------------

3. MapReduce Testing:
---------------------

Only one test is included in this mapReduce package.  It is located under
test/base. In the MapReduce root directory there is a file called runTests.sh.
This file is a Bash script that runs the test for increasing input sizes.  It
simply keeps running the test on larger and larger datasets until a run fails.
The test verifies that the runtime completes without error for a particular
application where the expected output is known in advance.

-------------------------------------------------------------------------------

4. MapReduce Performance Data:
------------------------------

Several benchmarks are provided in the mapReduce package. For the details of
each, please see the accompanying technical report.  Two benchmarks not included
in the tech report are:

benchmarks/gridInterpolate - In this application, a file containing voxel data
  for a 3D space is interpolated based on a given isovalue.  This is the first
  step in the Marching Cubes visualization algorithm.  The runtime takes as
  input a 3D grid of points and an associated value for each point.  The map
  phase checks the value at each point against the isovalue and, for each edge
  to which the point belongs, outputs the point and whether it is above or
  below the isovalue.  The reduce function takes as input an edge key along with
  two points for that edge and interpolates the edge if the two points have
  values on opposite sides of the isovalue.  (This application is not known to
  perform well using MapReduce versus its single-threaded counterpart.)
  
benchmarks/sqrtCount - In this application, a randomly generated set of values
  is passed into the MapReduce runtime.  The map function takes the square
  root of each value and emits the integer truncation of the square root along
  with the value "1".  The reduce function sums up the number of values
  associated with a given key, in effect counting the number of values that have
  the same number as their truncated square root value.  This application,
  while having no real-world usefulness of its own, is representative of
  counting applications such as counting the frequency of occurrences of a word
  in a text file, or counting the frequency of a given pixel color in a bitmap
  file.
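
The sqrtCount data flow described above can be sketched serially as follows.
This is only an illustration in Python, not the benchmark's code and not the
MapReduce API from include/mapReduce.h.  The names map_fn, reduce_fn and
sqrt_count are hypothetical, and the inputs are assumed to be non-negative
integers.

```python
import math
from collections import defaultdict


def map_fn(value):
    """Emit (truncated square root, 1) for one input value.

    Assumes value is a non-negative integer (math.isqrt requires this).
    """
    return (math.isqrt(value), 1)


def reduce_fn(key, counts):
    """Sum the 1s emitted for one key, giving the number of inputs per key."""
    return (key, sum(counts))


def sqrt_count(values):
    """Serial stand-in for the runtime: group map output by key, then reduce."""
    groups = defaultdict(list)
    for value in values:
        key, one = map_fn(value)
        groups[key].append(one)
    return dict(reduce_fn(key, counts) for key, counts in sorted(groups.items()))


if __name__ == "__main__":
    # 1, 2 and 3 all have truncated square root 1, so key 1 has count 3.
    print(sqrt_count([0, 1, 2, 3, 4, 8, 9, 15, 16]))
    # prints {0: 1, 1: 3, 2: 2, 3: 2, 4: 1}
```

A word-frequency count has the same shape: the map step would emit (word, 1)
instead of (truncated square root, 1), and the reduce step would be unchanged.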
  
Each benchmark takes the number of SPEs and the number of partitions (hash
buckets) as execution parameters.  gridInterpolate also takes as input a
data file and an isovalue.  Single-threaded versions of each benchmark are
also available under the serial/ directory of each benchmark.  The
single-threaded applications were run and tested on an Intel Core 2 Duo
processor.
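
For the isovalue that gridInterpolate takes as input, the reduce step described
earlier (interpolate an edge only if its two points lie on opposite sides of
the isovalue) can be sketched as below.  This is a Python illustration, not the
benchmark's code.  It assumes linear interpolation along the edge, as is usual
in Marching Cubes, and it assumes that a value equal to the isovalue counts as
"above"; the README specifies neither.  The name interpolate_edge is
hypothetical.

```python
def interpolate_edge(p0, v0, p1, v1, isovalue):
    """Return the point on edge p0-p1 where the value crosses isovalue.

    p0 and p1 are (x, y, z) tuples; v0 and v1 are the values at those points.
    Returns None when both endpoints lie on the same side of the isovalue.
    Assumption: a value equal to the isovalue is treated as "above".
    """
    above0 = v0 >= isovalue
    above1 = v1 >= isovalue
    if above0 == above1:
        return None  # no crossing on this edge, nothing to interpolate
    # Opposite sides implies v0 != v1, so the division is safe.
    t = (isovalue - v0) / (v1 - v0)
    # Assumption: linear interpolation between the two endpoints.
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


if __name__ == "__main__":
    # Values 1.0 and 3.0 straddle the isovalue 2.0, so the crossing is midway.
    print(interpolate_edge((0, 0, 0), 1.0, (1, 0, 0), 3.0, 2.0))
    # Both values are below the isovalue, so there is no crossing.
    print(interpolate_edge((0, 0, 0), 1.0, (1, 0, 0), 1.5, 2.0))
```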

-------------------------------------------------------------------------------

5. Developing MapReduce Applications
------------------------------------

The MapReduce API is documented in the files include/mapReduce.h for PPE code
and include/mapReduce_spu.h for SPE code.  The user PPE code is responsible for
building the required MapReduce specification, the input and output data
structures, and calling the MapReduce runtime.  The user SPE code defines the 
remaining functions used by the MapReduce runtime.  The components of each of
these tasks are documented in the include files, and sample applications are
provided under the benchmarks/ directory.  The test/base application also
provides an example of how to extract and manipulate output data.  Together,
these should provide ample information to build complete and fully functional
MapReduce applications.

It is recommended to reuse the existing makefile infrastructure by copying the
makefiles of a sample application and modifying them.  The only change that
should be necessary is to replace occurrences of the old application string
inside the PPE and SPE makefiles with the new application string.
