mapreduce_cell Code
Brought to you by:
apapag
File | Date | Author | Commit |
---|---|---|---|
benchmarks | 2009-10-11 | apapag | [r2] |
data | 2009-10-10 | apapag | [r1] |
doc | 2009-10-10 | apapag | [r1] |
include | 2009-10-10 | apapag | [r1] |
lib | 2009-10-10 | apapag | [r1] |
src | 2009-10-10 | apapag | [r1] |
tests | 2009-10-10 | apapag | [r1] |
Makefile | 2009-10-10 | apapag | [r1] |
README | 2009-10-10 | apapag | [r1] |
MapReduce for the Cell B.E. Architecture
by Marc de Kruijf
dekruijf@cs.wisc.edu
-------------------------------------------------------------------------------

Contents:
---------
0. Documentation
1. Building MapReduce
2. MapReduce Build Configurations
3. MapReduce Testing
4. MapReduce Performance Data
5. Developing MapReduce Applications
-------------------------------------------------------------------------------

0. Documentation
----------------------
The MapReduce API is documented in the files include/mapReduce.h for PPE code and include/mapReduce_spu.h for SPE code. A rigorous design discussion and evaluation of the runtime can be found in a published technical report included in this distribution under the doc/ directory.

If you use this work in your own work, we would appreciate it if you let us know. If you want to cite MapReduce for Cell in your research writings, please refer to the paper “MapReduce for the Cell B.E. Architecture” by M. de Kruijf and K. Sankaralingam, University of Wisconsin Computer Sciences Technical Report CS-TR-2007-1625, October 2007.
-------------------------------------------------------------------------------

1. Building MapReduce:
----------------------
MapReduce on Cell is built by typing 'make' at the command prompt. The following two environment variables must be set for the build to complete successfully:

  CELL_TOP       - The SDK root directory, typically /opt/ibm/cell-sdk/prototype
  MAP_REDUCE_TOP - The MapReduce root directory, i.e. the directory in which this README file is located.

Also make sure that the Cell SDK binaries are in your path.

MapReduce builds SPE code using the SPU XLC compiler by default. To use the GCC compiler instead, modify the makefile in the root directory accordingly.

MapReduce on Cell has been successfully built using Cell SDK version 2.1 running on a Pentium 4 with Fedora Core 5.
Although the runtime has been verified to run successfully on the Cell Simulator, virtually all of the runtime testing and development was performed on a 3.2 GHz Cell Blade scheduled via IBM PartnerWorld's Virtual Loaner Program.
-------------------------------------------------------------------------------

2. MapReduce Build Configurations:
----------------------------------
There are four additional, optional environment variables that can be set to influence the type of build that results:

  ASSERT - If this variable is set, mapReduce will build with asserts enabled. If an assert fails during execution of the runtime, a message will be displayed on stderr.

  LOG    - If this variable is set, mapReduce will build with event logging enabled. During execution, the runtime will output a trace of the execution flow to stdout.

  DEBUG  - If this variable is set, mapReduce will build with no optimizations and with debugging information. This configuration is intended for use with gdb.

  CLOCK  - If this variable is set, mapReduce will gather performance statistics during execution and output verbose timing information upon completion. The output will appear on stdout.

Note that it is not strictly necessary to set these environment variables in advance. For example, they can be set when invoking make as follows:

  $ make DEBUG=1 LOG=1 ASSERT=1

IMPORTANT: Building a special configuration using these variables only works if make is invoked in the MapReduce root directory; the makefiles in subdirectories do not look for these environment variables.

Also note that, to build mapReduce using a different configuration after having already built it, first type 'make clean' to remove the previous build. If this is not done, there is a strong possibility that the build will not complete as desired.
-------------------------------------------------------------------------------

3. MapReduce Testing:
---------------------
Only one test is included in this mapReduce package. It is located under test/base.

In the MapReduce root directory there is a file called runTests.sh. This Bash script runs the test for increasing input sizes, continuing with larger and larger datasets until the test fails. The test verifies that the runtime completes without error for a particular application whose expected output is known in advance.
-------------------------------------------------------------------------------

4. MapReduce Performance Data:
------------------------------
Several benchmarks are provided in the mapReduce package. For the details of each, please see the accompanying technical report. Two benchmarks not included in the technical report are:

  benchmarks/gridInterpolate - In this application, a file containing voxel data for a 3D space is interpolated based on a given isovalue. This is the first step in the Marching Cubes visualization algorithm. The runtime takes as input a 3D grid of points and an associated value for each point. The map phase checks the value at each point against the isovalue and, for each edge to which the point belongs, outputs the point and whether it is above or below the isovalue. The reduce function takes as input an edge key along with the two points for that edge and interpolates the edge if the two points have values on opposite sides of the isovalue. (This application is not known to perform well under MapReduce relative to its single-threaded counterpart.)

  benchmarks/sqrtCount - In this application, a randomly generated set of values is passed into the MapReduce runtime. The map function takes the square root of each value and emits the integer truncation of the square root along with the value "1". The reduce function sums the number of values associated with a given key, in effect counting how many values share the same truncated square root.
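The two benchmark kernels described above can be illustrated with a small serial C sketch. It is not written against the mapReduce.h or mapReduce_spu.h API: the names point3, interpolate_edge, isqrt, and sqrt_count are hypothetical, and linear interpolation along the edge is an assumption about how gridInterpolate places the crossing point. In sqrt_count, a count array indexed by key stands in for the runtime's grouping of emitted pairs by key.

```c
/* Hypothetical serial sketch of the gridInterpolate and sqrtCount kernels.
 * This is NOT the mapReduce.h / mapReduce_spu.h API. */
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } point3;

/* gridInterpolate reduce step (assumed linear interpolation): if the two
 * edge endpoints have values on opposite sides of iso, write the point
 * where the isovalue crosses the edge to *out and return 1; else return 0. */
static int interpolate_edge(point3 p0, float v0, point3 p1, float v1,
                            float iso, point3 *out)
{
    assert(out != NULL);
    if ((v0 < iso) == (v1 < iso))
        return 0;                       /* same side: edge is not crossed */
    float t = (iso - v0) / (v1 - v0);   /* v0 != v1 on this path */
    out->x = p0.x + t * (p1.x - p0.x);
    out->y = p0.y + t * (p1.y - p0.y);
    out->z = p0.z + t * (p1.z - p0.z);
    return 1;
}

/* floor(sqrt(v)) in integer arithmetic: the largest r with r*r <= v.
 * A simple O(sqrt(v)) loop, adequate for a sketch. */
static unsigned isqrt(unsigned v)
{
    unsigned r = 0;
    while ((unsigned long long)(r + 1) * (r + 1) <= v)
        r++;
    return r;
}

/* sqrtCount: the map step emits (isqrt(v), 1) for each value and the
 * reduce step sums the 1s per key. counts[] stands in for grouping by
 * key; keys >= max_key are ignored in this sketch. */
static void sqrt_count(const unsigned *values, size_t n,
                       unsigned *counts, size_t max_key)
{
    for (size_t k = 0; k < max_key; k++)
        counts[k] = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned key = isqrt(values[i]);   /* map: truncated square root */
        if (key < max_key)
            counts[key] += 1;              /* reduce: sum the emitted "1"s */
    }
}
```

For example, an edge from (0,0,0) with value 0 to (2,0,0) with value 2 crosses isovalue 1 at (1,0,0). For the values {0, 1, 3, 4, 8, 9, 15, 16}, sqrt_count yields counts {1, 2, 2, 2, 1} for keys 0 through 4. The integer square root is used here only to keep the sketch free of floating-point rounding; it gives the same key as truncating the real square root.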
  This application, while having no real-world usefulness of its own, is representative of counting applications such as counting the frequency of occurrences of a word in a text file, or counting the frequency of a given pixel color in a bitmap file.

Each benchmark takes the number of SPEs and the number of partitions (hash buckets) as execution parameters. gridInterpolate also takes a data file and an isovalue as input.

Single-threaded versions of each benchmark are also available under the serial/ directory of each benchmark. The single-threaded applications were run and tested on an Intel Core 2 Duo processor.
-------------------------------------------------------------------------------

5. Developing MapReduce Applications
------------------------------------
The MapReduce API is documented in the files include/mapReduce.h for PPE code and include/mapReduce_spu.h for SPE code. The user PPE code is responsible for building the required MapReduce specification and the input and output data structures, and for calling the MapReduce runtime. The user SPE code defines the remaining functions used by the MapReduce runtime. The components of each of these tasks are documented in the include files, and sample applications are provided under the benchmarks/ directory. The test/base application also provides an example of how to extract and manipulate output data. Together, these should provide ample information to build complete, fully functional MapReduce applications.

It is recommended to use the existing makefile infrastructure, copying and modifying the makefiles of the sample applications. The only change that should be necessary is to replace occurrences of the old application string inside the PPE and SPE makefiles with the new application string.