"MapReduce-MPI WWW Site"_mws - "MapReduce-MPI Documentation"_md :c

:link(mws,http://mapreduce.sandia.gov)
:link(md,Manual.html)

:line

Examples :h3

This section describes the MapReduce programs provided in the examples
directory of the distribution:

"wordfreq"_#word
"rmat"_#rmat :ul

Each are provided in 3 formats: as a C++ program, C program, and
Python script.  Note that the Python scripts use the PyPar package
which provides a Python/MPI interface, as discussed above in the
"Python Interface"_Interface_python.html section, so you must have
"PyPar"_pypar installed in your Python to run them.

:link(pypar,http://datamining.anu.edu.au/~ole/pypar)

The C++ and C programs can be built (assuming you have already built
the MR-MPI library) by typing

make -f Makefile.foo :pre

from within the examples directory, using one of the provided
Makefiles.  As with the library itself, you may need to edit one of
the Makefiles to create a new version appropriate to your machine.

The examples directory also includes input scripts for the scripting
interface to MR-MPI called OINK.  There are scripts for word frequency
(in.wordfreq), R-MAT generation (in.rmat) and various graph algorithms
(in.cc, in.tri, in.luby, in.sssp, in.pagerank), described in the paper
by "Plimpton and Devine"_#Plimpton.

OINK has its own "manual and doc pages"_../oinkdoc/Manual.html.
To run these scripts you will need to build OINK, and then
run one of the scripts as follows:

oink_machine < in.rmat :pre

:line

Word frequency example :link(word),h4

The wordfreq programs implement the word frequency counting algorithm
described above in "this section"_Whatis.html.  The wordfreq programs
are run by giving a list of text files as arguments, e.g.

wordfreq ~/mydir/*.cpp
mpirun -np 8 wordfreq ~/mydir/*.cpp
cwordfreq ~/mydir/*.cpp
mpirun -np 8 cwordfreq ~/mydir/*.cpp
python wordfreq.py ~/mydir/*.cpp
mpirun -np 8 python wordfreq.py ~/mydir/*.cpp :pre

Total word counts and a list of the top 10 words should be printed to
the screen, along with the time to perform the operation.

The 3 different versions of the wordfreq program should give the same
answers, although if non-text files are used, the parsing of the
contents into words can be done differently by the C library strtok()
function and the Python string "split" method.

:line

R-MAT matrices example :link(rmat),h4

The rmat programs generate a particular form of randomized sparse
matrix known as an "R-MAT matrix"_#RMAT.  Depending on the parameters
chosen, the sparsity pattern in the resulting matrix can be highly
non-uniform, and a good model for irregular graphs, such as ones
representing a network of computers or WWW page links.

The rmat programs are run by specifying a few parameters, e.g.

rmat N Nz a b c d frac outfile
mpirun -np 8 rmat N Nz a b c d frac outfile
crmat N Nz a b c d frac outfile
mpirun -np 8 crmat N Nz a b c d frac outfile
python rmat.py N Nz a b c d frac outfile
mpirun -np 8 python rmat.py N Nz a b c d frac outfile :pre

The meaning of the parameters is as follows.  Note that only matrices
with a power-of-2 number of rows can be generated, so specifying N=20
creates a matrix with over a million rows.

2^N = # of rows in matrix
Nz = average # of non-zeroes per row
a,b,c,d = generation params for matrix entries, must sum to 1
frac = randomization parameter between 0 and 1
seed = random # seed, positive integer
outfile = optional output file :ul

A full description of the R-MAT generation algorithm is beyond the
scope of this doc page, but here's the brief version.  The {a,b,c,d}
parameters are effectively weights on the 4 quadrants of the matrix.
To generate a single new matrix element, one quadrant is chosen, with
a probability proportional to its weight.  This operation is repeated
recursively within the chosen quadrant, applying the {frac} parameter
to randomize the weights a bit.  After N iterations, a single I,J
matrix location has been identified and its value is set (to 1 in this
case).

The total number of matrix entries generated is Nx * 2^N.  This
procedure can generate duplicates, so those are removed, and new
elements generated until the desired number is reached.

When completed, the matrix statistics are printed to the screen, along
with the time to generate the matrix.  If the optional {outfile}
parameter is specified, then the matrix entries are written to files
(one per processor).  Each line of any file has the form

I J value :pre

where I,J are the matrix row,column and value is the matrix entry (all
are 1 in this case).  If the files are concatenated together,
the full set of matrix entries should result.

The 3 different versions of the rmat programs should give the same
answers in a statistical sense.  The answers will not be identical
because the same random number generation scheme is not used in all 3
programs.

:line

:link(RMAT)
[(RMAT)] D. Chakrabarti, Y. Zhan, C. Faloutsos, R-MAT: A Recursive
Model for Graph Mining", if Proceedings of the SIAM Conference on Data
Mining (2004), available at
http://www.cs.cmu.edu/~deepay/mywww/papers/siam04.pdf.

:link(Plimpton) 
[(Plimpton)] Plimpton and Devine, "MapReduce in MPI for Large-Scale
Graph Algorithms", to appear in Parallel Computing (2011).
