MapReduce-MPI Library

This document describes the 15 March 2010 version of the open-source MapReduce-MPI (MR-MPI) library, which implements the MapReduce operation popularized by Google on top of standard MPI message passing. The library is designed for parallel execution on distributed-memory platforms, but will also operate on a single processor. The library is written in C++, is callable from high-level languages (C++, C, Fortran, Python, or other scripting languages), and requires no additional software except an MPI library to link against if you wish to perform MapReduces in parallel.

Similar to the original Google design, a user performs a MapReduce by writing a small program that invokes the library. The user typically provides two application-specific functions, a "map" and a "reduce", that are called by the library when a MapReduce operation is executed. "Map" and "reduce" are serial functions, meaning they are invoked independently on individual processors on portions of your data when performing a MapReduce operation in parallel.
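As a rough illustration of the map/reduce pattern described above, the following minimal sketch counts words in plain Python. It does not use the MR-MPI library or its API: the names map_words, reduce_counts, and run_mapreduce are hypothetical, and the driver runs serially on one process, standing in for the role the library plays when it calls the user's functions on each processor's portion of the data.

```python
from collections import defaultdict

def map_words(chunk, emit):
    """Hypothetical 'map': emit a (word, 1) pair for each word in one chunk of text."""
    for word in chunk.split():
        emit(word.lower(), 1)

def reduce_counts(key, values, emit):
    """Hypothetical 'reduce': collapse all values for one key into a single total."""
    emit(key, sum(values))

def run_mapreduce(chunks, map_fn, reduce_fn):
    """Serial stand-in for a MapReduce driver (an assumption, not the MR-MPI API):
    call map on each chunk, group the emitted values by key, then call reduce
    once per unique key."""
    grouped = defaultdict(list)
    for chunk in chunks:
        map_fn(chunk, lambda k, v: grouped[k].append(v))
    results = {}
    for key, values in grouped.items():
        reduce_fn(key, values, lambda k, v: results.__setitem__(k, v))
    return results

# Each string plays the role of one processor's portion of the data.
chunks = ["the quick brown fox", "the lazy dog", "The end"]
counts = run_mapreduce(chunks, map_words, reduce_counts)
print(counts["the"])  # prints 3
```

Note that map_words and reduce_counts each see only one chunk or one key at a time, which is what makes them serial functions that can be invoked independently on different processors.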

The goal of this library is to provide a simple and portable interface for users to create their own MapReduce programs, which can then be run on any desktop or large parallel machine using MPI. See the Background section for features and limitations of this implementation.

Source code for the library is freely available for download from the MR-MPI web site and is licensed under the modified Berkeley Software Distribution (BSD) License. In short, it can be used by anyone for any purpose. See the LICENSE file provided with the distribution for more details.

The distribution includes a few simple example programs that illustrate the use of MapReduce.

The authors of this library are Steve Plimpton at Sandia National Laboratories and Karen Devine, who can be contacted via email: sjplimp at sandia.gov, kddevin at sandia.gov.