MapReduce-MPI WWW Site - MapReduce-MPI Documentation

Settings and defaults

These are internal library variables that can be set by your program:

All the settings except fpath are set in the following manner from C++:

MapReduce *mr = new MapReduce(MPI_COMM_WORLD);
mr->verbosity = 1; 

Because fpath takes a string argument, it is set with the following function:

mr->set_fpath(char *string); 

See the C interface and Python interface doc pages for how to set the various settings from C and Python.

As documented below, some of these settings can be changed at any time. Others only have effect if they are changed before the MapReduce object begins to operate on KeyValue and KeyMultiValue objects.


The mapstyle setting determines how the N map tasks are assigned to the P processors by the map() method.

A value of 0 means split the tasks into "chunks" so that processor 0 is given tasks from 0 to N/P, proc 1 is given tasks from N/P to 2N/P, etc. Proc P-1 is given tasks from N - N/P to N.

A value of 1 means "strided" assignment, so proc 0 is given tasks 0,P,2P,etc and proc 1 is given tasks 1,P+1,2P+1,etc and so forth.

A value of 2 uses a "master/slave" paradigm for assigning tasks. Proc 0 becomes the "master"; the remaining processors are "slaves". Each is given an initial task by the master and reports back when it is finished. It is then assigned the next available task which continues until all tasks are completed. This is a good choice if the CPU time required by various mapping tasks varies greatly, since it will tend to load-balance the work across processors. Note however that proc 0 performs no mapping tasks.

This setting can be changed at any time.

The default value for mapstyle is 0.


The all2all setting determines how point-to-point communication is done when the aggregate() method is invoked, either by itself or as part of a collate().

A value of 0 means custom routines for irregular communication are used. A value of 1 means the MPI_Alltoallv() function from the MPI library is used. The results should be identical. Which is faster depends on the MPI library implementation of the MPI standard on a particular machine.

This setting can be changed at any time.

The default value for all2all is 1.


The verbosity setting determines how much diagnostic output each library call prints to the screen. A value of 0 means "none". A value of 1 means a "summary" of the results across all processors is printed, typically a count of total key/value pairs and the memory required to store them. A value of 2 prints the summary results and also a "histogram" of these quantities by processor, so that you can detect memory usage imbalance.

This setting can be changed at any time.

The default value for verbosity is 0.


The timer setting prints out timing information for each call to the library. A value of 0 means "none". A value of 1 invokes an MPI_Barrier() at the beginning and end of the operation and prints the elapsed time, which will be the same on all processors. A value of 2 invokes no MPI_Barrier() calls and prints a one-line summary of timing results across all processors and also a "histogram" of the time on each processor, so that you can detect computational imbalance.

This setting can be changed at any time.

The default value for timer is 0.


The memsize setting determines the page size (in Mbytes) of each page of memory allocated by the MapReduce object to perform its operations. Once allocated, pages are never deallocated until the MapReduce object is deleted, but pages are reused by successive operations performed by the library. The number of pages required by different methods varies; 1 to 7 is typical. See this section for a summary of memory page requirements.

The minimum allowed value for the memsize setting is 1, meaning 1 Mb pages.

IMPORTANT NOTE: The maximum value is unlimited, but you should insure the total memory consumed by all pages allocated by all the MapReduce objects you create, does not exceed the physical memory available (which may be shared by several processors if running on a multi-core node). If you do this, then many systems will allocate virtual memory, which will typically cause MR-MPI library operations to run very slowly and thrash the disk.

If the data owned by a processor in its collection of KeyValue or KeyMultiValue pairs fits within one page, then no disk I/O is performed; the MR-MPI library runs in-core. If data exceeds the page size, then it is written to temporary disk files and read back in for subsequent operations; the MR-MPI library runs out-of-core. See this section for more discussion of out-of-core operations. These files are created on a per-processor basis and are deleted when no longer needed. Thus if you delete all MapReduce objects that you have instantiated, no such files should exist at the end of the user program. If you should need to clean them up yourselves (e.g. your program crashes), see the discussion of the fpath setting which describes how they are named and where they reside.

If you set memsize small, then processing a large data set will induce many reads and writes to disk. If you make it large, then the reads and writes will happen in large chunks, which generally yields better I/O performance. However, past a few MBytes in size, there may be little gain in I/O performance.

This setting can only be changed before the first KeyValue or KeyMultiValue object is created by the MapReduce object. If changed after that, it will have no effect.

The default value for memsize is 64, meaning 64 Mbyte pages.


The minpage setting determines how many memory pages each processor pre-allocates when the MapReduce object performs its first operation. Minpage can be set to a number >= 0.

This setting can only be changed before the first KeyValue or KeyMultiValue object is created by the MapReduce object. If changed after that, it will have no effect.

The default value for minpage is 0.


The maxpage setting determines the maximum number of pages a processor can ever allocate when performing MapReduce operations. Normally this will be no more than 7; see the discussion in this section for more details. Maxpage can be set to a number >= 0. A value of 0 means there is no limit; new pages are allocated whenever they are needed.

This setting can be changed at any time, though previously-allocated pages are not deleted if maxpage is set to a smaller number.

The default value for maxpage is 0.


The keyalign and valuealign settings determine the byte alignment of keys and values generated by the user program when they are stored inside the library and passed back to the user program. A setting of N means N-byte alignment. N must always be a power of two.

As explained in this section, keys and values are variable-length strings of bytes. The MR-MPI library knows nothing of their contents and simply treats them as contiguous chunks of bytes. This section explains why it may be important to insure proper alignment of numeric data such as integers and floating point values.

Because keys are stored following integer lengths, keys are always at least 4-byte aligned. A larger alignment value can be specified if desired.

Because they follow keys, which may be of arbitrary length (e.g. a string), values can be 1-byte aligned. Note that if all keys are integers, then values will also be 4-byte aligned. A larger alignment value can be specified if desired.

When a multi-value is returned to the user program, e.g. by the callback of a reduce() method, only the first value in the multi-value is aligned to the valuealign setting. Subsequent values are packed one after the other. If all values are the same data-type, e.g. integers, then they will all have the same alignment. However, if the values are mixed data types (e.g. strings and integers), then you may need to insure each value is aligned properly before using it in your myreduce() function. See the Technical Details for more discussion of data alignment.

These settings can only be changed before the first KeyValue or KeyMultiValue object is created by the MapReduce object. If changed after that, they will have no effect.

The default value for keyalign and valuealign is 4, meaning 4-byte alignment of keys and values.


The fpath setting determines the pathname for all disk files created by the MR-MPI library when it runs in out-of-core mode. Note that it is not a pathname for user data files read by the map() method. Those should be specified directly as part of the filename.

Out-of-core disk files are created with names like "fpath/mrmpi.kv,N,M,P" where "kv" is an file-type string ("kv", or "kmv" or "sort" or "part" or "set"), N is a number unique to each MapReduce object, M is a file counter, and P is the processor ID. fpath/mrmpi.kmv.N.P. Sort files are created by the sorting methods. Part and set files are created by collate() or convert() methods.

Setting fpath may be useful for specifying a disk local to each processor, or for a parallel file system that each processor can access.

This setting can only be changed before the first KeyValue or KeyMultiValue object is created by the MapReduce object. If changed after that, it will have no effect.

The default value for fpath is ".", which means the current working directory.