Advanced Topic: How to Write a New Reducer

You can write a custom reducer if none of the supplied reducers satisfy your requirements.

Any of the supplied reducers can be used as models for developing new reducers, although some of these examples are relatively complex. The implementations are found in the reducer_*.h header files in the include/cilk directory of the installation.

Additional examples are provided in the following sections.

Components of a Reducer

A reducer can be broken into four logical parts:

The reducers in the reducer library use struct instead of class for the View and Monoid classes. Recall that the only difference between struct and class in C++ is that the default access for a class is private, while the default access for a struct is public.

The Identity Value

The identity value is that value, which, when combined with another value in either order (that is, a "two-sided identity"), produces that second value. For example:

The Monoid

In mathematics, a monoid comprises a set of values (type), an associative operation on that set, and an identity value for that set and that operation. So for example (integer, +, 0) is a monoid, as is (real, *, 1).

In the reducer library, a Monoid is defined by a type T along with following five functions:

reduce(T *left, T *right)

evaluates *left = *left OP *right

identity(T *p)

constructs IDENTITY value into the uninitialized *p

destroy(T *p)

calls the destructor on the object pointed to by p


returns a pointer to size bytes of raw memory


deallocates the raw memory at p

These five functions must be either static or const. A class that meets the requirements of Monoid is usually stateless, but will sometimes contain state used to initialize the identity object.

The monoid_base class template is a useful base class for a large set of Monoid classes for which the identity value is a default-constructed value of type T, allocated using operator new. A class derived from monoid_base need only declare and implement the reduce function.

The reduce function merges the data from the "right" instance into the "left" instance. After the runtime system calls the reduce function, it will destroy the right instance.

For deterministic results, the reduce function must implement an associative operation, although it does not need to be commutative. If implemented correctly, the reduce function will retain serial semantics. This means that the result of running the application serially, or using a single worker, is the same as running the application with multiple workers. The runtime system, together with the associativity of the reduce function ensures that the results will be the same, regardless of the number of workers or processors, or how the strands are scheduled.

Writing Reducers - A "Holder" Example

This example shows how to write a simple reducer; it has no update methods, and the reduce method does nothing. A "Holder" is analogous to the "Thread Local Storage," but without the pitfalls described in General Interaction with OS Threads.

The rule that you should not call get_value() except when fully synched is intentionally violated here.

Such a reducer has a practical use. Suppose there is a global temporary buffer used by each iteration of a for loop. This is safe in a serial program, but unsafe if the for loop is converted to a parallel cilk_for loop.

Consider the following program that reverses an array of point elements, using a global temporary variable temp while swapping values.

#include <cilk/cilk.h>
#include <cstdio>

class point
   point() : x_(0), y_(0), valid_(false) {};
   void set (int x, int y) {
   x_ = x;
   y_ = y;
   valid_ = true;

   void reset() { valid_ = false; }

   bool is_valid() { return valid_; }
   int x() { if (valid_) return x_; else return -1; }
   int y() { if (valid_) return y_; else return -1; }

   int x_;
   int y_;
   bool valid_;
point temp; // temp is used when swapping two elements

int main(int argc, char **argv)
   int i;
   point ary[100];

   for (i = 0; i < 100; ++i)
     ary[i].set(i, i);

   cilk_for (int j = 0; j < 100 / 2; ++j)
   // reverse the array by swapping 0 and 99, 1 and 98, etc.
   temp.set(ary[j].x(), ary[j].y());
   ary[j].set (ary[100-j-1].x(), ary[100-j-1].y());
   ary[100-j-1].set (temp.x(), temp.y());
   // print the results
   for (i = 0; i < 100; ++i)
     std::printf ("%d: (%d, %d)\n", i, ary[i].x(), ary[i].y());

return 0;

There is a race on the global variable temp, but the serial program will work properly. In this example, it would be simple and safe to declare temp inside the cilk_for(). However, the "holder" pattern described below can be used to provide a form of storage that is local to strands.

The solution is to implement and use a "holder" reducer.

The point_holder class is the reducer. It uses the point class as the view. The Monoid class contains a reduce function that does nothing, since it doesn't matter which version is retained. The rest of the methods of point_holder allow you to access and update the reducer data.

#include <cilk/reducer.h>
class point
   // Define the point class here, exactly as above

// Define the point_holder reducer
class point_holder
   struct Monoid: cilk::monoid_base<point>
     // reduce function does nothing
     static void reduce (point *left, point *right) {}

   cilk::reducer<Monoid> imp_;

   point_holder() : imp_() {}

   void set(int x, int y) {
     point &p = imp_.view();
     p.set(x, y);

   bool is_valid() { return imp_.view().is_valid(); }

   int x() { return imp_.view().x(); }

   int y() { return imp_.view().y(); }
}; // class point_holder

To use the point_holder reducer in the sample program, replace the declaration of temp with a declaration using the reducer type.


point temp; // temp is used when swapping two elements


point_holder temp; // temp is used when swapping two elements

The rest of the original program remains unchanged.

To summarize how the point_holder reducer works:

  1. The default constructor creates a new instance whenever a new point_holder is created; that is, whenever temp is referenced after a steal.

  2. The reduce method does nothing. Since temp is used for temporary storage, there is no need to combine (reduce) the left and right instances.

  3. The default destructor invokes the destructor, freeing its memory.

  4. Because the local view value produced in a loop iteration is not combined with values from other iterations, it is valid to call x() and y() to get the local values within the cilk_for.

  5. Because the reduce() function for the holder does nothing, the default constructor does not need to provide a true identity value.

Submit feedback on this help topic

Copyright © 1996-2011, Intel Corporation. All rights reserved.