Using an Intel® Cilk™ Plus Reducer in C

The method for using Intel® Cilk™ Plus reducers in C++ code is well documented. An interface for using reducers in C code, however, was added only recently, as part of the Intel® Parallel Composer 2011 release. Here is what you must do to use them.

For provided reducer types, like reducer_opadd:

1. #include the appropriate header for the reducer (for example, #include <cilk/reducer_opadd.h>)
2. Declare the reducer object using CILK_C_REDUCER_<type>(<variable name>, <variable type>, <initial value>). For example, to declare a reducer_opadd<int> sum initialized to 0, you would use:

CILK_C_REDUCER_OPADD(sum, int, 0);

3. To access the value in a serial region, use the value member variable of the reducer.  Continuing the above example, this would be sum.value
4. Inside a Cilk Plus parallel region, use REDUCER_VIEW(reducer name) to access the value.  Again, in our current example, this would be:

REDUCER_VIEW(sum)

If the reducer is not a global:
5. You must register and unregister the reducer variable before and after use to avoid initialization and memory leak issues. 
6. After the declaration of the reducer but before the first use, insert CILK_C_REGISTER_REDUCER(<reducer name>); to register.
7. Then, when the reducer is no longer needed, insert CILK_C_UNREGISTER_REDUCER(<reducer name>); to unregister. Using the above example:

CILK_C_REGISTER_REDUCER(sum);
// sum is used
CILK_C_UNREGISTER_REDUCER(sum);

For custom reducers:

1. #include <cilk/reducer.h>
2. You'll need to create two or three functions to define the behavior of the reducer:

void x_identity(void* key, void* v);
void x_reduce(void* key, void* l, void* r);
void x_destroy(void* key, void* p); // Only necessary if dynamic destruction needed

3. You'll then need to instantiate the reducer type like so:

typedef CILK_C_DECLARE_REDUCER(<data type>) x_reducer;

4. Then to declare and initialize your reducer for use, use the following syntax:

x_reducer <reducer name> = CILK_C_INIT_REDUCER(<data type>, x_reduce, x_identity, x_destroy, <initial value>);

5. Then access the reducer by following the same rules as in steps 3-7 of the section on provided reducer types.

Example code will be posted when ready.


5 comments

pitsianis wrote:

Hello James and Intel Cilk community.

Can you please post a working example of a Cilk reduction in C that is supported by
> icc --version
icc (ICC) 13.0.0 20120731

Thanks,
N

anonymous wrote:

Hi James.
You wrote
void c_parallel3() {
    int i;
    CILK_C_REDUCER_OPADD(global_result, int, 47);
    CILK_C_REGISTER_REDUCER(global_result); // not needed, but allowed

    printf("C cilk_for, reduction global_result registered\n");

    for (i = 0; i < ARRAY_SIZE; ++i) {
        global_result.value += data[i];
    }
    printf("The result is: %d\n", REDUCER_VIEW(global_result));
    CILK_C_UNREGISTER_REDUCER(global_result); // not needed, but allowed
}

First, I suppose you meant cilk_for instead of for. Assuming that, should
global_result.value += data[i]; be changed to REDUCER_VIEW(global_result) += data[i]?

James R. wrote:

Oops - I pasted in an old results file - here are actual results from the program I pasted in:
00 Hello, World 011
01 Hello, World 012
02 Hello, World 014
03 Hello, World 017
04 Hello, World 021
05 Hello, World 026
06 Hello, World 032
07 Hello, World 039
08 Hello, World 047
09 Hello, World 056
10 Hello, World 066
Final 066

00 Hello, World 011 00100570
01 Hello, World 012 00100570
02 Hello, World 014 00100570
03 Hello, World 017 00100570
04 Hello, World 021 00100570
05 Hello, World 026 00100570
06 Hello, World 032 00100570
07 Hello, World 039 00100570
08 Hello, World 047 00100570
09 Hello, World 056 00100570
10 Hello, World 066 00100570
Final 066

In this case, there was so little work that Cilk ran on only one thread. Your results may vary. If you increase it to 100 additions, the work spreads across cores.

James R. wrote:

This short program adds the numbers 0 to 11 together using for, and cilk_for, with reducers in otherwise identical code to illustrate equivalence.

#include <stdio.h>
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>

int serial()
{
    int sum = 11;
    for (int i=0;i<11;i++) {
        sum += i;
        printf("%02d Hello, World %03d\n",i,sum);
    }
    printf("Final %03d\n\n",sum);
    return 0;
}

int parallel()
{
    //int sum = 11;
    CILK_C_REDUCER_OPADD(sum, int, 11);
    CILK_C_REGISTER_REDUCER(sum);
    cilk_for (int i=0;i<11;i++) {
        REDUCER_VIEW(sum) += i;
        // %p with a void* cast is the portable way to print the view's address
        printf("%02d Hello, World %03d %p\n",i,REDUCER_VIEW(sum),(void*)&REDUCER_VIEW(sum));
    }
    printf("Final %03d\n",sum.value);
    CILK_C_UNREGISTER_REDUCER(sum);
    return 0;
}

int main()
{
    serial();
    parallel();
    return 0;
}

Sample results that I got running the serial, and then the parallel version (on a dual core machine), are:
00 Hello, World 000
01 Hello, World 001
02 Hello, World 003
03 Hello, World 006
04 Hello, World 010
05 Hello, World 015
06 Hello, World 021
07 Hello, World 028
08 Hello, World 036
09 Hello, World 045
Final 045

00 Hello, World 000 0040fc40
05 Hello, World 005 01cf1fd0
01 Hello, World 001 0040fc40
06 Hello, World 011 01cf1fd0
02 Hello, World 003 0040fc40
07 Hello, World 018 01cf1fd0
03 Hello, World 006 0040fc40
08 Hello, World 026 01cf1fd0
04 Hello, World 010 0040fc40
09 Hello, World 035 01cf1fd0
Final 045

The results are not consistent from run to run. The address of the private view is shown only to make it obvious that two sums are being computed simultaneously to create the output shown. It is somewhat interesting to run this trial over and over and watch the order of the summing vary, as load balancing kicks in a little differently from run to run.

James R. wrote:

Here is sample code showing the serial and parallel versions of C++ and C code doing reductions using Cilk Plus:

#include <iostream>
#include <cstdio>   // for printf, used by the C-style functions below
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>

#define ARRAY_SIZE 101

int global_result;

int data[ARRAY_SIZE];

void cpp_serial() {
    int result = 47;

    std::cout << "C++ for, reduction result" << std::endl;

    for (std::size_t i = 0; i < ARRAY_SIZE; ++i) {
        result += data[i];
    }
    std::cout << "The result is: " << result << std::endl;
}

void cpp_parallel() {
    int my_result = 47;
    cilk::reducer_opadd<int> result;

    std::cout << "C++ cilk_for, reduction result" << std::endl;

    cilk_for (std::size_t i = 0; i < ARRAY_SIZE; ++i) {
        result += data[i];
    }
    my_result += result.get_value();
    std::cout << "The result is: " << my_result << std::endl;
}

void c_serial() {
    int i;
    int result = 47;

    printf("C for, reduction result\n");

    for (i = 0; i < ARRAY_SIZE; ++i) {
        result += data[i];
    }
    printf("The result is: %d\n", result);
}

void c_parallel() {
    int i;
    CILK_C_REDUCER_OPADD(result, int, 47);
    CILK_C_REGISTER_REDUCER(result);

    printf("C cilk_for, reduction result\n");

    for (i = 0; i < ARRAY_SIZE; ++i) {
        result.value += data[i];
    }
    printf("The result is: %d\n", REDUCER_VIEW(result));
    CILK_C_UNREGISTER_REDUCER(result);
}

void c_parallel2() {
    int i;
    CILK_C_REDUCER_OPADD(global_result, int, 47);

    printf("C cilk_for, reduction global_result\n");

    for (i = 0; i < ARRAY_SIZE; ++i) {
        global_result.value += data[i];
    }
    printf("The result is: %d\n", REDUCER_VIEW(global_result));
}

void c_parallel3() {
    int i;
    CILK_C_REDUCER_OPADD(global_result, int, 47);
    CILK_C_REGISTER_REDUCER(global_result); // not needed, but allowed

    printf("C cilk_for, reduction global_result registered\n");

    for (i = 0; i < ARRAY_SIZE; ++i) {
        global_result.value += data[i];
    }
    printf("The result is: %d\n", REDUCER_VIEW(global_result));
    CILK_C_UNREGISTER_REDUCER(global_result); // not needed, but allowed
}

int main (int argc, char * const argv[]) {
    // The loop declares its own index, so no separate declaration is needed.
    cilk_for( int i = 0; i < ARRAY_SIZE; ++i ) {
        data[i] = i;
    }
    cpp_serial();
    cpp_parallel();
    c_serial();
    c_parallel();
    c_parallel2();
    c_parallel3();
    return 0;
}
