Weird OpenMP Reduction

Published: 05/02/2013, Last Updated: 05/02/2013

Typical reductions in OpenMP* involve using a associative operator op to do local reductions, and then using a reduction clause to collect those local reductions.  For example, the following code computes a dot product by computing local sums on each thread and then summing them.  

// Returns dot product of two vectors int dot( int x[], int y[], int n ) {
    int sum = 0;
#pragma omp parallel for reduction(+:sum)
    for( int i=0; i<n; ++i ) {
        sum += x[i]*y[i];
    return sum;

The reduction clause specifies two things:
  1. When control enters the parallel region, each thread in the region gets a thread-private copy of sum, initialized to the identity element for +.
  2. When control leaves the parallel region, the original sum is updated by combining its value with the final values of thethread-private copies, using +.  Since + is associative (or nearly so for floating-point), the final sum has the same value (or nearly so) as it would for serial execution of the code.
The reduction clause does not say specify anything more -- you are allowed to do anything with the thread-local copies of a reduction variable.  The code below demonstrates this point.  It computes the dot product of two integer vectors without using any addition operations inside the loop.  It assumes two's complement arithmetic.

// Returns dot product of two vectors
int dot( int x[], int y[], int n ) {
    int c = 0;
    int s = 0;
#pragma omp parallel for reduction(+:c,s)
    for( int i=0; i<n; ++i ) {
        int p = x[i]*y[i];
        int q = c^p;
        c &= p|s;
        c |= p&s;
        c <<= 1;
        s ^= q;
    return c+s;

Consider it a puzzle, not recommended practice.  If you need a hint for why it works correctly, look up the datasheet for the classic TTL 74183 chip.  [Last line edited 2015-4-1 because old link to 74183 datasheet broke.]

*Other names and brands may be claimed as the property of others

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804