Weird OpenMP Reduction

Typical reductions in OpenMP* use an associative operator op to do local reductions, and then use a reduction clause to combine those local results.  For example, the following code computes a dot product by computing a local sum on each thread and then summing the local sums.

// Returns dot product of two vectors
int dot( int x[], int y[], int n ) {
    int sum = 0;
#pragma omp parallel for reduction(+:sum)
    for( int i=0; i<n; ++i ) {
        sum += x[i]*y[i];
    }
    return sum;
}

The reduction clause specifies two things:
  1. When control enters the parallel region, each thread in the region gets a thread-private copy of sum, initialized to the identity element for +.
  2. When control leaves the parallel region, the original sum is updated by combining its value with the final values of the thread-private copies, using +.  Since + is associative (or nearly so for floating-point), the final sum has the same value (or nearly so) as it would for serial execution of the code.
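
The two steps above can be written out by hand. The sketch below is only an approximation of what the reduction clause arranges: the name dot_manual and the use of a critical section for the combining step are assumptions made for illustration, and a real OpenMP implementation is free to combine the private copies some other way.

```c
/* Dot product with the reduction spelled out manually (illustrative sketch). */
int dot_manual( const int x[], const int y[], int n ) {
    int sum = 0;
#pragma omp parallel
    {
        /* Step 1: each thread gets a private copy, set to the identity for +. */
        int local = 0;
#pragma omp for
        for( int i=0; i<n; ++i ) {
            local += x[i]*y[i];
        }
        /* Step 2: fold each private copy into the original, one thread at a time. */
#pragma omp critical
        sum += local;
    }
    return sum;
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the function degenerates to an ordinary serial loop.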
The reduction clause does not specify anything more -- you are allowed to do anything with the thread-private copies of a reduction variable.  The code below demonstrates this point.  It computes the dot product of two integer vectors without using any addition operations inside the loop.  It assumes two's complement arithmetic.

// Returns dot product of two vectors
int dot( int x[], int y[], int n ) {
    int c = 0;
    int s = 0;
#pragma omp parallel for reduction(+:c,s)
    for( int i=0; i<n; ++i ) {
        int p = x[i]*y[i];
        int q = c^p;
        c &= p|s;
        c |= p&s;
        c <<= 1;
        s ^= q;
    }
    return c+s;
}

Consider it a puzzle, not recommended practice.  If you need a hint for why it works correctly, look up the datasheet for the classic TTL 74183 chip.  [Last line edited 2015-4-1 because old link to 74183 datasheet broke.]
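
If you would rather check the puzzle than solve it, one option is to compare it against a plain serial loop. The sketch below is a test harness, not part of the original code: the names dot_serial, dot_puzzle and dots_agree are hypothetical, and the operands are retyped as unsigned (an assumption of this sketch) so that wraparound and left shifts are well defined in C.

```c
/* Reference: plain serial dot product, computed modulo 2^N for N-bit unsigned. */
unsigned dot_serial( const unsigned x[], const unsigned y[], int n ) {
    unsigned sum = 0;
    for( int i=0; i<n; ++i ) {
        sum += x[i]*y[i];
    }
    return sum;
}

/* The puzzle loop, retyped with unsigned operands (assumption of this sketch). */
unsigned dot_puzzle( const unsigned x[], const unsigned y[], int n ) {
    unsigned c = 0;
    unsigned s = 0;
#pragma omp parallel for reduction(+:c,s)
    for( int i=0; i<n; ++i ) {
        unsigned p = x[i]*y[i];
        unsigned q = c^p;
        c &= p|s;
        c |= p&s;
        c <<= 1;
        s ^= q;
    }
    return c+s;
}

/* Returns 1 if both versions agree on pseudo-random input of length n
   (capped at MAX_N), else 0. */
int dots_agree( int n ) {
    enum { MAX_N = 1000 };
    unsigned x[MAX_N], y[MAX_N];
    unsigned state = 12345u;   /* seed for a simple linear congruential generator */
    if( n>MAX_N ) n = MAX_N;
    for( int i=0; i<n; ++i ) {
        state = state*1664525u + 1013904223u;
        x[i] = state;
        state = state*1664525u + 1013904223u;
        y[i] = state;
    }
    return dot_serial(x, y, n) == dot_puzzle(x, y, n);
}
```

Calling dots_agree with a few different lengths, and with different thread counts if OpenMP is enabled, exercises different ways of splitting the loop across threads.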

*Other names and brands may be claimed as the property of others