We have a code whose Fortran and C versions show very different runtimes. We have isolated the problem to one simple loop:

    #pragma omp parallel for reduction(max:dt)
    for(i = 1; i <= NR; i++){
        for(j = 1; j <= NC; j++){
            dt = fmax( fabs(t[i][j]-t_old[i][j]), dt);
            t_old[i][j] = t[i][j];
        }
    }

It runs about 12 times slower than the equivalent Fortran loop:

    !$omp parallel do reduction(max:dt)
    Do j=1,NC
       Do i=1,NR
          dt = max( abs(t(i,j) - told(i,j)), dt )
          told(i,j) = t(i,j)
       Enddo
    Enddo
    !$omp end parallel do

Removing the dt assignment eliminates the disparity. Running both versions serially also shows no disparity, so the problem is not simply that the C implementation is inefficient. Removing only the reduction clause does not close the gap either, so the reduction operation itself is not the cause.

All of these tests lead us to conclude that there is some bad interaction between OpenMP and fmax/fabs. Any help is appreciated.

Thanks in advance,

Jon