OpenMP Reduction Intel Fortran v11.1

Hi there,

I was directed to this forum last week from the "Threading on Intel® Parallel Architectures" forum. My question concerns the initialization of the private variables used by the OpenMP REDUCTION clause. If array a is initialized to 1, I assume that the private copies created for each thread by the REDUCTION(+:a) clause are also initialized to 1. Consider the following trivial code:

! Initialize
a(1,1:10)=1.0d0
a(2,1:5)=1.0d0
a(2,6:10)=2.0d0
!$ nthreads=1
!$ call omp_set_num_threads(nthreads)
!$omp parallel do private(i) reduction(+:a)
do i=1, 10
a(2,i)=a(2,i)+a(1,i)
end do
!$omp end parallel do

If I compile without the -openmp flag using Intel Fortran v11.1 on a 64-bit Linux machine, I obtain the expected results, a(2,1:5)=2.0d0 and a(2,6:10)=3.0d0. However, if I compile with the -openmp flag and run the program, I obtain a(2,1:5)=1.0d0 and a(2,6:10)=2.0d0, because a(1,1:10)=0.0d0 (I checked this by simply adding a print statement). Perhaps there is something wrong with the code posted above?

A second question I posted in the other forum concerns a reduction on only part of the shared array a. Let's take the same code as shown above and assume it is working correctly (the private copies of a are initialized to the right value). If I change the extent of the loop from do i=1, 10 to do i=1, 5, will the reduction be performed over all the elements, or only over the elements of the array that were changed inside the loop? That is, will the reduction yield a(2,1:10)=2.0d0, or a(2,1:5)=2.0d0 and a(2,6:10)=4.0d0?

Please feel free to comment. Any help is greatly appreciated.

Hi there,

I just wanted to follow up with the solution to this question. Hopefully I am on the right track here.

After reading the OpenMP 3.0 API specification, it turns out that the private copies of the
shared array created by a reduction clause are initialized to 0 when the addition operator
is used, regardless of how the shared array itself was initialized. Once the parallel loop
has executed, the private copies are added to the original shared array, which still holds
the values it was initialized with.
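
To make that concrete, here is a minimal sketch that spells out by hand what I understand reduction(+:a) to do on entry and exit. The program name, the private array a_priv and the increment of 2.0d0 are my own illustrative choices, not part of the original code.

```fortran
program manual_reduction
  implicit none
  integer :: i
  double precision :: a(10), a_priv(10)

  a = 1.0d0                  ! the shared array keeps this value during the loop

  !$omp parallel private(a_priv)
  a_priv = 0.0d0             ! on entry: private copy starts at 0 for the + operator
  !$omp do
  do i = 1, 10
     a_priv(i) = a_priv(i) + 2.0d0
  end do
  !$omp end do
  !$omp critical
  a = a + a_priv             ! on exit: each private copy is added to the shared array
  !$omp end critical
  !$omp end parallel

  print *, a
end program manual_reduction
```

Each index is handled by exactly one thread, so every element should come out as 1 + 2 = 3 whatever the number of threads, and the same holds when the directives are ignored.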

Hence in the code shown above, if OpenMP is used, the private array starts as
a_priv=0.0d0, so the loop only adds zeros to zeros and the final reduction simply
returns the original initialization. Conversely, if OpenMP is NOT used, the
directive is ignored and we get the expected results.
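
As a rough sketch of one way around this, assuming the goal is just the element-wise update from the original loop: every iteration writes a different element, so the array can stay shared and no reduction clause is needed. The second array b and the program name are hypothetical, added only so that both variants can be compared side by side.

```fortran
program reduction_init
  implicit none
  integer :: i
  double precision :: a(2,10), b(2,10)

  ! Same initialization as in the original post
  a(1,1:10) = 1.0d0
  a(2,1:5)  = 1.0d0
  a(2,6:10) = 2.0d0
  b = a                      ! hypothetical second copy for the shared variant

  ! Original form: every reference to a inside the loop refers to the
  ! thread's private copy, which starts at 0 for the + operator.
  !$omp parallel do private(i) reduction(+:a)
  do i = 1, 10
     a(2,i) = a(2,i) + a(1,i)   ! 0 + 0 in the private copy
  end do
  !$omp end parallel do

  ! Alternative: iterations write distinct elements, so b can stay shared.
  !$omp parallel do private(i) shared(b)
  do i = 1, 10
     b(2,i) = b(2,i) + b(1,i)
  end do
  !$omp end parallel do

  print *, 'reduction(+:a): ', a(2,:)
  print *, 'shared b:       ', b(2,:)
end program reduction_init
```

Compiled with OpenMP enabled (-openmp here, or for example -fopenmp with gfortran), I would expect the first line to show the untouched initial values and the second to show 2.0 and 3.0; without OpenMP both lines should agree.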

The second question may be answered along similar lines. Since all private
copies of a are initialized to zero, it makes no difference whether the loop
touches all or only part of the array: every element left unchanged is still zero
in each private copy, and so adds nothing to the shared array when the copies are combined.
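
A small sketch of the partial-loop case, under the assumption that the value being added comes from a separate shared array rather than from a itself (the array b and the program name are hypothetical, not from the original code):

```fortran
program partial_reduction
  implicit none
  integer :: i
  double precision :: a(2,10), b(10)

  a(1,1:10) = 1.0d0
  a(2,1:5)  = 1.0d0
  a(2,6:10) = 2.0d0
  b = 1.0d0                  ! hypothetical shared operand, only read in the loop

  ! Only elements 1..5 of row 2 are modified in the private copies;
  ! all other elements of each private copy remain 0.
  !$omp parallel do private(i) shared(b) reduction(+:a)
  do i = 1, 5
     a(2,i) = a(2,i) + b(i)  ! private copy goes from 0 to 1
  end do
  !$omp end parallel do

  print *, a(2,:)
end program partial_reduction
```

Here a(2,1:5) should end up as 1 + 1 = 2, while a(2,6:10) keeps its initial value of 2, since the untouched elements of the private copies contribute only zeros.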

I'm pretty sure this is what's going on. Thanks!
