Incorrect summation of values using openMP threads

Hi,

Each of my OpenMP threads tries to do the following:

arr(1) = arr(1) + (a(j) + b(j))

where j corresponds to each OpenMP thread being executed and arr(1) is the shared memory location. Since the expression is associative, after all threads complete, arr(1) should ideally equal the serial result. Unfortunately that does not seem to be the case, and I get a different value in arr(1) every time I execute (and occasionally the correct value as well). I have also verified that the contents of a(j) and b(j) are exactly the same. Is there anything I am missing here?
Any insights into this issue?

Thanks
Ajay

 


Quick edit: the above expression is a multiplication and not addition arr(1) = arr(1) + (a(j) * b(j))

Try adding /assume:protect_parens and see what happens.

Steve - Intel Developer Support

If arr(1) is in a shared memory location, then the summation should be in a critical section.  Is it?  You can't have more than one thread modifying the same memory location, otherwise you can get unpredictable results.
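A minimal sketch of the critical-section approach (assuming arr, a, b, and a loop bound n as in the original post):

```fortran
!$omp parallel do
do j = 1, n
   ! only one thread at a time may execute this update
   !$omp critical
   arr(1) = arr(1) + (a(j) * b(j))
   !$omp end critical
end do
!$omp end parallel do
```

For a single scalar update like this, !$omp atomic is usually cheaper than a critical section, and a reduction (discussed below in this thread) is cheaper still.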

 

Hello Steve, 

Thanks for your reply. I am not aware of this option; do I have to set it during compilation? I am working in Visual Studio.

- Ajay

There's a data race on reading/writing arr(1) by multiple threads.  You should use an omp reduction to avoid this, although the clause doesn't accept a subobject like arr(1).  So copy arr(1) to a scalar and put that in the reduction clause.  Something like:

program foorace
use omp_lib

implicit none
integer a(100), b(100), arr(1), i, a1
arr(1) = 0

do i=1,100
 a(i) = i
 b(i) = i
end do

a1 = arr(1)

!$omp parallel do reduction(+:a1)
   do i = 1,100
      a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))
   end do
!$omp end parallel do

print *,'a1 = ',a1

end program foorace

Patrick

Thanks for the replies. I understand the part that multiple threads try to access the same data, but given the expression, even if there is a race condition, won't the output come out the same?
Say a = a + b is the expression. Does OpenMP allow parallel threads to interleave partial execution of the expression?
I mean, if
thread 1 computes a = a + b and then thread 2 computes a = a + b, then we don't have any issues, even if they are executed in any order.
Unless
thread 1 loads a into a register and now thread 2 also loads a into a register, and then they both execute; this is a problematic scenario!!
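For example, sketched with made-up values (assuming the update compiles to separate load/add/store steps):

```fortran
! arr(1) = arr(1) + x is roughly three machine steps:
!   load  arr(1) into a register
!   add   x to the register
!   store the register back to arr(1)
!
! Possible interleaving of two threads, with arr(1) starting at 10:
!   thread 1: load  arr(1)        ! register = 10
!   thread 2: load  arr(1)        ! register = 10
!   thread 1: add 5; store        ! arr(1) = 15
!   thread 2: add 7; store        ! arr(1) = 17 -- thread 1's update is lost;
!                                 ! the correct result would have been 22
```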

So if the latter case is how OpenMP executes, then I need a way to use the reduction in my case, because the expression happens inside a subroutine:

 do i = 1,100
      call subroutine_math(i)
   end do

and subroutine_math does the computation on global variables.
Any pointers on how to approach this problem in my context?

Ajay
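One way to restructure this is to have the subroutine return its contribution so the loop can reduce into a local scalar. A sketch, where the second argument of subroutine_math (contrib) is a hypothetical output added for illustration:

```fortran
a1 = 0
!$omp parallel do reduction(+:a1) private(contrib)
do i = 1, 100
   call subroutine_math(i, contrib)   ! hypothetical: contrib returns this iteration's term
   a1 = a1 + contrib
end do
!$omp end parallel do
arr(1) = arr(1) + a1   ! fold the reduced sum back into the shared location
```

Alternatively, if subroutine_math must update the shared global directly, the update inside it can be protected with !$omp atomic.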
 

 

Use a reduction if multiple threads are writing to arr(1).

arr(1) = 0.0
!$OMP PARALLEL DO REDUCTION(+:arr(1))
DO j=1,N
arr(1) = arr(1) + (a(j) * b(j))
end DO
 

Jim Dempsey

www.quickthreadprogramming.com

Hi Jim,

Do you think reduction is applicable to my case ?

Thanks
Ajay

To the extent you have described what you are doing, reduction is clearly indicated. The reduction variable should be a local scalar, as Patrick showed. If you are looking for performance, you will use vector reduction in each thread (e.g. dot_product) as well as distributing work across threads.
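For example, the reduction loop is just a dot product, so each thread's chunk can be vectorized by the compiler. A sketch assuming the 100-element arrays from Patrick's example:

```fortran
! Serial form: the compiler vectorizes the dot_product intrinsic
a1 = dot_product(a, b)

! Threaded form: with static scheduling, each thread effectively
! computes a dot product over its contiguous chunk, which the
! compiler can also vectorize, before the partial sums are combined
a1 = 0
!$omp parallel do reduction(+:a1)
do i = 1, 100
   a1 = a1 + a(i) * b(i)
end do
!$omp end parallel do
```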

Tim,

I could see where a reduction of arr(J) might be an issue, in particular when J could be modified within the region; however, arr(1) is completely resolvable to a scalar (assuming arr is an array and not a pointer).

In any event, should the compiler complain, the programmer could use EQUIVALENCE (arr(1), arrOne) to get around this inconvenience.

Jim Dempsey


If you want to try /assume:protect_parens, add it under Fortran > Command Line > Additional Options.

Steve - Intel Developer Support

Quote:

Patrick Kennedy (Intel) wrote:

[Patrick's post and example program, quoted in full above]

Patrick,

I have not before seen the coding " a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))"

Could you explain what this achieves?
Could it be similar to the following, which I think achieves what was intended?

a1 = arr(1)

!$omp parallel do reduction(+:a1)
do i = 1,100
   a1 = a1 + ( a(i) * b(i) )
end do
!$omp end parallel do

arr(1) = a1

John

The above is incorrect, or has a redundant statement.

!$omp parallel do reduction(+:a1)

The above makes a private (stack) copy of a1 and initializes it to 0. Note: reduction variables for the + or - operators are initialized to 0; * initializes to 1. Then, at the end of the parallel region, each thread's value of a1 is thread-safely accumulated into the outer-scoped a1.

Therefore "a1 = arr(1)" is useless, since the threads' copies of a1 will be zeroed.
The final "arr(1) = a1" is wrong when the original (non-zero) value of arr(1) is to be included in the sum; in that case, use arr(1) = arr(1) + a1.
However, should you want zero initialization, then the "a1 = arr(1)" is correct.

Jim Dempsey


Jim,

Then, would the following best approach what was wanted ?

    a1 = 0

    !$omp parallel do reduction(+:a1)
    do i = 1,100
       a1 = a1 + ( a(i) * b(i) )
    end do
    !$omp end parallel do

    arr(1) = arr(1) + a1

Also, is the following a misprint, or does this structure achieve something I am not aware of ?

   a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))

John

>>>a1 = a1 + (a(omp_get_thread_num()) * b(omp_get_thread_num()))

The initial description of the problem had this expression: arr(1) = arr(1) + (a(j) * b(j)), with a comment:

"Where j corresponds to the each openmp thread being executed and arr(1) being the shared memory location."

Patrick

 

The initial a1=0 is unnecessary.

!$omp parallel do
do j = 1,100
c(j) = a(j) * b(j)
...

In the above loop, assuming two threads in the team and static scheduling, one thread will iterate j over 1:50 and the other over 51:100.

You normally would not compute the indexes using a call to omp_get_thread_num(). That said, sometimes you may want to manipulate data based on the thread number. In those situations, you often call omp_get_thread_num() once, storing the result in a private variable such as iThread. Note: "thread_num" is a misnomer. The function returns the 0-based thread number within the thread team constituted for the parallel region. If nested parallel regions are in effect, the regions that run concurrently will each have different thread teams, each with 0-based thread numbering.
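A sketch of that pattern (iThread is a name chosen here for illustration):

```fortran
!$omp parallel private(iThread)
iThread = omp_get_thread_num()   ! 0-based number within this team; call once
! ... any per-thread setup keyed on iThread goes here ...
!$omp do
do j = 1, 100
   c(j) = a(j) * b(j)            ! the data is indexed by j, not by iThread
end do
!$omp end do
!$omp end parallel
```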

Jim Dempsey
 

