OpenMP bug in Intel 11.1.064?
Hi,
I ran into a problem (bug?) with the OpenMP code attached below. I tested the code on two architectures, a dual-socket quad-core Intel Xeon E5450 and a dual-socket quad-core AMD Opteron, and saw the same problem on both.
I tried to keep the test code as simple as possible. I would like to write code that guarantees it is safe to enter the last loop (see code) without issuing omp barriers. (...waiting for thread subteams in future versions of OpenMP...)
I use the Intel compiler with OMP_NUM_THREADS=8.
[zorro@pordoi212 ~]$ ifort -V
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100414 Package ID: l_cprof_p_11.1.072
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
When compiled with no flag other than -openmp, it doesn't give the right answer (*):
[zorro@pordoi212 ~]$ ifort -openmp testflush.F90 -o tfint
[zorro@pordoi212 ~]$ ./tfint
Test (must be 0): 1
If I add the -C flag (or -g), it works fine:
[zorro@pordoi212 ~]$ ifort -openmp -g testflush.F90 -o tfint
[zorro@pordoi212 ~]$ ./tfint
Test (must be 0): 0
[zorro@pordoi212 ~]$ ifort -openmp -C testflush.F90 -o tfint
[zorro@pordoi212 ~]$ ./tfint
Test (must be 0): 0
When compiled with PGI 10.3, the result is correct (same with gcc):
[zorro@pordoi212 ~]$ pgf90 -mp testflush.F90 -o tfpgi
[zorro@pordoi212 ~]$ ./tfpgi
Test (must be 0): 0
If you compile it in your own environment, you can see that in the wrong case (*) the times listed in the fort.102 file show that thread 2 doesn't wait for thread 1 to complete during the checking loop (the same holds for threads 3 to 7 in fort.103-107):
times (write,loop) : 2.348423004150391E-004 9.536743164062500E-007
times (total) : 2.357959747314453E-004
In the other cases, the reported times confirm the right behaviour (threads 2 to 7 wait for thread 1 to complete), for example:
times (write,loop) : 2.3508071899414062E-004 1.000094890594482
times (total) : 1.000329971313477
Some comments in the code give further details about the problem.
Regards,
Stefano
----------------Here is the code------------------------------------
program main
  implicit none
  include 'omp_lib.h'
  logical checkflag
  integer i,iam,arraysize,auxsl,nth,error,chunk,k
  double precision t(3)
  !logical, volatile, allocatable :: checkvarv(:) !same behaviour with volatile attribute
  logical, allocatable :: checkvarv(:)
  integer, allocatable :: array(:)

  arraysize=1000
  allocate(checkvarv(arraysize))
  checkvarv(:)=.false.
  allocate(array(arraysize))
  array=0
  error=0

!$omp parallel default(none) shared(error,arraysize,nth,checkvarv,array) private(iam,checkflag,auxsl,t,i,k)
  iam=OMP_GET_THREAD_NUM()
  auxsl=0
  checkflag=.false.
!$omp single
  nth=OMP_GET_NUM_THREADS()
!$omp end single
  t(1)=OMP_GET_WTIME()

  ! Thread 0 performs some MPI operation....
  if(iam.eq.0) call sleep(2)

  ! Misalign thread 1 with respect to other threads when writing to array
!$omp do schedule(dynamic,1)
  do i=1,arraysize
    if(iam.eq.1.and.auxsl.eq.0) then
      auxsl=1
      call sleep(1)
    endif
    array(i)=iam+1
    checkvarv(i)=.true.
  enddo
!$omp end do nowait
  t(2)=OMP_GET_WTIME()

  ! Loops until previous write loop is finished
  do while(.not.checkflag)
    !!$omp flush(checkvarv) ! if you uncomment this line, you obtain the right answer with -openmp flag only.
    checkflag=.true.
    do k=1,arraysize
      checkflag = (checkflag.and.checkvarv(k))
    end do
  end do
  t(3)=OMP_GET_WTIME()

  ! Can we enter this loop safely?
!$omp do schedule(dynamic,1)
  do i=1,arraysize
    if(array(i).eq.0) error=1
  enddo
!$omp end do nowait
  write(100+iam,*) 'times (write,loop) : ',t(2)-t(1),t(3)-t(2)
  write(100+iam,*) 'times (total) : ',t(3)-t(1)
!$omp end parallel

  write(*,*) "Test (must be 0): ",error
  write(200,*) array-1
  deallocate(array)
  deallocate(checkvarv)
end
--------------------------------------------------------------------------------
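For comparison, here is a minimal sketch of the same handshake with the flushes in place, following the hint in the commented-out flush line of the test code. It is not a confirmed fix for the compiler behaviour reported above, only an illustration under stated assumptions: the program name flushwait, the variables done and ready, the list-less flushes on the writer side, and the max reduction on error are all illustrative choices that do not appear in the original test. The sleep calls and timing output are left out to keep it short.

```fortran
program flushwait
  use omp_lib
  implicit none
  integer, parameter :: n = 1000
  logical :: done(n)        ! one completion flag per array element
  logical :: ready          ! private: true once every flag is seen set
  integer :: array(n)
  integer :: i, error

  done  = .false.
  array = 0
  error = 0

!$omp parallel default(none) shared(done,array) private(i,ready) reduction(max:error)

  ! Writer side: store the data, flush, then raise the flag and flush again,
  ! so that a thread which sees done(i) can also see array(i).
!$omp do schedule(dynamic,1)
  do i = 1, n
    array(i) = omp_get_thread_num() + 1
!$omp flush
    done(i) = .true.
!$omp flush
  end do
!$omp end do nowait

  ! Reader side: flush before every scan so the flags are re-read from
  ! memory instead of being kept in registers across iterations.
  ready = .false.
  do while (.not. ready)
!$omp flush
    ready = all(done)
  end do
  ! Flush once more before reading the data guarded by the flags.
!$omp flush

  ! Check loop: every element should have been written by now.
!$omp do schedule(dynamic,1)
  do i = 1, n
    if (array(i) == 0) error = max(error, 1)
  end do
!$omp end do nowait

!$omp end parallel

  write(*,*) "Test (must be 0): ", error
end program flushwait
```

The reader-side flush corresponds to the flush(checkvarv) line in the test code; the writer-side flushes are the assumed counterpart, since a flush on only one side does not by itself order the data store against the flag store. With compilers that support OpenMP 3.1 or later, atomic read and atomic write on the flags would be the stricter way to express the same thing.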


