OpenMP bug in Intel 11.1.064?

Hi,

I experienced a problem (bug?) with the OpenMP code attached below. I tested the code on two different architectures, a dual-socket quad-core Intel Xeon E5450 and a dual-socket quad-core AMD Opteron, and saw the same problem on both.

I tried to keep the test code as simple as possible. I would like to write code that guarantees I can safely enter the last loop (see code) without issuing omp barriers. (...waiting for thread subteams in next versions of OpenMP...)

I use the Intel compiler and OMP_NUM_THREADS=8:

[zorro@pordoi212 ~]$ ifort -V
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100414 Package ID: l_cprof_p_11.1.072
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

When compiled with no flag other than -openmp, it doesn't give the right answer (*):

[zorro@pordoi212 ~]$ ifort -openmp testflush.F90 -o tfint
[zorro@pordoi212 ~]$ ./tfint
Test (must be 0): 1

If I add the -C flag (or -g), it works fine:

[zorro@pordoi212 ~]$ ifort -openmp -g testflush.F90 -o tfint
[zorro@pordoi212 ~]$ ./tfint
Test (must be 0): 0
[zorro@pordoi212 ~]$ ifort -openmp -C testflush.F90 -o tfint
[zorro@pordoi212 ~]$ ./tfint
Test (must be 0): 0

When compiled with pgi 10.3, the code gives the right answer (same with gcc):

[zorro@pordoi212 ~]$ pgf90 -mp testflush.F90 -o tfpgi
[zorro@pordoi212 ~]$ ./tfpgi
Test (must be 0): 0

If you compile in your environment, you can see that in the wrong case (*) the times listed in the fort.102 file show that thread 2 (and also threads 3 to 7, in fort.103-107) doesn't wait for thread 1 to complete (during the checking loop):

times (write,loop) : 2.348423004150391E-004 9.536743164062500E-007
times (total) : 2.357959747314453E-004

In the other cases, the reported times confirm the right behaviour (threads 2 to 7 wait for thread 1 to complete). An example:

times (write,loop) : 2.3508071899414062E-004 1.000094890594482
times (total) : 1.000329971313477

Some comments in the code add other details about the problem.

Regards,
Stefano

----------------Here is the code------------------------------------
program main
  implicit none
  include 'omp_lib.h'
  logical checkflag
  integer i,iam,arraysize,auxsl,nth,error,chunk,k
  double precision t(3)
  !logical, volatile, allocatable :: checkvarv(:) !same behaviour with volatile attribute
  logical, allocatable :: checkvarv(:)
  integer, allocatable :: array(:)

  arraysize=1000
  allocate(checkvarv(arraysize))
  checkvarv(:)=.false.
  allocate(array(arraysize))
  array=0
  error=0

!$omp parallel default(none) shared(error,arraysize,nth,checkvarv,array) private(iam,checkflag,auxsl,t,i,k)
  iam=OMP_GET_THREAD_NUM()
  auxsl=0
  checkflag=.false.
!$omp single
  nth=OMP_GET_NUM_THREADS()
!$omp end single
  t(1)=OMP_GET_WTIME()
! Thread 0 performs some MPI operation....
  if(iam.eq.0) call sleep(2)
! Misalign thread 1 with respect to other threads when writing to array
!$omp do schedule(dynamic,1)
  do i=1,arraysize
    if(iam.eq.1.and.auxsl.eq.0) then
      auxsl=1
      call sleep(1)
    endif
    array(i)=iam+1
    checkvarv(i)=.true.
  enddo
!$omp end do nowait
  t(2)=OMP_GET_WTIME()
! Loops until previous write loop is finished
  do while(.not.checkflag)
!!$omp flush(checkvarv) ! if you uncomment this line, you obtain the right answer with -openmp flag only.
    checkflag=.true.
    do k=1,arraysize
      checkflag = (checkflag.and.checkvarv(k))
    end do
  end do
  t(3)=OMP_GET_WTIME()
! Can we enter this loop safely?
!$omp do schedule(dynamic,1)
  do i=1,arraysize
    if(array(i).eq.0) error=1
  enddo
!$omp end do nowait
  write(100+iam,*) 'times (write,loop) : ',t(2)-t(1),t(3)-t(2)
  write(100+iam,*) 'times (total) : ',t(3)-t(1)
!$omp end parallel

  write(*,*) "Test (must be 0): ",error
  write(200,*) array-1
  deallocate(array)
  deallocate(checkvarv)
end
--------------------------------------------------------------------------------


Try inserting:

!$omp flush(array)

after

! Can we enter this loop safely

However, instead consider placing

!$omp flush(array)
!$omp flush(checkvarv)

immediately after the

!$omp end do nowait

that fills in the array.

N.B. I specified two separate flush(..) directives in the hope that the flushes occur in the order written. Placing both variables (arrays) in the same directive leaves the order ambiguous. IA32/Intel64 should preserve write ordering; however, compiler optimizations might resequence instructions and/or use streaming writes and/or write merging. To check for these conditions, examine the results of your

write(200,*) array-1

If you see any -1's then a write was either not issued or was garbled by an SSE merge (read, modify (with mask), write).
If you see only values in 0:nth-1 then the issue is with the flush.
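
The two rules above can be sketched as a small check. This is only an illustration: the sample values are a hypothetical stand-in for the contents of the unit-200 dump, and nth = 8 is assumed from the OMP_NUM_THREADS=8 setting in the original post.

```fortran
program check_dump
  implicit none
  ! Assumed thread count (OMP_NUM_THREADS=8 in the original post)
  integer, parameter :: nth = 8
  integer :: sample(6)

  ! Hypothetical stand-in for the values produced by write(200,*) array-1
  sample = (/ 0, 3, 7, -1, 2, 5 /)

  if (any(sample == -1)) then
    ! array(i) stayed 0, so the store was not issued or was lost in a merge
    write(*,*) 'elements never written: ', count(sample == -1)
  else if (all(sample >= 0 .and. sample <= nth-1)) then
    ! every element carries a valid thread number, so the stores happened
    write(*,*) 'all elements written; suspect the flush'
  else
    write(*,*) 'unexpected values present'
  end if
end program check_dump
```

With the sample shown, the first branch is taken because one element is -1. To apply it to a real run, the sample array would be replaced by the 1000 values read back from the dump file.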

Jim Dempsey

www.quickthreadprogramming.com

Your suggestions don't work. I also put both of them together in the code, but it still gives me the wrong output. I think the issue is with the flush.

On IA32, Intel64 and AMD, FLUSH(var) acts as a compiler directive:

if var is registered then
    if register of var modified then
        write to memory
    end if
    disassociate var from register
end if

The write ordering should be preserved.
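
For comparison, here is a minimal sketch of a flag handshake built on flush, reduced to one writer and one reader. It is not the original test program: the names ready, seen and nbad are hypothetical, the flush directives are used without a variable list so that ordering between variables is not in question, and the atomic read/write forms assume a compiler that supports OpenMP 3.1 or later (which may not hold for the compiler version discussed in this thread).

```fortran
program flag_handshake
  use omp_lib
  implicit none
  ! Hypothetical size, matching arraysize in the original post
  integer, parameter :: n = 1000
  integer :: array(n)
  integer :: ready, seen, i, nbad

  array = 0
  ready = 0
  nbad  = 0

!$omp parallel num_threads(2) default(none) shared(array, ready, nbad) private(seen, i)
  if (omp_get_thread_num() == 0) then
    ! Writer: fill the data first
    do i = 1, n
      array(i) = 1
    end do
    ! Make the array stores visible before the flag is set
!$omp flush
!$omp atomic write
    ready = 1
    ! Publish the flag itself
!$omp flush
  else
    ! Reader: poll the flag, re-reading it from memory on every pass
    seen = 0
    do while (seen == 0)
!$omp flush
!$omp atomic read
      seen = ready
    end do
    ! Discard any stale view of the array before checking it
!$omp flush
    do i = 1, n
      if (array(i) == 0) nbad = nbad + 1
    end do
  end if
!$omp end parallel

  write(*,*) 'unwritten elements seen by reader: ', nbad
end program flag_handshake
```

If the runtime provides only one thread, the reader branch never runs and the count simply stays at zero, so the sketch cannot deadlock in that case.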

Since every checkvarv element was set and verified as .true. (indicated by all threads passing the test), all threads must have passed through their slice of

array(i) = iam+1
checkvarv(i) = .true.

The wrong result is therefore indicative of one or more of

1) cache coherency system not correctly performing write combining
(I seriously doubt this would be true)

2) the allocatable array isn't aligned to at least integer granularity
(I seriously doubt this would be true)

3) compiler optimizations are batching up writes into a larger GP register or SSE register, then performing a merge (read/modify/write) at the start and/or the end of the slice

4) bad code

Can you produce an assembler listing then attach to reply to this forum?

Jim Dempsey

www.quickthreadprogramming.com

Stefano,

Thanks for the assembly listings. I haven't investigated thoroughly (I'm not from Intel) but I see some flow control problems with your testflush1.s listing. Assuming this was generated from your first message then we see:

..B1.56:                        # Preds ..B1.44 ..B1.50
        xorl      %eax, %eax                                    #52.12
        call      omp_get_wtime_                                #52.12
                                # LOE r13 r12d r14d xmm0
..B1.131:                       # Preds ..B1.56
        movsd     %xmm0, -128(%rbp)                             #52.12
                                # LOE r13 r12d r14d
..B1.57:                        # Preds ..B1.131
        xorl      %eax, %eax                                    #61.12
        call      omp_get_wtime_                                #61.12
                                # LOE r13 r12d r14d xmm0
..B1.132:                       # Preds ..B1.57
        movsd     %xmm0, -104(%rbp)                             #61.12

Two of the omp_get_wtime_ function calls are made back-to-back with no intervening code.
This means the code section did not follow the sequence of operations (or an equivalent of the sequence of operations) in the original source code.

The write loop for filling in the array and checkvarv values was sequenced properly.

If the code flow for the code around (between) your loops is incorrect then it is possible that you could incorrectly sequence the test for array fill done.

Again, I wish to state I did not examine the code flow further than the first error.

Additional note:

The two omp_get_wtime_ function calls above store into

-128(%rbp) and -104(%rbp)

the third call stores into -112(%rbp)

These three stores should correlate to

double precision t(3)

The above array should encompass 3 * real(8)

If -128(%rbp) is t(1), then t(2) should be at -120(%rbp) and t(3) at -112(%rbp); -104(%rbp) would be t(4).
In other words, the array addresses for t are messed up. The compiler optimizations could determine that array t is only used within this program and that each cell referenced is unique (i.e. t(1:3) is not passed outside the program), and then take the liberty of moving the data around. However, the debugger would then have a problem displaying the discontiguous array t.
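
The expected layout can be worked out mechanically. The sketch below assumes, as in the note above, that t(1) sits at -128(%rbp) and that the elements of t are stored contiguously; the variable names base and elem_bytes are only for illustration.

```fortran
program t_offsets
  implicit none
  double precision :: t(3)
  integer :: i, base, elem_bytes

  ! Assumed frame offset of t(1), taken from the assembly listing
  base = -128
  ! storage_size returns bits per element; convert to bytes
  elem_bytes = storage_size(t) / 8

  do i = 1, size(t)
    write(*,'(a,i0,a,i0,a)') 't(', i, ') expected at ', &
         base + (i-1)*elem_bytes, '(%rbp)'
  end do
end program t_offsets
```

Where double precision occupies 8 bytes, this lists -128, -120 and -112, so a store to -104(%rbp) falls one element past the end of a contiguous t(3).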

Jim Dempsey

www.quickthreadprogramming.com
