OpenMP bug?

Hi!
The following program segfaults after being compiled with ifort -openmp (v. 11.1) on both Mac and Linux:

program test
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot
allocate(ivar(5000))
itot=0
!$omp parallel do default(none) private(ivar) shared(itot)
do i=1,5000
ivar(i)=i
!$omp atomic
itot=itot+ivar(i)
enddo
!$omp end parallel do
deallocate(ivar)
write(6,*) itot
end

The same code works when compiled with gfortran and PGI.
Any ideas?
Best regards,
Pier

This normally would be done by reduction. I suppose you have a reason for trying atomic instead. Did you check whether the compiler is attempting vectorization, and consider whether that might be more efficient, supposing you have selected vectorization only for ifort and not for the other compilers?

The example is chosen just to highlight the problem and I am in no way saying that there are no better ways to perform the implemented computation.
In fact the example is an extreme simplification of a much more complex routine (in particular a hierarchical quadrature routine which is part of a partial electrical equivalent circuit code).
I think, although I'm not a big OpenMP expert, that ifort is incorrectly handling something which is allowed by the OpenMP 3.0 specification.
The code is compiled with an absolutely minimal command line (ifort -openmp test.f90), and the same minimal options are used for gfortran (gfortran -fopenmp test.f90) and for PGI. The same code also runs without problems with the AIX/xlf combination.
I would be extremely grateful if somebody could tell me whether the code shown is "wrong" (in the sense that it is not standard-compliant) and the other compilers work just by luck, or whether there is indeed a problem with ifort 11.1.
Thanks a lot!

No, you didn't use "same minimal options" for gfortran. gfortran defaults to -O0, and doesn't invoke vectorization as a consequence of O level until -O3 (and that only in relatively recent versions), while ifort defaults to -O2 which includes auto-vectorization.
atomic is defined as updating a single memory location in a thread safe manner, while, under vectorization, you may be updating 4 or 8 memory locations.
I don't know if that is the answer, but it seems you are assuming that atomic will work the same as critical or reduction.

Sorry, my fault. However, using "ifort -O0 -openmp test.f90" does not change things, still segfaults.
(conversely "gfortran -O3 -ftree-vectorize -fopenmp test.f90" works)
Furthermore, this even more simplified program also crashes:

program test
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot
allocate(ivar(5000))
itot=0
!$omp parallel do default(none) private(ivar) shared(itot)
do i=1,5000
ivar(i)=i
enddo
!$omp end parallel do
deallocate(ivar)
write(6,*) itot
end

So the "atomic" part is not the cause. What is causing problems is the allocatable ivar: if it is replaced with a statically allocated array, everything works.

If allocate is a problem, one would suspect a stack size limit, although that's hard to understand for such a small case.

I don't think it's a stack limit problem for the following reasons:
1) the problem occurs even if 5000 is changed to 5 and OMP_NUM_THREADS=1
2) the sequential version (without -openmp) runs OK (I guess these two cases would allocate the same amount of memory through the same mechanism)

Can some OpenMP expert please tell me if the following code is "supposed to work"?

program test
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot
allocate(ivar(5))
itot=0
!$omp parallel do default(none) private(ivar) shared(itot)
do i=1,5
ivar(i)=i
!$omp atomic
itot=itot+ivar(i)
enddo
!$omp end parallel do
deallocate(ivar)
write(6,*) itot
end

Appears to work the same as reduction when the size of the problem is increased. The end parallel do is redundant. It's somewhat curious that, at your reduced size, it "doesn't work" except with 1 thread.

How can I submit a bug report? (At this point I think it is a bug, and the piece of code is so small that looking at it should be feasible.)

jimdempseyatthecove

program test
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot
allocate(ivar(5000))
itot=0
!$omp parallel do default(none) shared(ivar,itot) reduction(+:itot)
do i=1,5000
ivar(i)=i
! remove !$omp atomic
itot=itot+ivar(i)
enddo
!$omp end parallel do
deallocate(ivar)
write(6,*) itot
end

Jim Dempsey

www.quickthreadprogramming.com

ifort -openmp test2.f90
test2.f90(7): error #6761: An entity cannot appear explicitly in more than one clause per directive except that an entity can be specified in both a FIRSTPRIVATE and LASTPRIVATE clause. [ITOT]
!$omp parallel do default(none) shared(ivar,itot) reduction(+:itot)

Furthermore, reduction is not an option, since in the large code (of which this is just a toy mockup) the pieces coming from the different threads cannot be combined by a simple sum.

Most importantly: other compilers (gfortran, PGI, mpxlf) work on the same code, so either there is a bug in ifort or the code is not standard-compliant and the other compilers work just by chance.

jimdempseyatthecove

The sample I produced from your code did two things differently.

1) The array in the loop was referenced slice-wise by the parallel do loop, so the array was made shared. This saved stack space and, more importantly, the implicit array copy.

2) A reduction variable was used to reduce the number of atomic operations from the number of iterations to the number of threads. Lacking other information about your code, a reduction variable was appropriate.

Including the reduction variable in the PRIVATE/SHARED clause was an error on my part.
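For reference, a corrected version of the example above might look like the following sketch. It is wrapped in a module function so the result can be checked (the names `reduction_sketch` and `fill_and_sum` are hypothetical). ITOT appears only in the REDUCTION clause and IVAR only in SHARED, which avoids error #6761. Without OpenMP enabled the directives are plain comments, so the same code also runs serially.

```fortran
module reduction_sketch
  implicit none
contains
  ! Corrected clause list: ITOT only in REDUCTION, IVAR SHARED because
  ! each iteration writes only its own element, and no ATOMIC in the loop.
  integer function fill_and_sum(n) result(total)
    integer, intent(in) :: n
    integer, allocatable :: ivar(:)
    integer :: i, itot

    allocate(ivar(n))
    itot = 0
    !$omp parallel do default(none) shared(ivar, n) reduction(+:itot)
    do i = 1, n
       ivar(i) = i
       itot = itot + ivar(i)   ! adds into this thread's private copy of ITOT
    end do
    !$omp end parallel do
    deallocate(ivar)
    total = itot
  end function fill_and_sum
end module reduction_sketch
```

With n = 5 this returns 15, the same total printed by the test programs later in this thread.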

RE: 2)

Unless the code in the loop is very large compared to the overhead of an ATOMIC, you should try to avoid ATOMIC by using a reduction or code equivalent to the reduction clause (thread-local storage, or a mailbox collated at the end of or after the loop).
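A minimal sketch of the "thread-local storage collated after the loop" idea, assuming each thread's partial result can be merged under CRITICAL. This is a hand-rolled equivalent, not the REDUCTION clause itself; the names `partial_sum_sketch` and `collated_sum` are hypothetical, and the integer sum only stands in for whatever combination the real code needs.

```fortran
module partial_sum_sketch
  implicit none
contains
  ! Each thread accumulates into a private partial result, then merges it
  ! into the shared total once: one CRITICAL entry per thread instead of
  ! one ATOMIC per iteration. A non-sum combination would go in the merge.
  integer function collated_sum(n) result(total)
    integer, intent(in) :: n
    integer :: i, part, acc

    acc = 0
    !$omp parallel default(none) shared(n, acc) private(i, part)
    part = 0
    !$omp do
    do i = 1, n
       part = part + i
    end do
    !$omp end do nowait
    !$omp critical
    acc = acc + part            ! merge this thread's partial result
    !$omp end critical
    !$omp end parallel
    total = acc
  end function collated_sum
end module partial_sum_sketch
```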

Jim

Dear Jim,
Let me first thank you sincerely for helping me with this issue. I think that, maybe due to problems in expressing myself precisely in English, my question might not have been clear.
Let me try to formulate my question again:

I would like to know if the code shown below is legal, not whether it is an efficient way of doing things or whether there are other ways of obtaining the same result. Only whether the code is legal.

If the code is legal, ifort should run it, and if it doesn't, ifort has a bug.
If, on the other hand, the code is not legal, then I wonder why other compilers handle it as I would expect, but this is another matter.
Thanks again,
Pier

program test
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot
allocate(ivar(5))
itot=0
!$omp parallel do default(none) private(ivar) shared(itot)
do i=1,5
ivar(i)=i
!$omp atomic
itot=itot+ivar(i)
enddo
!$omp end parallel do
deallocate(ivar)
write(6,*) itot
end

jimdempseyatthecove

What appears to be happening on my copy of IVF (11.0.066):

All threads are receiving a new, unallocated array descriptor for private(ivar).

When using firstprivate(ivar), this appears to be (for arrays) somewhat equivalent to shared.

IOW, different descriptors appear to be used, pointing to the same memory locations.

program test
use omp_lib
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot, k
k=123
allocate(ivar(5))
write(*,*) loc(ivar(1)), loc(ivar), loc(itot), loc(k)
itot=0
!$omp parallel do default(none) firstprivate(ivar, k) shared(itot)
do i=1,5
!$omp critical
if(allocated(ivar)) then
  write(*,*) "allocated"
    write(*,*) omp_get_thread_num(), loc(ivar(1)), loc(ivar), loc(itot), loc(k)
    deallocate(ivar)
else
  write(*,*) "not allocated"
    write(*,*) omp_get_thread_num(), loc(ivar), loc(itot), loc(k)
endif
!$omp end critical

if(allocated(ivar)) then
  ivar(i)=i
!$omp atomic
  itot=itot+ivar(i)
endif
enddo
!$omp end parallel do
if(allocated(ivar)) deallocate(ivar)
write(6,*) itot
end
 
    3635664     3635664     1244676     1244672
 allocated
           0     3635664     3635664     1244676     1242080
 not allocated
           0           0     1244676     1242080
 not allocated
           1     3635664     1244676    10418784
 not allocated
           0           0     1244676     1242080
 not allocated
           1     3635664     1244676    10418784

Look at the 2nd argument in the write statements (3635664/0).

Jim Dempsey

Thanks for the continued help!
I would like somebody from Intel to comment on this issue, which I continue to feel is a bug (comments on the openmp.org forum also indicate that the code is correct).
Running the following code shows, as you mentioned above, that ivar is never allocated!
gfortran and other compilers happily allocate it and execute the code correctly...

program test
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot
allocate(ivar(5))
itot=0
!$omp parallel do default(none) private(ivar) shared(itot)
do i=1,5
if(allocated(ivar))then
write(6,*) 'allocated'
ivar(i)=i
!$omp atomic
itot=itot+ivar(i)
else
write(6,*) 'not allocated'
endif
enddo
!$omp end parallel do
deallocate(ivar)
write(6,*) itot
end

Finally I found the solution!

This does not work with ifort but works with all other compilers I use:
integer,allocatable,dimension(:)::ivar
allocate(ivar(5))
!$omp parallel do default(none) private(ivar) shared(itot)
...
!$omp end parallel do
deallocate(ivar)

This works with all compilers I use:
integer,allocatable,dimension(:)::ivar
allocate(ivar(5))
!$omp parallel do default(none) firstprivate(ivar) shared(itot)
...
!$omp end parallel do
deallocate(ivar)

So: firstprivate instead of private for allocatable variables. Without firstprivate, ivar does not get allocated by ifort. If my understanding of the standard is correct, firstprivate should not be required if I am not interested in the contents of ivar when entering the parallel do...

I hope Intel will correct this!
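Another workaround that may be worth trying, sketched here as an assumption rather than something tested in this thread, is to avoid cloning an already-allocated array into the region at all: leave the allocatable unallocated before the region, list it in PRIVATE, and let each thread allocate and deallocate its own copy. This does not rely on PRIVATE copying an allocated array, which is the behavior that appears broken in ifort 11.1 here. The names `scratch_sketch` and `scratch_sum` are hypothetical.

```fortran
module scratch_sketch
  implicit none
contains
  ! IVAR is never allocated before the region, so PRIVATE only has to
  ! create an unallocated copy; each thread then allocates its own array.
  integer function scratch_sum(n) result(total)
    integer, intent(in) :: n
    integer, allocatable :: ivar(:)
    integer :: i, itot

    itot = 0
    !$omp parallel default(none) private(ivar, i) shared(n, itot)
    allocate(ivar(n))           ! one private array per thread
    !$omp do
    do i = 1, n
       ivar(i) = i
       !$omp atomic
       itot = itot + ivar(i)
    end do
    !$omp end do
    deallocate(ivar)            ! leave the private copy unallocated on exit
    !$omp end parallel
    total = itot
  end function scratch_sum
end module scratch_sketch
```

The loop body keeps the original ATOMIC update so the structure matches the reproducer; only the allocation moved inside the region.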

jimdempseyatthecove

Pier,

Your description of firstprivate is not correct:

program test
use omp_lib
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot, k
k=123
allocate(ivar(5))
ivar = -1
write(*,*) loc(ivar(1)), loc(ivar), loc(itot), loc(k)
itot=0
!$omp parallel do default(none) firstprivate(ivar, k) shared(itot)
do i=1,5
!$omp critical
if(allocated(ivar)) then
  write(*,*) "allocated"
    write(*,*) omp_get_thread_num(), ivar(1)
    ivar(1) = omp_get_thread_num()
else
  write(*,*) "not allocated"
    write(*,*) omp_get_thread_num(), loc(ivar), loc(itot), loc(k)
endif
!$omp end critical

if(allocated(ivar)) then
! ***  ivar(i)=i
!$omp atomic
  itot=itot+ivar(i)
endif
enddo
!$omp end parallel do
if(allocated(ivar)) deallocate(ivar)
write(6,*) itot
end
     3635664     3635664     1244672     1244668
 allocated
           0          -1
 allocated
           0           0
 allocated
           1           0
 allocated
           0           1
 allocated
           1           0

You likely have separate array descriptors pointing to the same memory.
In the output above, look at the lines following "allocated":

First line (from thread 0) shows the copied-in value of -1.
Second line (from thread 0) shows the value rewritten with the thread number (0).
Third line (from thread 1) shows the value written by thread 0, not a separate copy of the value from outside the region (copy-in), -1.

Jim Dempsey

Jim, you are right, this is what is happening, but the specification says something different:

(a) PRIVATE Clause (Section 2.9.3.3, p. 90):
A new list item of the same type is allocated once for each implicit
task in the parallel region, or for each task generated by a task
construct, if the construct references the list item in any statement.
The initial value of the new list item is undefined. Within a parallel,
worksharing, or task region, the initial status of a private pointer is
undefined.

For a list item with the ALLOCATABLE attribute:

  • if the list item is "not currently allocated", the new list item will have an initial state of "not currently allocated";
  • if the list item is allocated, the new list item will have an initial state of allocated with the same array bounds.

(b) FIRSTPRIVATE Clause (Section 2.9.3.4, p. 92):
A list item that appears in a FIRSTPRIVATE clause is subject to the
private clause semantics described in Section 2.9.3.3 on page 89. In
addition, the new list item is initialized from the original list item
existing before the construct.
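The ALLOCATABLE rule in (a) can be turned into a small self-check, sketched below under the assumption of a compiler with OpenMP 3.0 support (the names `private_alloc_check` and `private_copy_conforms` are hypothetical). Each thread checks that its PRIVATE copy is allocated and has the original, deliberately non-default bounds, and the per-thread results are combined with a logical REDUCTION. A .true. result on one compiler is evidence, not proof, of conformance.

```fortran
module private_alloc_check
  implicit none
contains
  ! .true. if every thread's PRIVATE copy of an allocated allocatable
  ! array is itself allocated with the original bounds.
  logical function private_copy_conforms() result(ok)
    integer, allocatable :: ivar(:)
    logical :: all_ok

    allocate(ivar(-2:7))        ! non-default bounds make a wrong copy visible
    all_ok = .true.
    !$omp parallel default(none) private(ivar) reduction(.and.:all_ok)
    if (allocated(ivar)) then
       all_ok = all_ok .and. lbound(ivar, 1) == -2 .and. ubound(ivar, 1) == 7
    else
       all_ok = .false.         ! the behavior reported for ifort 11.1 above
    end if
    !$omp end parallel
    deallocate(ivar)
    ok = all_ok
  end function private_copy_conforms
end module private_alloc_check
```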

Look at the following:

program test
use omp_lib
implicit none
integer,allocatable,dimension(:)::ivar
integer i,itot
allocate(ivar(6))
ivar(:)=-1
itot=0
!$omp parallel do default(none) firstprivate(ivar) shared(itot)
do i=1,5
if(allocated(ivar))then
write(6,*) 'allocated ',omp_get_thread_num(),ivar(6)
ivar(6)=omp_get_thread_num()
ivar(i)=i
!$omp atomic
itot=itot+ivar(i)
else
write(6,*) 'not allocated'
endif
enddo
!$omp end parallel do
deallocate(ivar)
write(6,*) itot
end

output from ifort:
allocated 0 -1
allocated 0 0
allocated 0 0
allocated 1 0
allocated 1 1

output from other compilers:
allocated 0 -1
allocated 0 0
allocated 0 0
allocated 1 -1
allocated 1 1

If you change the allocation from dynamic to static, the output from ifort becomes correct.

I keep thinking that ifort is buggy and/or not specification-compliant with respect to allocatable arrays.
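The FIRSTPRIVATE part of the question can be probed in a similar way. In this hypothetical sketch (`firstprivate_alloc_check` and `firstprivate_copy_conforms` are made-up names) each thread first checks that its copy still holds the copied-in values and only then overwrites it. With genuinely separate copies every thread sees -1; if the copies share storage, as the ifort output above suggests, a later thread can see an earlier thread's writes. Because that depends on timing, a .true. result does not rule the defect out.

```fortran
module firstprivate_alloc_check
  implicit none
contains
  ! .true. if every thread sees the copied-in values (-1) in its
  ! FIRSTPRIVATE copy before writing to that copy.
  logical function firstprivate_copy_conforms() result(ok)
    integer, allocatable :: ivar(:)
    logical :: all_ok

    allocate(ivar(6))
    ivar = -1
    all_ok = .true.
    !$omp parallel default(none) firstprivate(ivar) reduction(.and.:all_ok)
    all_ok = all_ok .and. all(ivar == -1)   ! value on entry must be the copy-in
    ivar = 99                               ! should touch only this thread's copy
    !$omp end parallel
    deallocate(ivar)
    ok = all_ok
  end function firstprivate_copy_conforms
end module firstprivate_alloc_check
```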

Piergiorgio,

Perhaps you have an older version of ifort. I did not see any indication of your specific version in the thread. If you have a version older than what is noted below that appears fixed, could you upgrade your 11.1 compiler to at least the last 11.1 update?

It appears there is a defect in ifort, but also that it has been fixed in the last 11.1 update (11.1.073 - 11.1 Update 7) and in our current Fortran Compiler XE 2011 release.

The program cited in Piergiorgio's previous post produces the incorrect results noted with 11.1 (Intel 64) Linux compilers beginning with Version 11.1 Build 20100414 Package ID: l_cprof_p_11.1.072 and going back as far as Version 11.1 Build 20091130 Package ID: l_cprof_p_11.1.064.

The program produces correct results with the following compilers:

Intel Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.0.0.084 Build 20101006(l_fcompxe_2011.0.084)

Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100806 Package ID: l_cprof_p_11.1.073 (a.k.a. 11.1 Update 7)

Confirmed incorrect results:

$ ifort -V -openmp u78716.f90
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100414 Package ID: l_cprof_p_11.1.072
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

Intel Fortran 11.1-2739
GNU ld version 2.17.50.0.6-5.el5 20061020

$ export OMP_NUM_THREADS=2
$ OMP_NUM_THREADS=2
$ ./a.out
allocated 0 -1
allocated 0 0
allocated 0 0
allocated 1 0
allocated 1 1
15

Correct results:

$ ifort -V -openmp u78716.f90
Intel Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.0.0.084 Build 20101006
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

Intel Fortran 12.0-1176
GNU ld version 2.17.50.0.6-5.el5 20061020

$ export OMP_NUM_THREADS=2
$ OMP_NUM_THREADS=2
$ ./a.out
allocated 0 -1
allocated 0 0
allocated 0 0
allocated 1 -1
allocated 1 1
15

$ ifort -V -openmp u78716.f90
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100806 Package ID: l_cprof_p_11.1.073
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

Intel Fortran 11.1-2755
GNU ld version 2.17.50.0.6-5.el5 20061020

$ export OMP_NUM_THREADS=2
$ OMP_NUM_THREADS=2
$ ./a.out
allocated 0 -1
allocated 0 0
allocated 0 0
allocated 1 -1
allocated 1 1
15

Dear Kevin,
I am running under Mac OS X 10.6.5

ifort -V -openmp test.f90

Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: m_cprof_p_11.1.084
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
Intel Fortran 11.1-2692

export OMP_NUM_THREADS=2
./a.out
allocated 0 -1
allocated 0 0
allocated 0 0
allocated 1 0
allocated 1 1

So the Mac version appears to be older by build date (20100203) but newer by package number (084) than the Linux one...
I will try to upgrade and see if the problem persists
Thanks a lot for having investigated the matter!

Thank you for the clarification. I confirmed it produces incorrect results on Mac OS X 10.4.x with Xcode 3.2.2 and ifort 11.1.084, and that it also appears fixed in ifort 11.1.089.

I also confirmed this is not related to the ifort/icc and Xcode 3.2.2 linker compatibility issue, and that -use-asm has no effect. Reproducing this on Linux also confirms that. From what I see, there was perhaps an OpenMP defect in ifort. I do not have a specific internal report that I can relate the behavior to.

If you can upgrade to at least ifort 11.1.089, I believe that should resolve the issue.

Confirmed the test case behaves correctly with 11.1.089:

$ ifort -V -openmp u78716.f90

Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100806 Package ID: m_cprof_p_11.1.089
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

Intel Fortran 11.1-2755
@(#)PROGRAM:ld PROJECT:ld64-97.2

$ export OMP_NUM_THREADS=2
$ OMP_NUM_THREADS=2
$ ./a.out
allocated 0 -1
allocated 0 0
allocated 0 0
allocated 1 -1
allocated 1 1
15

jimdempseyatthecove

>>if the list item is allocated, the new list item will have an initial state of allocated

Correct, I simply provided a reproducer that showed more detail regarding the problem.

Regarding "unable to use reduction" (should be posted as a different thread)

It would be nice if the OpenMP standard had (maybe it does have) a means to determine the iteration control variables. The purpose would be to determine the first iteration, last iteration, and number of iterations (or stride).

!$omp parallel do
do i=iFrom, iTo, iStep

The current specification makes variable i local to the parallel region.
I am suggesting that it would be nice if the programmer could determine the slices equivalent to iFrom, iTo.

Since many loops use literal constants, a declaration of some sort needs to be provided.

**** pseudo code with new feature recommendation
!$omp parallel do doprivate(iFrom, iTo) ...
do i=1,100
! i is currently private and has iteration point for this thread
! iFrom is private and has iteration start for this thread
! iTo is private and has iteration end for this thread
...
if(i .eq. iTo) then
!$omp critical
...
!$omp end critical
endif
end do

Note that by providing a new type of private, you can also determine the iteration point at any nest level (assuming the variables of those nest levels are visible).
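Until something like `doprivate` exists (it is a proposal, not an OpenMP clause), per-thread first and last iterations can be obtained today by partitioning the loop by hand instead of using `!$omp do`. The sketch below is that swapped-in technique, not the proposed feature: it splits 1..n into contiguous blocks, similar in spirit to SCHEDULE(STATIC), using `omp_get_thread_num` and `omp_get_num_threads` from `omp_lib` behind `!$` conditional-compilation lines so it also compiles without OpenMP. The names `manual_slices` and `slice_sum` are hypothetical, and the running sum stands in for real work.

```fortran
module manual_slices
  implicit none
contains
  ! Each thread computes its own iFrom/iTo before the loop starts, so
  ! "i == iTo" identifies its last iteration, as in the pseudo code.
  integer function slice_sum(n) result(total)
    !$ use omp_lib
    integer, intent(in) :: n
    integer :: tid, nth, chunk, rem, iFrom, iTo, i, part, acc

    acc = 0
    !$omp parallel default(none) shared(n, acc) private(tid, nth, chunk, rem, iFrom, iTo, i, part)
    tid = 0                     ! serial fallback values
    nth = 1
    !$ tid = omp_get_thread_num()
    !$ nth = omp_get_num_threads()
    chunk = n / nth             ! contiguous blocks; the first REM threads get one extra
    rem = mod(n, nth)
    iFrom = 1 + tid*chunk + min(tid, rem)
    iTo = iFrom + chunk - 1
    if (tid < rem) iTo = iTo + 1
    part = 0
    do i = iFrom, iTo           ! empty slice when iFrom > iTo
       part = part + i
       if (i == iTo) then
          !$omp critical
          acc = acc + part      ! once per thread, on its last iteration
          !$omp end critical
       end if
    end do
    !$omp end parallel
    total = acc
  end function slice_sum
end module manual_slices
```

Threads that receive an empty slice (more threads than iterations) never reach the CRITICAL section, which is harmless here because their partial result is zero.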

Jim Dempsey

For the record, there was an earlier report of this issue opened with Development (internal tracking id noted below) that was fixed in the 11.1 update 7 (11.1.089 - Mac OS, 11.1.073 - Linux)

(Internal tracking id: DPD200150643)