-openmp -O1 bug with 13.1.3.192

The following code:

module parallel_mod
type interpol_spline
  real(kind=8), allocatable   ::  x(:), y(:), dy(:), ddy(:) 
  real(kind=8), allocatable   ::  w(:,:)  
end type interpol_spline
type filament
  type(interpol_spline)       ::  x_spline                                
  type(interpol_spline)       ::  y_spline                                
  type(interpol_spline)       ::  z_spline                                
  type(interpol_spline)       ::  s_spline                                
end type
contains
subroutine parallel_test_two()
  implicit none
  type(filament)  :: fil_tmp
  !$omp parallel private(fil_tmp)
  !$omp end parallel
end subroutine
end module parallel_mod
program omp_test
  use parallel_mod
  implicit none
  print *, 'about to test omp'
  call parallel_test_two()
  print *, 'all done'
end program omp_test

When compiled like this:

ifort -O1  -openmp  omp_test.f90

Outputs this at runtime:

 about to test omp
Segmentation fault

My ifort version information:

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.3.192 Build 20130607

I assume this is an optimizer/OpenMP bug. But I can't find a workaround - any help/suggestions would be great.


I am unable to reproduce this result using the same compiler version and the same compile flags on Linux.

I can reproduce it but am not familiar enough with OpenMP to know whether or not this should work. My guess is that it should. I think the issue has to do with the unallocated allocatable components and the private clause.

Steve - Intel Developer Support

Hi Steve,

If you manage to come up with a workaround that would be great.

With private, I always do all the allocation once inside the parallel section; the behaviour is not well defined otherwise. I'm pretty sure my reproducer should work. In fact, just slightly changing the makeup of the derived types makes it work, so it really smells like a bug.

Quote:

Steve Lionel (Intel) wrote:

I can reproduce it but am not familiar enough with OpenMP to know whether or not this should work. My guess is that it should. I think the issue has to do with the unallocated allocatable components and the private clause.

In case it's of interest to anyone, it may be useful to know more about the OP's environment.  When I tried it according to the OP's instructions on 64-bit Linux, it exited without a segfault, but adding -g -traceback to the build options could produce one.

We have been seeing a great deal of difficulty with allocatable and automatic arrays under -openmp (-O1 and up) with the 12.0 through 14.0 compilers, even without involving derived types.  Some of the troublesome cases actually worked with (only) this version of the compiler.

Tim,

Let me know what you'd like to know about the environment. Here are a few things to get started:

$ cat /etc/redhat-release 
Fedora release 18 (Spherical Cow)
$ head /proc/cpuinfo 
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
stepping : 7
microcode : 0x70d
cpu MHz : 3100.000
cache size : 20480 KB
physical id : 0

On this system, there is still a segfault with flags -g -traceback -O1 -openmp

Do let me know what other information I may be able to provide to help with this. It would be great to have rock-solid OpenMP support.

As a workaround, try:

subroutine parallel_test_two()
  implicit none
  type(filament), pointer  :: fil_tmp => NULL()
  !$omp parallel private(fil_tmp)
  allocate(fil_tmp)
  ! allocate(fil_tmp%x_spline%x(nnn))
  ! ...
  ! deallocate(fil_tmp%x_spline%x)
  deallocate(fil_tmp)
  nullify(fil_tmp)
  !$omp end parallel
end subroutine

Jim Dempsey

www.quickthreadprogramming.com

Jim,

Thanks for this very nice tip. It compiles fine and appears to behave correctly at runtime too.

FWIW, the behaviour of the original example isn't defined by the (current) OpenMP 4.0 spec because of the use of allocatable components.  As a result, the OP's program has one foot in choose-your-own-adventure land.  See the list on page 22 of the OpenMP spec.

A. Rohou,

One more helpful hint: use firstprivate in place of private. This will copy the NULL value of the pointer into the parallel region. While the code proffered above works, other code may not if it depends on ASSOCIATED to test the validity of the pointer.
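For anyone following along, a minimal sketch of that firstprivate variant might look like the following. This is only a sketch, assuming the same filament type from the original post lives in the enclosing module; the per-thread work is a placeholder:

```fortran
subroutine parallel_test_two()
  implicit none
  type(filament), pointer :: fil_tmp
  nullify(fil_tmp)   ! nullify at run time; "=> NULL()" default init implies SAVE
  !$omp parallel firstprivate(fil_tmp)
  ! each thread starts with a copy of the null pointer,
  ! so ASSOCIATED() behaves as expected inside the region
  if (.not. associated(fil_tmp)) allocate(fil_tmp)
  ! ... per-thread work on fil_tmp ...
  deallocate(fil_tmp)
  !$omp end parallel
end subroutine
```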

There is another old thread relating to this subject.

Jim Dempsey

www.quickthreadprogramming.com

Thanks all. Since IanH points out that what I was expecting is not actually defined by OpenMP 4.0, I think I will do something like this instead:

subroutine parallel_test_two()
  use omp_lib
  implicit none
  type(filament), allocatable  :: fil_tmp(:)
  !$omp parallel shared(fil_tmp)
  !$omp single
  allocate(fil_tmp(omp_get_num_threads()))
  !$omp end single
  !(...)
  !$omp barrier
  !$omp single
  deallocate(fil_tmp)
  !$omp end single
  !$omp end parallel
end subroutine

I hope that this will be more likely to work, since I'm no longer expecting the compiler to handle any implicit memory allocation of derived types with allocatable components.

[Edit: added barrier before single]

You might want to see if using:

... fil_tmp(myThreadNumber)%... ... 

adds excessive overhead.

Jim Dempsey

www.quickthreadprogramming.com

Quote:

jimdempseyatthecove wrote:

You might want to see if using:

... fil_tmp(myThreadNumber)%... ... 

adds excessive overhead.

Jim Dempsey

Jim,

I'm not sure I know exactly what you mean. In terms of memory, I would have expected this latest workaround to be ~ equivalent to using PRIVATE since there's only one additional array descriptor (fil_tmp(:)), but I don't really understand these things well enough to be sure that's true. On the other hand, perhaps you mean some kind of computing overhead?

If you allocate a shared array of filament, your references are going to be:

iThread = omp_get_thread_num() + 1   ! +1: the shared array is 1-based
...
fil_tmp(iThread)%x_spline%x(i) = fil_tmp(iThread)%x_spline%x(i) + dX

When using the pointer, (or DUMMY with reference):

myFilament%x_spline%x(i) = myFilament%x_spline%x(i) + dX

You remove one array-index operation. The compiler may remove it automatically, assuming availability of registers (low complexity of code).

If you are "pointer averse", then consider encapsulating the body of the code and calling with a reference to the array element:

call doWork(fil_tmp(iThread), other, args, here)
...
subroutine doWork(myFilament, other, args, here)
type(filament) :: myFilament
...

If you want, you can use a contains subroutine.
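Putting those pieces together, a rough sketch of the contains variant, built on the shared-array workaround above, might be (doWork and its body are placeholders):

```fortran
subroutine parallel_test_two()
  use omp_lib
  implicit none
  type(filament), allocatable :: fil_tmp(:)
  integer :: iThread
  !$omp parallel private(iThread) shared(fil_tmp)
  !$omp single
  allocate(fil_tmp(omp_get_num_threads()))
  !$omp end single
  iThread = omp_get_thread_num() + 1   ! Fortran arrays are 1-based
  call doWork(fil_tmp(iThread))        ! worker sees its element as a plain dummy
  !$omp barrier
  !$omp single
  deallocate(fil_tmp)
  !$omp end single
  !$omp end parallel
contains
  subroutine doWork(myFilament)
    type(filament) :: myFilament
    ! body operates on myFilament directly, with no extra indexing
  end subroutine
end subroutine
```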

I am not "pointer averse". The effort to hide the pointer is more work.

Jim Dempsey

www.quickthreadprogramming.com

Thanks Jim, I know what you mean now. It turns out my code is already "encapsulated" the way you described, with

call doWork(fil_tmp(iThread), other, args, here)

So in my case, there's little or no coding overhead.
