Compiler bug (?), code works on ifort 11.1 and 12.0, broken on 12.1.3 and 13.1.0

Hi all, 

The following trivial program works fine with gfortran, ifort 11.1 20100806 and ifort 12.1.0 20110811, but crashes with a segmentation fault when compiled with ifort 12.1.3 20120212 or ifort 13.1.0 20130121:

program bad
  implicit none

  type x
    integer(kind=4), dimension(128) :: a
  end type x

  integer, parameter :: s = 126
  integer :: istat
  type(x),dimension(:,:,:),allocatable :: N

  allocate(N(0:s+1,0:s+1,0:s+1),stat=istat)
  if(istat /= 0) then
    print *, "allocation failed"
    stop
  end if

  N(0:s+1,0:s+1,  0) = N(0:s+1,0:s+1,s)
  N(0:s+1,0:s+1,s+1) = N(0:s+1,0:s+1,1)

  deallocate(N)
end program bad

If the internal array size in the type definition is changed to anything less than 128 elements, everything works fine. Looks like a magic number of 2^30 (128*128*128*128*4) is somehow involved in this. Please advise what to do. Yes, I can re-write the corresponding loops manually and then the code starts to work, but that completely misses the point...
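
The manual rewrite mentioned above could look roughly like the following. This is only a sketch, not the poster's actual code: the module name halo_copy and the subroutine copy_plane are hypothetical names introduced for illustration, and the type x is copied from the program above. Each plane is copied one element at a time, so no whole-slice array expression appears.

```fortran
module halo_copy
  implicit none

  ! Same derived type as in the program above.
  type x
     integer(kind=4), dimension(128) :: a
  end type x

contains

  ! Hypothetical helper: copy plane ksrc of n into plane kdst with
  ! explicit loops. The dummy array is declared with lower bounds of 0
  ! to match the 0:s+1 indexing used in the original program.
  subroutine copy_plane(n, kdst, ksrc)
    type(x), dimension(0:,0:,0:), intent(inout) :: n
    integer, intent(in) :: kdst, ksrc
    integer :: i, j

    do j = 0, ubound(n, 2)
       do i = 0, ubound(n, 1)
          n(i, j, kdst) = n(i, j, ksrc)
       end do
    end do
  end subroutine copy_plane

end module halo_copy
```

With such a module, the two array assignments in the program would become call copy_plane(N, 0, s) and call copy_plane(N, s+1, 1), with the type definition taken from the module through use halo_copy.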


Try compiling with -heap-arrays. One or both assignments are creating temporary copies. I see the same behavior in 12.1.

I have asked the developers to see if the compiler can be smarter about this and filed issue DPD200241574.

Steve - Intel Developer Support

You are right: apparently the system stack size limit, not the compiler version, was the cause of the pass/fail behaviour (the older compilers were installed on another system with no stack limit, the newer ones on a system with a default stack limit of 8 MB). Resetting the stack limit with ulimit -s unlimited also works around the problem. But why are temporary arrays allocated in this case anyway? The array slices involved do not overlap, so this should be compiled without recourse to temporary arrays, no? And what is the size of the temporary arrays involved? In our production cases the N array may take more than 2/3 of the available system RAM.

I agree that no temporary is required - but the compiler may need additional analysis to determine that. I have asked the developers to add this. The temporary would be the size of the array slice, which could be very large. There is also the added time in copying the data.

In the case you show, no temporary is required. In other cases, if the compiler cannot determine that there is no overlap, it will construct a temporary.

Steve - Intel Developer Support
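
To put a number on "the size of the array slice", the sketch below computes how many bytes a copy of one rank-2 slice would need. The module slice_size and the function slice_bytes are hypothetical names introduced here, and the calculation assumes that storage_size reports the element size in bits and that there is no extra padding between array elements.

```fortran
module slice_size
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: slice_bytes

contains

  ! Bytes needed to hold a copy of a rank-2 slice of n1*n2 elements,
  ! where each element occupies elem_bits bits (for example the value
  ! returned by the storage_size intrinsic for the array in question).
  pure function slice_bytes(n1, n2, elem_bits) result(nbytes)
    integer, intent(in) :: n1, n2, elem_bits
    integer(int64) :: nbytes

    ! 64-bit arithmetic, since large slices can overflow a default integer.
    nbytes = int(n1, int64) * int(n2, int64) * int(elem_bits / 8, int64)
  end function slice_bytes

end module slice_size
```

For the original program, slice_bytes(128, 128, storage_size(N)) would give 128*128*512 = 8388608 bytes if each element of type x occupies 4096 bits. That is 8 MiB, the same order as the 8 MB stack limit mentioned above, which may explain why 128 elements looked like a threshold.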

I experimented with this a bit more and found that it is the custom type that causes inefficient code to be generated. This is the minimal test case to demonstrate the problem:

program test
  implicit none

  type x
      integer :: a
  end type

  integer, parameter :: s = 512
  type(x), dimension(s,s,s) :: bad
  integer, dimension(s,s,s) :: good

  bad(:,:,1) = bad(:,:,2)
  good(:,:,1) = good(:,:,2)
end program test

If you check the assembler output, the "bad" array is copied through a temporary buffer, while the "good" array is copied in place, as expected.

I didn't mean to suggest that the code was "improved" with -heap-arrays, only that you'd avoid the segfault. The developers are looking at the case now - you're right that the use of the derived type is important.

Steve - Intel Developer Support

We have improved the compiler's overlap detection to properly handle this case. The change will appear in a compiler version later this year.

Steve - Intel Developer Support
