Vectorization and segmentation fault

I have a problem with old software that used to work with ifort 11 but no longer works with recent versions.

This software is written in Fortran 77 and uses an old trick to manage its memory. The trick indexes arrays outside their declared bounds, so don't be shocked!

The idea is, at the beginning of the execution, to allocate a big array with a C malloc and to calculate the distance between this array and a reference array (called refarr in the following). To access data of the right type in the allocated array, an equivalence statement is used.

 

In this software, a loop always produces a segmentation fault when it is vectorized and I don't understand why. Here is a simplified version of the source code:

      subroutine mysub(datpos,n)
   
c     datpos = data position in the allocated array
 
      implicit none
   
   
      integer*8 DIST ! distance of the reference array to the allocated array
      integer   datpos,n
   
      integer   refarr
      integer*8 adress_arr(1)
      integer   anint,i,j,jj(n)
         
      COMMON/MYCOM/DIST,refarr(2)
   
   
c      with this statement, refarr and adress_arr share the same address
      equivalence (refarr(1),adress_arr(1))
    
c    some code here
 
      anint = 0
   
      do i=4,n !or anything else here
        jj(i)=refarr(adress_arr(DIST+datpos)+1+i)
        if(refarr(adress_arr(DIST+jj(i))+3).EQ.4)THEN
          anint = 1
        ENDIF
      enddo
   
c    more code here
 
      return
      end

 

With ifort 11, this loop was not vectorized and it worked. I obviously found the workaround of using the NOVECTOR directive, but that does not explain the problem. I found another workaround: if I declare a jj array and change the loop as follows, the software works (I also needed the VECTOR ALWAYS directive, because the optimizer reports that vectorizing the modified loop seems inefficient):

      do i=4,n !or anything else here
        jj(i)=refarr(adress_arr(DIST+datpos)+1+i)
        if(refarr(adress_arr(DIST+jj(i))+3).EQ.4)THEN
          anint = 1
        ENDIF
      enddo

 

Does anyone have an idea of what the problem is?

I think you've omitted important information.

What are the actual differences between the source that fails under vectorization and the one that works?

I suspect an actual running reproducer may be needed.

What are the compilation options?  I have a case which fails sometimes under ifort 14.0.1 when vectorized for SSE4 but is OK when vectorized for SSE2 or AVX.

"recent versions" includes some buggy ones.  Please use current updates of 13.1 or 14.0.

I don't know why you call this Fortran 77.  integer*8 and implicit none were extensions, not covered in the standard. For integer*8 it's still the case, although it has become more widely available.  anint was already a standard intrinsic, which of course is removed from visibility by your local declaration (if the compiler doesn't get confused).  By combining such questions, you may get into untested territory.

Sorry, the buggy loop in my first post is wrong. It is in fact:


      do i=4,n !or anything else here
        j=refarr(adress_arr(DIST+datpos)+1+i)
        if(refarr(adress_arr(DIST+j)+3).EQ.4)THEN
          anint = 1
        ENDIF
      enddo

For the anint issue, the original loop uses a "LOOA" name for this integer... I just tried to make it easier to read and did not pay attention to the fact that anint is an intrinsic.

The compilation options are "-O3 -unroll -xSSE2". I use ifort 14.0.1.106

I call it Fortran 77 just to make clear that, sadly, I can't avoid using those fantastic equivalence statements.

EQUIVALENCE is in Fortran 2008.

Steve - Intel Developer Support

adress_arr has a dimension of 1, refarr has a dimension of 2, and adress_arr(1) and refarr(1) are equivalenced in COMMON/MYCOM/.

Therefore, your only assurance is that there is sufficient memory for two integer*8's starting at the (same) address as the (1)'th index of each array.

ergo:

n must be .LE. 5 (not "anything else here")
Technically DIST+datpos (when n is 4 or 5) must == 1, although ==2 also assures a valid address (due to refarr having dimension of 2)
The contents of adress_arr(1:2) must be -4:-3 when n==4, or, -4 when n==5 (validity for j= line)
j must be -4:-3 when n==4, or, -4 when n==5
DIST must be 3 when n==4, or, 5 when n==5
if( statement requires DIST+j to be 1 or 2, therefore n cannot be 4 (n must be 5)
With requirement of n==5 i has values 4,5
adress_arr(1:2) must be [-4,-4]

if statement becomes if(refarr(-4+3).EQ.4)

which is an invalid index of refarr: refarr(-1)

GIGO

IOW Lack of crash .OR. desired vectorization is not a proof of valid code.

I agree with TimP that a proper example of code may help to clear things up.

Jim Dempsey

www.quickthreadprogramming.com

Note, the segmentation fault is likely a result of the COMMON/MYCOM/ segment being located at the start of a virtual memory page boundary, so that a subsequent reference to refarr(-1) addresses non-existent memory and causes a page fault. Had /MYCOM/ not been located at the start of a page, the invalidly indexed refarr(-1) might not have caused a page fault. IOW GIGO would not crash, though you would continue to run with GO (garbage out).

Jim Dempsey

www.quickthreadprogramming.com

Reading the answers, I realize that I should indeed provide a proper code example... I'll come back in a day or two with it!

Sorry for the delay but I finally managed to reproduce the problem with a simple program.

As I said in the previous posts, my program uses some tricks to emulate a kind of dynamic allocation in Fortran 77. The idea is to perform a big C malloc and to calculate the distance from the allocated array to a reference array.

In this software, a loop works correctly when it is not vectorized but returns a segmentation fault when it is and I can not understand why.

The main program I used to reproduce the bug is simple. It makes three calls: the first to the C memory allocation routine (init_mem), the second to a Fortran subroutine that initializes the content of the allocated array (setmem), and the last to the subroutine that contains the buggy loop (lxcall):

      program test
      implicit none

c     The 3 following statements store the position of the allocated data
c     distance from the reference array to the allocated array
      integer*8 DIST
c     reference array
      integer   refarr
c     with this statement, refarr and adress_arr share the same address
      COMMON/MYCOM/DIST,refarr(2)
c     number of integers allocated
      integer*8 size
c     Some integers used to reproduce the bug
      integer tvcal

c     Initialise the dynamic memory
      size=10000000
      call init_mem(size,refarr,DIST)
c     To be able to use more than 3 GB of memory, addresses are
c     INTEGER*8 based
      DIST = DIST / 2
      tvcal = 1
      call setmem(tvcal)
      call lxcall(tvcal)
      end

The C init_mem routine is here:


#include <stdio.h>
#include <stdlib.h>

/* Initialisation of the memory. Allocation of a size*sizeof(int) array
   and calculation of the distance (in number of integers) from *ref to
   the allocated array */
void init_mem_(long *size, int *ref, long *dist){
  int *allocated_array ;
  allocated_array = (int*) malloc(*size * sizeof(int)) ;
  /* Calculation of the distance between allocated array and ref */
  *dist = allocated_array - ref ;
  printf("Calculated distance %ld \n", *dist) ;
}

The 2 other subroutines trigger the spam filter so... I attach them to this post.

The setmem subroutine is used to initialize the test. The lxcall subroutine contains the buggy loop. I duplicated this loop and used the NOVECTOR directive to inhibit the vectorization.

The makefile is also attached.

When I run the test, I get the following output:

"Calculated distance 11743860906594
 I am here
 I can come here
forrtl: severe (174): SIGSEGV, segmentation fault occurred"

This shows that the code works when the loop is not vectorized and fails when it is vectorized.

I presume this is intended for IA-32 not Intel64, since your code doesn't look safe for 64 bit pointers?  But it seemed to work for me on IA-32 with the 14.0.1.106 compiler, without obvious errors.

$ icc -c -O3 -unroll -xsse2 init_mem.c                    

$ ifort -O3 -unroll -xsse2 -X -static -vec-report2 main.f init_mem.o lxcall.f setmem.f
lxcall.f(25): (col. 7) remark: loop was not vectorized: #pragma novector used
lxcall.f(35): (col. 7) remark: LOOP WAS VECTORIZED
$ ./a.out
Calculated distance -347899472 -1256005624 135592264       (I added allocated_array and ref to the printf)
            3947067824
 I am here
 I can come here
 But here not
$

As I'm sure you're aware, there are much easier ways to do dynamic memory allocation in modern Fortran.

If you are compiling on Intel64 as 64-bit application, then your C helper might require changing "long*" to "intptr_t*". This will assure that the sizeof the argument pointed to is the size of a pointer on the bitness of the compiled code.

Many C compilers use 32-bit long. Same issue with the "int*", this should be "intptr_t*".
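
A minimal sketch of this suggestion, not the original helper: it computes the element distance through `intptr_t` and returns it as `int64_t`, which is assumed here to be the natural match for a Fortran `integer*8` argument. The function name `element_distance` is hypothetical. Going through integer addresses also avoids subtracting pointers to unrelated objects, which C leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the original code): signed distance from
   ref to target, counted in int elements, computed on integer addresses. */
int64_t element_distance(const int *ref, const int *target)
{
    /* Widen both addresses to 64 bits before subtracting. */
    int64_t bytes = (int64_t)(intptr_t)target - (int64_t)(intptr_t)ref;
    /* Both pointers are int-aligned, so the byte gap must divide evenly. */
    assert(bytes % (int64_t)sizeof(int) == 0);
    return bytes / (int64_t)sizeof(int);
}
```

In `init_mem_`, the `dist` argument could then be declared as `int64_t *` and filled with `element_distance(ref, allocated_array)`.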

Jim Dempsey

www.quickthreadprogramming.com

I do realize that there are much better ways to dynamically allocate memory but... I have something like 1 000 000 lines of legacy code relying on this memory trick and people won't let me rewrite them!

Your posts gave me some ideas. I still do not have the solution but I think I'm closer to it !

I decided to get rid of the C part and to investigate how the behaviour depends on the memory address. My main program becomes:

      program test
      implicit none

c     The 3 following statements store the position of the allocated data
c     distance from the reference array to the allocated array
      integer*8 DIST , DIST2 , DIST3
c     number of integers allocated
      integer*8 size
c     reference array
      integer   refarr
c     with this statement, refarr and adress_arr share the same address
      COMMON/MYCOM/DIST,refarr(2)
c
c     Arrays to use
c     4 300 000 000 > 2^32 and 4 200 000 000 < 2^32
      integer   alldata2(4 300 000 000)
      integer   alldata3(  100 000 000)
c     Some integers used to reproduce the bug
      integer tvcal

c     Calculation of the data distances
      DIST2 = LOC(alldata2) - LOC(refarr)
      DIST3 = LOC(alldata3) - LOC(refarr)
c     To be able to use more than 3 GB of memory, addresses are
c     INTEGER*8 based
      DIST2 = DIST2 / 8
      DIST3 = DIST3 / 8
      write(*,*)'Distance',DIST2,DIST3
c     Choose the array to use by changing the DIST value
      DIST = DIST3
c
      tvcal = 1
      call setmem(tvcal)
      call lxcall(tvcal)
      end

To make this work I have to use the -mcmodel=large option, but apart from that everything else is the same.

By changing the size of "alldata2", I can change the position in memory of "alldata3". When the size of alldata2 is over 2^32, the code fails; when it is below 2^32, it works. That leads me to the conclusion that the vectorizer must assume that the indexes used in the loop of the lxcall subroutine are integer*4.

To check this, I probably should take a look at the assembly code (and hope to understand it !) but I did not find the options that I can use to get it.

I found the options to generate the assembly code and I think I have found the buggy part. The whole assembly code is attached.

The problem does indeed come from the fact that the array indexes are calculated with 32-bit integers. In the attached assembly code, I find:

movslq    %r9d, %rbx                                    #line 233
movq      mycom_(,%rbx,8), %xmm4             # line 251

The problem is that, given the position of the data in memory, the %rbx value should exceed the signed 32-bit range (2^31 - 1), but it cannot, because it was calculated in %r9d, which is a 32-bit register, and then sign-extended by movslq.

I printed the content of rbx just before the segfault and saw that the value was "-2144967028", which corresponds to the first (DIST+tvcal) - 2^32.
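
The arithmetic behind that value can be checked with a small C model. This is only a sketch of what a 32-bit index computation followed by `movslq` does, not the compiler's actual code; the function name `as_sign_extended_32` is made up for the illustration, and the input 2150000268 is simply the printed value plus 2^32.

```c
#include <stdint.h>

/* Hypothetical model: keep the low 32 bits of a 64-bit index, then
   sign-extend them back to 64 bits, as movslq does with %r9d. */
int64_t as_sign_extended_32(int64_t index)
{
    uint32_t low = (uint32_t)index;       /* truncate to 32 bits (modulo 2^32) */
    if (low >= UINT32_C(0x80000000))      /* top bit set: reads as negative */
        return (int64_t)low - INT64_C(4294967296);
    return (int64_t)low;
}
```

With an index of 2150000268 the model returns -2144967028, the value seen in %rbx, while any index up to 2^31 - 1 comes back unchanged.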

And now I'm wondering: is this a compiler bug, or should I use some additional ifort options to account for indexes that do not fit in 32 bits?

Attachment: lxcall.txt (23.41 KB)

Thanks for the new test case and analysis.

This looks to me like a bug. I have been able to construct a small test case that reproduces a similar problem, without using any fancy addressing tricks or large, negative offsets or indices. The problem seems to be that when the index calculation involves a mixture of integer*4 and integer*8, the compiler doesn't realize that it needs to perform the calculation in 64 bits. If I declare all the integers in the index calculation to be integer*8, then both my small test case and your example seem to work. (Since 2**32 x 8 bytes = 32 GB, you need a system with substantially more memory than this).

So try            integer*8 lcal,looa,ical,pv,next      in lxcall

I was able to get away without declaring tvcal as integer*8, but you'd probably want to do that too, or copy it into an integer*8 local, to be safe. I'll submit this to the compiler developers, and if they agree that it's a bug, we'll get it fixed. But I hope that in the meantime, you can use the above workaround. It's probably not a bad general precaution to use integer*8 for any integer that might be used as part of an address calculation for a program that needs 64 bit addressing.

There's of course no reason to force vectorization of the loop in your example. But I suppose the real application has loops with this construct that also have plenty of additional work that makes vectorization worthwhile.

FWIW,  long ago in a previous profession, I used to work with large applications that used exactly this style of memory allocation and management. The good side is that all data was always in scope. The bad side is that bounds checking is close to impossible, and it's very hard to debug when one piece of data overwrites another, especially since different data types can be all mixed together. I don't miss it.

Martyn,

Can you undo your integer*8 edit (revert to buggy code) and then add "_8" to the array declarations

integer*8 adress_arr(1_8)
...
COMMON/MYCOM/DIST,refarr(2_8)

Does the index calculation misbehave?

FWIW

Assume a program were written using the "modern way" and it used an allocatable array.
If the index error were induced by mixing integer*8 and integer*4 in an expression, I would expect this error to appear here too. Example:

Array(BigIndex+1)

Where BigIndex is integer(8), and the 1 is by default integer(4).

Jim Dempsey

www.quickthreadprogramming.com

Hi Jim,

            I already tried adding _8 to all integer constants, and it didn't help. This isn't a general problem of combining 32 bit and 64 bit integers, or it would have been seen long ago. I also don't think there's a problem in adding literal constants to an integer*8 variable, even though, as you imply, literals default to integer*4 in Fortran. The issue also seems specific to the vectorizer. I think I would expect allocatable arrays to behave in the same way as static arrays, though I haven't tested, since I don't think you can equivalence allocatable arrays. The context isn't quite the same, though, since you don't need -mcmodel=medium if the only large arrays are dynamically allocated.

            Feel free to test any combination that you suspect. I'm attaching my smaller test code, in case that is useful. It does, though, require >>32GB (=2**32 x 8 bytes) in order to execute.

Program test_loop64
  integer*4, parameter :: N=1000 
  integer*8, parameter :: two32 = 2**32
  integer*4 i, pv, looa    ! works if integer*8
  integer*8 DIST
  integer*4 refarr (two32+N)
  integer*8 adress_arr(two32+N)
      
  COMMON/MYCOM/DIST, adress_arr
  equivalence (refarr(1),adress_arr(1))
   
  refarr    (      1:      N) = (/(i,i=1,N)/)
  refarr    (two32+1:two32+N) = (/(i,i=1,N)/)
  adress_arr(two32+1:two32+N) = (/(i,i=1,N)/)
  DIST = two32
  looa  = 0
 
 !dir$ vector always
  do i = 1, N/2
    pv = refarr( adress_arr(DIST + i))
    if(refarr( adress_arr(DIST+pv) + DIST + 3 ).eq.4)looa = 1
  enddo
  write(*,*) pv, looa   

end program test_loop64

$ ifort -O2 -mcmodel medium -traceback -vec-report2 test_loop64.f90; ./a.out
test_loop64.f90(12): (col. 3) remark: LOOP WAS VECTORIZED
test_loop64.f90(13): (col. 3) remark: LOOP WAS VECTORIZED
test_loop64.f90(14): (col. 3) remark: LOOP WAS VECTORIZED
test_loop64.f90(14): (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient
test_loop64.f90(19): (col. 3) remark: LOOP WAS VECTORIZED
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source     
libintlc.so.5      00002B9CCBA5A229  Unknown               Unknown  Unknown
libintlc.so.5      00002B9CCBA58BA0  Unknown               Unknown  Unknown
libifcore.so.5     00002B9CCA6ED33F  Unknown               Unknown  Unknown
libifcore.so.5     00002B9CCA654D7F  Unknown               Unknown  Unknown
libifcore.so.5     00002B9CCA665F83  Unknown               Unknown  Unknown
libpthread.so.0    0000003F75A0F500  Unknown               Unknown  Unknown
a.out              0000000000400C2C  MAIN__                     21  test_loop64.f90
a.out              0000000000400846  Unknown               Unknown  Unknown
libc.so.6          0000003F7521ECDD  Unknown               Unknown  Unknown
a.out              0000000000400739  Unknown               Unknown  Unknown
 

Sorry to ask only now, but has this finally been registered as a bug? If so, could I get the number so that I can check when it will be fixed?

Thanks

Now that I see your example,

 integer*8, parameter :: two32 = 2**32

This may be an issue: the result may be 0, because 2_4**32 exceeds the capacity of integer*4.

Then refarr(two32 + N) becomes refarr(N), and likewise for adress_arr.
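
The overflow concern can be illustrated in C. This sketch only models the arithmetic: Fortran does not define what an overflowing default-integer `2**32` yields, so the 32-bit wraparound shown here (done in unsigned arithmetic to stay well defined in C) is an assumption about typical behaviour, and both function names are hypothetical. In Fortran, writing the constant with a 64-bit kind, for example `2_8**32` with ifort's kind numbering, keeps the whole computation in 64 bits.

```c
#include <stdint.h>

/* Hypothetical model of 2**n evaluated in 32 bits: repeated doubling in
   uint32_t wraps modulo 2^32, so n = 32 yields 0. */
uint32_t pow2_in_32_bits(unsigned n)
{
    uint32_t value = 1;
    for (unsigned i = 0; i < n; ++i)
        value *= 2u;
    return value;
}

/* Same computation carried out in 64 bits; valid for n up to 62. */
int64_t pow2_in_64_bits(unsigned n)
{
    int64_t value = 1;
    for (unsigned i = 0; i < n; ++i)
        value *= 2;
    return value;
}
```

Under this model `two32` would be 0 and the arrays would be declared with only N elements, which is the situation described above.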

Jim Dempsey

www.quickthreadprogramming.com

Quote:

Matthieu B. wrote:

 

Sorry to ask only now, but has this finally been registered as a bug? If so, could I get the number so that I can check when it will be fixed?

 

Yes, this was registered as a bug in January, internal ID  dpd200252841.  It has been worked on and the fix is targeted for the next major version of the compiler. Thanks for asking.

 

Version 15.0 of the Intel Compiler, contained in Intel Parallel Studio XE 2015, has just been released and contains a fix for this issue.

I am surprised: I reproduce the exact same issue with 15.0, at least with the version the IT guy installed. If I use ifort -V I get:

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.090 Build 20140723

 

You are right. The issue was reported internally as fixed in 15.0, and I thought I'd tested an earlier 15.0 compiler, but I can definitely still reproduce the problem now, using the compiler you quote. It is also not fixed in the compiler update which will be coming soon. I've already started following up. I'll post when there's something to report.

I'm sorry about this, but thanks for reporting.

I didn't imagine it; I did test an early 15.0 compiler and the fix did work. However, a subsequent fix to a different issue caused a regression in this one, and this wasn't detected until after the product release, due to a technical problem (relating to the memory footprint). A new fix for both issues is in progress. It will not be in the first update to the 15.0 compiler, which will be posted shortly, but it should be in the second update sometime in the new year.

Apologies for the thrash on this.

Just confirming that this issue is indeed fixed in update 2 to the 15.0 compiler, 15.0.2.164, which is available via the Intel Registration Center.
