Ambiguous error in deallocation on Linux

Hi all,

I am pleased to join this very active Intel Fortran forum and to post my first issue here.

The problem concerns the deallocation of allocated arrays. The same code seemingly works fine on Mac OS X but fails on Linux. I have attached the code that causes the problem. It is basically an interface to dstegr, a LAPACK routine.

I debugged the code as follows:

1) First I set ulimit -s unlimited

2) I compiled the code with

ifort lapack_dstegr.f90 test_dstegr.f90 -llapack -lblas -g -traceback -warn all -check all

3) When run directly as ./a.out, it hangs

4) So I ran it under gdb

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

 Linear dimension of the matrix         100

 Program has been run successfully

 Number of eigenvalues:          100

 Selected eigenvalues

  6.473654E-01  3.540588E+00  8.517578E+00  1.550983E+01

 

Program received signal SIGSEGV, Segmentation fault.

0x00002aaaabe94c6c in __GI___libc_free (mem=0x6c6ca0) at malloc.c:2945

2945    malloc.c: No such file or directory.

 

I also tried gfortran, which suggests that there is a problem in the deallocation of the allocated arrays "w" and "subdiag" in test_dstegr.f90, at lines 92 and 96 respectively. However, I don't see any problem at these places, as I have checked that "w" and "subdiag" are allocated before the deallocations occur. So this problem seems ambiguous to me. I would very much appreciate it if someone could shed light on this issue.

Attached files: test_dstegr.f90 (3.69 KB), lapack_dstegr.f90 (4.36 KB)

I suggest that you try "-mkl" in place of "-llapack -lblas" first. I ran your program on Windows using IFort and MKL, and the program completed execution with no error messages.

Another suggestion is that when an ALLOCATE statement fails you print a message and terminate the program, instead of simply printing a message and calling a Lapack routine with an unallocated array argument.
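A minimal sketch of that second suggestion, assuming a hypothetical program name and a stand-in array "w" (neither is taken from the attached files): test the stat value right after the ALLOCATE statement and stop before any Lapack routine can receive an unallocated array.

```fortran
program alloc_check_sketch
  implicit none
  integer, parameter :: dp = selected_real_kind(14)
  real(kind=dp), dimension(:), allocatable :: w   ! hypothetical stand-in array
  integer :: n, err

  n = 100
  allocate(w(n), stat=err)
  if (err /= 0) then
    ! Terminate here instead of continuing with an unallocated array.
    print *, "w: allocation request denied, stat = ", err
    stop 1
  end if

  ! Only reached when the allocation succeeded.
  w = 0.0_dp
  print *, "w allocated with size", size(w)
  deallocate(w)
end program alloc_check_sketch
```

The same test can be repeated after every ALLOCATE in the driver, so that a failed request never reaches the library call.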

Thanks mecej4 for your kind response. As I said, the same code with the same compilation procedure does not give any error on Mac OS X either. I did not expect -mkl to help, and I have verified on Linux that it indeed does not. Also, I don't get any clear error message about a failed allocation or deallocation. However, the behavior does point to a memory issue. I am waiting for responses from other people who can check the code on Linux, say Ubuntu, CentOS, Red Hat, etc. I see the problem on Ubuntu and CentOS, where I have checked my code.

It would be useful to know the particulars of the BLAS and Lapack libraries that are giving you trouble on Linux.

I think that you misunderstood my request.

Did you download prebuilt Lapack and BLAS libraries and runtime as a package from your Linux distribution repository, or from somewhere else? Did you build them from source yourself and, if so, using which compiler? Are you using a static library or a shared library?

@mecej4: Sorry, I did not get your point. I am using the system (Linux distribution) libraries, so they should be shared, but I'm not sure about it. Running locate lapack, I get

/usr/lib/lapack

/usr/lib/liblapack.a

/usr/lib/liblapack.so

/usr/lib/liblapack.so.3

/usr/lib/liblapack.so.3gf

/usr/lib/lapack/liblapack.a

/usr/lib/lapack/liblapack.so

/usr/lib/lapack/liblapack.so.3

/usr/lib/lapack/liblapack.so.3.0

 

/usr/lib/libblas

/usr/lib/libblas.a

/usr/lib/libblas.so

/usr/lib/libblas.so.3

/usr/lib/libblas.so.3gf

/usr/lib/libgslcblas.so.0

/usr/lib/libgslcblas.so.0.0.0

/usr/lib/libblas/libblas.a

/usr/lib/libblas/libblas.so

/usr/lib/libblas/libblas.so.3

/usr/lib/libblas/libblas.so.3.0

The point is that Lapack from a Linux distro depends on gfortran and will be incompatible with the ifort runtime library unless you rebuild it from source using ifort. MKL gives better results with less trouble.

If the question is one of Intel Fortran correctly working with a standard Linux library, it would be appropriate to this forum. However, the problem that you described occurs even when you use Gfortran in place of Ifort. Therefore, the problem is related to one of two possibilities: (i) An error in the rather complicated argument list to DSTEGR(), including allocation of the arrays in that list; (ii) An error in the library subroutine itself (an unlikely third possibility is bugs in Lapack routines).
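For possibility (i), the usual thing to check is that every array passed to DSTEGR has at least the size the reference Lapack documentation asks for: d, e and w of length n, z of n by max(1,m), isuppz of 2*max(1,m), plus work and iwork of the lengths returned by a workspace query. The sketch below is not the poster's interface module. The program name, the size n = 4 and all variable names are assumptions for illustration, and it must be linked against a Lapack library (for example with the -mkl or -llapack -lblas options already used in this thread).

```fortran
program dstegr_sizes_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! DSTEGR works in double precision
  integer, parameter :: n = 4              ! hypothetical small test size
  real(dp) :: d(n), e(n), w(n), z(n,n), work_query(1)
  integer  :: isuppz(2*n), iwork_query(1)
  real(dp), allocatable :: work(:)
  integer,  allocatable :: iwork(:)
  integer  :: i, m, info, lwork, liwork, err
  external :: dstegr

  ! Same test matrix as in the thread: diagonal i*i, subdiagonal i.
  do i = 1, n
    d(i) = real(i*i, dp)
    e(i) = real(i, dp)   ! e needs n elements; e(n) is used as scratch space
  end do

  ! Workspace query: lwork = liwork = -1 only returns the required sizes.
  call dstegr('V', 'A', n, d, e, 0.0_dp, 0.0_dp, 0, 0, 0.0_dp, m, w, z, n, &
              isuppz, work_query, -1, iwork_query, -1, info)
  if (info /= 0) stop "dstegr: workspace query failed"
  lwork  = int(work_query(1))
  liwork = iwork_query(1)

  allocate(work(lwork), iwork(liwork), stat=err)
  if (err /= 0) stop "work/iwork: allocation request denied"

  ! Actual computation; d and e are overwritten on exit.
  call dstegr('V', 'A', n, d, e, 0.0_dp, 0.0_dp, 0, 0, 0.0_dp, m, w, z, n, &
              isuppz, work, lwork, iwork, liwork, info)
  if (info /= 0) stop "dstegr failed"

  print *, "Number of eigenvalues: ", m
  print '(1x,4(es13.6,1x))', w(1:m)
  deallocate(work, iwork)
end program dstegr_sizes_sketch
```

An array shorter than these lengths lets the library write past its end, and that kind of overrun often shows up only later, inside free(), rather than at the call itself.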

Neither of these problems is interesting enough to induce me to investigate further. Furthermore, one of your two files implements a Fortran 95 interface to the Fortran 77 routine DSTEGR, which is a bit of "reinventing the wheel", since Lapack95 contains an even better interface (see the MKL manual page for ?STEGR).

If, however, all that you want is to run test calculations with a minimum of fuss, here is a replacement for your driver program, using facilities provided by MKL.  The main change is the call on line 52. To compile, use "ifort -mkl test_dstegr.f90 -lmkl_lapack95". This program runs through the entire range of k provided without any errors.

program test_lapack_dstegr
  use mkl95_lapack
  implicit none
! .. parameters ..
  integer, parameter :: dp = selected_real_kind (14)
! .. main variables ..
  integer :: dim
  real (kind=dp), dimension (:), allocatable :: diag
  real (kind=dp), dimension (:), allocatable :: subdiag
! .. lapack variables ..
  character (len=1) :: jobz  = 'v'
  character (len=1) :: range = 'a'
  integer :: m
  real (kind=dp), dimension (:), allocatable :: w
  real (kind=dp), dimension (:,:), allocatable :: z
  integer :: info
! .. other variables ..
  integer :: i, j, k, err, ios
  real (kind=dp) :: time,t_initial, t_final
  
  open(unit=11, file="time_dstegr.data", iostat=ios, status="replace")
  if ( ios /= 0 ) stop "error opening file time_dstegr.data"
  write (11,'(1x,2(a12,2x))') "dim", "time"

  do k = 100, 4000, 100    
    
!     .. dimension of matrix ..
      dim = k 
      print *, "Linear dimension of the matrix", dim

!     .. allocate arrays ..
      allocate(diag(dim), stat=err)
      if (err /= 0) stop "diag: allocation request denied"
      allocate(subdiag(dim), stat=err)
      if (err /= 0) stop "subdiag: allocation request denied"

!     .. storing matrix elements of the upper triangular part ..
      do i = 1, dim, 1
        diag(i) = real(i*i, dp)
        if ( i < dim ) subdiag(i) = real(i, dp)
      end do
      
!     .. variables for lapack routine dstegr ..

      allocate(w(dim), stat=err)
      if (err /= 0) stop "w: allocation request denied"
      
      allocate(z(dim,dim), stat=err)
      if (err /= 0) stop "z: allocation request denied"

      call cpu_time(t_initial)
      call rstegr (diag,subdiag,w,z=z,m=m,abstol=1d-12,info=info)
      call cpu_time(t_final)

      time=t_final-t_initial
      write(*,*)'info = ',info
      if(info.ne.0)stop
      select case (info)
        case (0)
          print *, "Program has been run successfully"
          print *, "Number of eigenvalues: ", m
          print *, "Selected eigenvalues"
          print '(1x,4(es13.6,1x))', (w(i), i=1,4)
          if ( jobz == 'v' .and. dim <= 4) then
            print *, "Selected eigenvectors"
            print '(9x,4(i13,1x))', (i,i=1,m)
            do i = 1, dim, 1
              print '(1x,i4,4x,4(es13.6,1x))', i, (z(i,j),j=1,m)
            end do
          end if
        case (:-1)
          print '(1x,a,1x,i2,a)', "The", abs(info),"th argument in &
          &   dstegr routine had an illegal value"
        case default
          print *, "internal error"
      end select
        
!     .. err is tested only when the deallocate statement actually ran ..
      if (allocated(z)) then
        deallocate(z, stat=err)
        if (err /= 0) print *, "z: deallocation request denied"
      end if
      if (allocated(w)) then
        deallocate(w, stat=err)
        if (err /= 0) print *, "w: deallocation request denied"
      end if
      if (allocated(diag)) then
        deallocate(diag, stat=err)
        if (err /= 0) print *, "diag: deallocation request denied"
      end if
      if (allocated(subdiag)) then
        deallocate(subdiag, stat=err)
        if (err /= 0) print *, "subdiag: deallocation request denied"
      end if
  
!     .. timing benchmark ..  
      write (11, '(1x, i13, 1x, es13.6)') dim, time
  end do
  
  close(unit=11, iostat=ios)
  if ( ios /= 0 ) stop "error closing file unit 11"

end program test_lapack_dstegr

 

The problem is that the dstegr routine runs successfully and returns info=0, and the program then gets stuck when the w and subdiag arrays are deallocated. Strangely, this happens only on Linux, not on Mac OS X or Windows. Just out of curiosity, I would like to know what is wrong with my code that makes it incompatible with Linux.

You have some incorrect logic in your allocation/deallocation routines. Here is an example:

    if (allocated(work)) deallocate(work, stat=err)
    if (err /= 0) print *, "work: deallocation request denied"

The problem is that if 'work' is not presently allocated, 'err' will be undefined after the first of these two lines has been executed. This undefined variable will be tested in the next line, giving you misleading messages (or not printing when it should).

Of course, this error is a side issue not related to your main complaint.
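One way to avoid testing an undefined 'err' is to nest the test inside the ALLOCATED check, so that it runs only when the DEALLOCATE statement has actually executed. The following is a minimal sketch; the program name is hypothetical and 'work' is only a stand-in for the arrays in the attached files.

```fortran
program dealloc_check_sketch
  implicit none
  real, dimension(:), allocatable :: work   ! stand-in for the real arrays
  integer :: err

  ! Case 1: work is not allocated, so neither deallocate nor the
  ! test of err is executed, and err is never read while undefined.
  if (allocated(work)) then
    deallocate(work, stat=err)
    if (err /= 0) print *, "work: deallocation request denied"
  end if

  ! Case 2: work is allocated, so err is defined by deallocate
  ! before it is tested.
  allocate(work(10))
  if (allocated(work)) then
    deallocate(work, stat=err)
    if (err /= 0) print *, "work: deallocation request denied"
  end if

  if (allocated(work)) error stop "work should be deallocated here"
  print *, "work released cleanly"
end program dealloc_check_sketch
```

Setting err = 0 immediately before each guarded DEALLOCATE would also work; the nested form just makes the dependence explicit.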

I found that by updating the Lapack library to 3.5.0, the problems regarding access violation, heap corruption, etc., went away. Have you updated the Lapack and BLAS libraries on your Linux system recently?

I compiled the lapack and blas libraries from the lapack-3.5.0 source with ifort on a Linux system, and then used these libraries with my code. Unfortunately, I got the same result as before. gdb gives

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

 Linear dimension of the matrix         100

 Program has been run successfully

 Number of eigenvalues:          100

 Selected eigenvalues

  6.473654E-01  3.540588E+00  8.517578E+00  1.550983E+01

 

Program received signal SIGSEGV, Segmentation fault.

0x00007ffff736fc6c in __GI___libc_free (mem=0x6edd80) at malloc.c:2945

2945    malloc.c: No such file or directory.

F.R.: I believe that your program exposes a bug in the standard library routine free(). I tracked down the problem by using an assembly level debugger, and have reported the bug at https://bugzilla.novell.com/show_bug.cgi?id=891349 . However, I am not confident about my knowledge in matters related to the GCC runtime, and there is an obstacle to overcome. Bugzillas usually ask for "steps to reproduce" an error, and I don't know how to put together a program that generates a valid address which also has at least one of bits 26-63 set.

Followup: I was able to create a pared down Fortran program from your code to exhibit the bug. I have posted it in a new post at https://software.intel.com/en-us/forums/topic/520237 . Thanks for your patience and cooperation in making this happen.

Thanks mecej4. I look forward to seeing the final conclusion.
