Segmentation fault only with MKL on Intel cluster, MPI parallelization

Hi,

I am trying to run an MPI application on a Linux cluster (Intel Xeon). It was compiled with the Intel Fortran and C compilers (version 10.1) and MKL (version 10.0).

I don't get any error at compile time but I do get the following run-time error:

------------------------------------------------------------------------------------------
Parallel environment (un)loaded (OpenMPI+Intel)

[n017:03307] *** Process received signal ***
[n017:03307] Signal: Segmentation fault (11)
[n017:03307] Signal code: Address not mapped (1)
[n017:03307] Failing at address: 0x1
[n017:03307] [ 0] /lib64/libpthread.so.0 [0x2b2d247fc7c0]
[n017:03307] [ 1] /opt/intel/mkl/10.0.5.025/lib/em64t/libmkl_lapack.so(mkl_lapack_dlarre+0xc3) [0x2aaab40d0247]
[n017:03307] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3307 on node n017 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
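The interesting frame in this backtrace is `[ 1]`: the crash happens inside the routine `mkl_lapack_dlarre`, at offset `0xc3`, in `libmkl_lapack.so`. A minimal Python sketch, assuming the Open MPI backtrace format shown in the log, that pulls the library, symbol and offset out of each frame (`FRAME_RE` and `parse_frames` are hypothetical names used only for illustration):

```python
import re

# Matches one backtrace frame in the format Open MPI printed above, e.g.
# "[n017:03307] [ 1] /path/libfoo.so(symbol+0xc3) [0x2aaab40d0247]"
FRAME_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+"    # [host:pid]
    r"\[\s*(?P<index>\d+)\]\s+"                 # [ n] frame index
    r"(?P<lib>[^\s(\[]+)"                       # shared object path
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"  # optional (symbol+offset)
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]"       # absolute address
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(m.groupdict())
    return frames


log = """\
[n017:03307] *** Process received signal ***
[n017:03307] [ 0] /lib64/libpthread.so.0 [0x2b2d247fc7c0]
[n017:03307] [ 1] /opt/intel/mkl/10.0.5.025/lib/em64t/libmkl_lapack.so(mkl_lapack_dlarre+0xc3) [0x2aaab40d0247]
"""

for f in parse_frames(log):
    # Print frame index, library file name, and symbol ("?" when none is given)
    print(f["index"], f["lib"].rsplit("/", 1)[-1], f["symbol"] or "?")
```

Lines that are not frames (such as "*** Process received signal ***") are simply skipped, so the whole mpirun output can be fed in unchanged.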

This error doesn't show up immediately, but only after some computations have been performed, apparently successfully.

It looks strange that the error report refers to libmkl_lapack.so, because I link only with
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core
In fact I don't need LAPACK in my application, only BLAS. I am linking to MKL for the em64t architecture, and I am using sequential MKL because the application is parallelized with MPI and I don't want threaded MKL to interfere with it. I have also tried linking with
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
with the variable OMP_NUM_THREADS=1 set in the makefile, but that doesn't work either.
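Thread-count variables such as OMP_NUM_THREADS are read by the program at run time, so they only take effect if they are present in the environment of every MPI rank; a variable assigned in a makefile is visible only to the commands make itself runs (and only if exported). With Open MPI, `mpirun -x OMP_NUM_THREADS=1 ...` forwards the variable to the ranks. A minimal, deliberately conservative Python sketch of a check a launch script could run (`thread_settings` and `is_single_threaded` are hypothetical helpers, not part of MKL or Open MPI):

```python
import os

# Variables that control how many threads OpenMP / threaded MKL may use.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def thread_settings(env=None):
    """Return the thread-count variables as this process sees them (None if unset)."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in THREAD_VARS}


def is_single_threaded(env=None):
    """Conservative check: at least one variable is set, and every set one equals "1".

    An unset environment is treated as "not confirmed", because the library
    then falls back to its own default thread count.
    """
    values = [v for v in thread_settings(env).values() if v is not None]
    return bool(values) and all(v.strip() == "1" for v in values)


if __name__ == "__main__":
    print(thread_settings())
    if not is_single_threaded():
        print("warning: threaded MKL may start more than one thread per MPI rank")
```

This only inspects the environment; it says nothing about which MKL threading layer was actually linked.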

The full link line I am using in my application is

-lgsl -lgslcblas -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lguide -lpthread
plus some user-defined libraries.
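The two MKL variants tried in this thread differ only in the middle library: both use the LP64 interface library (mkl_intel_lp64) and mkl_core, with either mkl_sequential or mkl_intel_thread in between. A small Python sketch that makes that structure explicit; `mkl_link_flags` is a hypothetical helper, its default trailing libraries simply mirror the MKL part of the link line above, and the Intel MKL Link Line Advisor remains the authoritative source for a given MKL version:

```python
def mkl_link_flags(threading="sequential", extra=("guide", "pthread")):
    """Build an MKL link line from interface, threading, and core libraries.

    threading: "sequential" -> mkl_sequential, "intel" -> mkl_intel_thread.
    extra: libraries appended after the MKL ones (defaults copied from the post).
    """
    threading_libs = {"sequential": "mkl_sequential", "intel": "mkl_intel_thread"}
    if threading not in threading_libs:
        raise ValueError(f"unknown threading layer: {threading!r}")
    libs = ["mkl_intel_lp64", threading_libs[threading], "mkl_core", *extra]
    return " ".join(f"-l{name}" for name in libs)


print(mkl_link_flags("sequential"))
print(mkl_link_flags("intel"))
```

Keeping the variants in one place makes it harder to accidentally mix libraries from the sequential and threaded configurations.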

By the way, the same application compiled with the Intel compilers and the ACML (non-threaded) library runs on the same cluster without a segmentation fault.

I don't understand what I could have done wrong. Does anybody have any hint? Any help is greatly appreciated.


Hello,

I'm moving this to the Intel Clusters and HPC Technology forum so that the Intel Software Development Products team can help.

==
Aubrey W.
Intel Software Network Support


Hi Afylot,
Are you running an MPI-based application with MKL? I cannot see any of the MKL libraries that support MPI in your link line. If you don't need MPI-based functionality from MKL (no Cluster FFT or ScaLAPACK), then link with -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lguide -lpthread. If the problem is still there, could you update MKL? The version you are using is quite old, and many problems have been fixed since then (in 10.1, 10.2, and the latest 10.3 beta, released a month ago).
--Gennady

Also, I've found the following link useful; it shows how linking should be done under different scenarios. I would give it a try: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
-- Andres

I see this is an MKL issue, so I've moved it again.

==
Aubrey W.
Intel Software Network Support

I changed to MKL v10.2 (keeping compiler v10.1), and I don't have this problem anymore.

@Andreas
It seems very useful; I think I am going to use it a lot.

Thanks
