mkl scalapack + dapl fails




I am trying to run MKL ScaLAPACK in Fortran code on an InfiniBand network using I_MPI_FABRICS=shm:dapl. However, MKL ScaLAPACK does not work correctly when running on several nodes; e.g. pzheev exits with error code 16. When I switch to the shm:tcp fabric, it works. It also works with the Netlib ScaLAPACK reference implementation combined with MKL LAPACK/BLAS. I tried Intel 2016 Update 4 and Intel 2017 Update 4; both give the same errors.

Any idea what causes this error?
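The fabric comparison described above can be written down as two launch lines that differ only in I_MPI_FABRICS. This is a minimal sketch, not the poster's actual job script: the binary name ./repro and the rank counts (4 ranks, 2 per node) are assumptions for illustration, and the function only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: prints the launch command for a given fabric setting.
# Assumptions (not from the thread): ./repro is a hypothetical ScaLAPACK
# test binary; 4 ranks with 2 ranks per node are illustrative values.
EXE=./repro

launch_cmd() {
    # $1 is the value for I_MPI_FABRICS, e.g. shm:dapl or shm:tcp
    printf 'I_MPI_FABRICS=%s mpirun -n 4 -ppn 2 %s\n' "$1" "$EXE"
}

# shm:dapl is the failing case in this thread, shm:tcp the working one
for FABRIC in shm:dapl shm:tcp; do
    launch_cmd "$FABRIC"
done
```

Running the two printed commands against the same binary and input isolates the fabric as the only variable.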





This is not a known problem in MKL 2017 u3. If pzheev works with Netlib's implementation and fails with MKL, then this is probably a bug. Could you give us a reproducer?

Thanks for the swift reply.

I did some more testing. I had been using -mkl=cluster, which produced the errors mentioned above. However, when I use the linker options supplied by the Intel MKL Link Line Advisor, i.e. " ${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread -lm -ldl", it works. Using -mkl=cluster together with the TCP fabric also works, though.

So is it generally better to give the linker options explicitly than to use "-mkl=cluster"?

This is strange, because the -mkl=cluster compiler option links the sequential MKL cluster components, and these components use Intel MPI. That should be identical to what you specified explicitly.

The difference is that -mkl=cluster links the libraries dynamically, whereas the Link Line Advisor line links them statically. If I use -mkl=cluster -static-intel -static_mpi to link the MKL libraries statically, ScaLAPACK works.
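The three link variants discussed in this thread can be put side by side. The following is a minimal sketch under stated assumptions: repro.f90 and the output names are hypothetical, and mpiifort is assumed as the Intel MPI Fortran wrapper. The flags and library list are the ones quoted in the posts above. The script only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Dry-run sketch: prints the three link commands compared in this thread.
# Assumptions (not from the thread): repro.f90 is a hypothetical source file,
# mpiifort is the compiler wrapper, and the output names are illustrative.
SRC=repro.f90
# Kept literal on purpose; MKLROOT is expanded when a printed command is run.
MKL_LIB='${MKLROOT}/lib/intel64'

# 1) Dynamic link via the compiler shortcut (fails with shm:dapl in this thread)
DYNAMIC="mpiifort $SRC -mkl=cluster -o repro_dynamic"

# 2) Same shortcut, linked statically (reported to work)
STATIC_FLAG="mpiifort $SRC -mkl=cluster -static-intel -static_mpi -o repro_static"

# 3) Explicit static link line from the Link Line Advisor (reported to work)
ADVISOR="mpiifort $SRC $MKL_LIB/libmkl_scalapack_lp64.a -Wl,--start-group $MKL_LIB/libmkl_intel_lp64.a $MKL_LIB/libmkl_sequential.a $MKL_LIB/libmkl_core.a $MKL_LIB/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread -lm -ldl -o repro_advisor"

printf '%s\n' "$DYNAMIC" "$STATIC_FLAG" "$ADVISOR"
```

After building, running ldd on each binary and filtering for "mkl" shows which shared MKL libraries the dynamic build resolves at run time; the static builds should list none.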

Then could you give us a reproducer?

I will try to provide it.
