Dear all,
I am compiling several codes (details at the end) with Intel Cluster Studio 2013 for Linux (C and Fortran compilers, MKL BLACS and the MKL FFTW3 interface) and Intel MPI 4.0.3.008. The programs run without problems on a single compute node, but they crash as soon as I try to use more than one node.
To gather as much information as possible about the execution and the MPI calls, I ran mpirun with these options: -v -check_mpi -genv I_MPI_DEBUG 5. The resulting output is in the attached files.
The relevant information is at the end of each file, where you can find:
from vasp.log:
[23] ERROR: LOCAL:EXIT:SIGNAL: fatal error
[23] ERROR: Fatal signal 11 (SIGSEGV) raised.
[23] ERROR: Signal was encountered at:
[23] ERROR: hamil_mp_hamiltmu_ (/home/ivasan/programas/VASP/vasp.5.3_test/vasp)
[23] ERROR: After leaving:
[23] ERROR: mpi_allreduce_(*sendbuf=0x7fff5d1ce340, *recvbuf=0x18e19c0, count=1, datatype=MPI_DOUBLE_PRECISION, op=MPI_SUM, comm=0xffffffffc4060000 CART_SUB CART_CREATE CART_SUB CART_CREATE COMM_WORLD [18:23], *ierr=0x7fff5d1ce2ac->MPI_SUCCESS)
from abinit.log:
[23] ERROR: LOCAL:MPI:CALL_FAILED: error
[23] ERROR: Null communicator.
[23] ERROR: Error occurred at:
[23] ERROR: mpi_comm_rank_(comm=MPI_COMM_NULL, *rank=0x29319b8, *ierr=0x7fff83fabb74)
[23] ERROR: initmpi_grid_ (/home/ivasan/programas/abinit/abinit-6.12.3b/src/51_manage_mpi/initmpi_grid.F90:178)
[23] ERROR: invars1_ (/home/ivasan/programas/abinit/abinit-6.12.3b/src/57_iovars/invars1.F90:1015)
[23] ERROR: invars1m_ (/home/ivasan/programas/abinit/abinit-6.12.3b/src/57_iovars/invars1m.F90:186)
[23] ERROR: m_ab6_invars_mp_ab6_invars_load_ (/home/ivasan/programas/abinit/abinit-6.12.3b/src/57_iovars/m_ab6_invars_f90.F90:548)
[23] ERROR: MAIN__ (/home/ivasan/programas/abinit/abinit-6.12.3b/src/98_main/abinit.F90:260)
[23] ERROR: main (/home/ivasan/programas/abinit/abinit-6.12.3b/bin/abinit)
[23] ERROR: (/lib64/libc-2.5.so)
[23] ERROR: (/home/ivasan/programas/abinit/abinit-6.12.3b/bin/abinit)
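So in both cases the problem seems to be related to MPI: VASP segfaults in hamil_mp_hamiltmu_ right after returning from an MPI_Allreduce on a Cartesian sub-communicator, and Abinit calls MPI_Comm_rank on a null communicator (MPI_COMM_NULL) in initmpi_grid.F90.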
What can I do to solve these errors?
Thanks in advance for your help.
Iván
CODES:
- VASP V5.3.2 (http://www.vasp.at/). I also posted this issue on its support forum: http://cms.mpi.univie.ac.at/vasp-forum/forum_viewtopic.php?3.12037
- Abinit V6.12.3 (http://www.abinit.org/). I also posted this issue on its support forum: http://forum.abinit.org/viewtopic.php?f=3&t=1851
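
P.S. In case it helps anyone going through the attached logs: they are long, and the useful part is only the "[rank] ERROR:" lines quoted above. Below is a minimal sketch of a filter that groups those lines by MPI rank. It assumes every error line has the same "[N] ERROR: text" shape as in my excerpts; the function name errors_by_rank is just something I made up for this illustration and is not part of Intel MPI or its checking library.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpts in this post:
#   "[23] ERROR: Fatal signal 11 (SIGSEGV) raised."
ERROR_LINE = re.compile(r"^\[(\d+)\] ERROR:\s?(.*)$")


def errors_by_rank(lines):
    """Group the text of '[rank] ERROR:' lines by rank, keeping log order.

    Hypothetical helper: 'lines' is any iterable of strings, for example
    an open log file. Lines that do not match the assumed shape are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            grouped[int(match.group(1))].append(match.group(2))
    return dict(grouped)
```

It could be used as errors_by_rank(open("vasp.log", errors="replace")), which returns a dictionary mapping each rank number to its list of error messages. It only filters text and does not interpret the errors in any way.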



