mpirun error "APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)"

Antonio Claudio P. wrote:

Hello, I am using Intel mpirun (Intel MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824) to run a program that I compiled on our cluster. We use the PBS queue system (version PBSPro_11.1.0.111761).

When I use

$ mpirun -n 8 -machinefile $PBS_NODEFILE -verbose /home/a.c.padilha/bin/vasp.teste.O0.debug.x 

I end up getting these error messages:

[proxy:0:1@n022] got crush from 5, 0
[proxy:0:2@n023] got crush from 5, 0
[proxy:0:2@n023] got crush from 4, 0
[proxy:0:0@n009] got crush from 6, 0
[proxy:0:0@n009] got crush from 9, 0
[proxy:0:0@n009] got crush from 17, 0
[proxy:0:1@n022] got crush from 4, 0
[proxy:0:0@n009] got crush from 10, 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

I have tried calling mpirun with -check_mpi and -env I_MPI_DEBUG=5, but so far I have no clue what is going on. This happens only when I use more than one compute node.

Any help would be very nice.

Claudio Padilha
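
Since the failure shows up only with more than one node, it may help to see how the crash reports are spread across hosts. The following is a minimal sketch, assuming the mpirun output has been saved to a file and that the lines have the same shape as those quoted above. The function name count_crash_reports is hypothetical, and the sketch does not interpret the two numbers after "from"; it only counts lines per host.

```shell
# Count "got crush from" reports per proxy host in a saved mpirun log.
# Assumption: lines look like "[proxy:0:1@n022] got crush from 5, 0",
# as in the output quoted in this thread; other versions may differ.
count_crash_reports() {
    # $1: path to a file holding the mpirun output
    grep 'got crush from' "$1" \
        | sed -n 's/^\[proxy:[0-9]*:[0-9]*@\([^]]*\)\].*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'   # print "host count"
}

# Usage (hypothetical file name):
#   count_crash_reports mpirun.log
```

If every allocated host appears in the result, the problem is probably not tied to a single faulty node.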
Michael Steyer (Intel) wrote:

Hi Claudio,

Could you please provide the full output of your MPI run with the "-genv I_MPI_HYDRA_DEBUG=1" setting? Also, please provide the output of "cat $PBS_NODEFILE" after resource allocation.

Regards,

Michael
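
The two pieces of information requested above could be collected inside the PBS job roughly as follows. This is only a sketch: slots_per_host is a hypothetical helper, and it assumes the node file lists one host name per line, repeated once per slot, which may not hold for every PBS configuration.

```shell
# Summarize a node file as "host slots" pairs, sorted by host name.
# Assumption: one host name per line, repeated once per allocated slot.
slots_per_host() {
    # $1: path to the node file (e.g. "$PBS_NODEFILE" inside a job)
    awk 'NF {count[$1]++} END {for (h in count) print h, count[h]}' "$1" | sort
}

# Usage inside a job script (the mpirun options are the ones used in this thread):
#   cat "$PBS_NODEFILE"
#   slots_per_host "$PBS_NODEFILE"
#   mpirun -n 8 -machinefile "$PBS_NODEFILE" -verbose \
#       -genv I_MPI_HYDRA_DEBUG 1 /home/a.c.padilha/bin/vasp.teste.O0.debug.x \
#       > hydra_debug.log 2>&1
```

Redirecting with "2>&1" keeps standard error in the same file, so messages printed by the launcher or the loader are not lost.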

mat wrote:

Hi,

I'm also experiencing the same error, but in my case it happens with only 1 node (I have not tried running on multiple nodes).

I use the following MPI version.

$ mpirun -V

Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824

Copyright (C) 2003-2011, Intel Corporation. All rights reserved.

I don't use a queuing system; I run my job from the command line as follows:

$ mpirun -verbose -check-mpi -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -np 40 ~/bin/vasp5O0g > out 2>&1 &

Then the job ended with

[proxy:0:0@ebn13] got crush from 35, 0

[proxy:0:0@ebn13] got crush from 26, 0

snip

[proxy:0:0@ebn13] got crush from 41, 0

APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

The executable was compiled with mpiifort (using ifort version 12.1.2.273 Build 20111128) and is statically linked against the MKL library.

The file containing standard output and standard error is attached. If you need more information, please let me know.

Any kind of advice would be appreciated. Thank you.

Sincerely,

MM

Attachment: out.6e.gz (6.8 KB)

Iván S. wrote:

Dear Claudio,

I also had problems when trying to use more than one computing node with Intel MPI. These are my previous posts in case you can find some useful information:

http://software.intel.com/en-us/forums/topic/329053

http://software.intel.com/en-us/forums/topic/370967

Regards,

Ivan

Iván Santos Tejido Dpto. Electricidad y Electrónica Universidad de Valladolid, Spain
Antonio Claudio P. wrote:

Hi Michael,

The output of

$ mpirun -n 16 -machinefile $PBS_NODEFILE -verbose -genv I_MPI_HYDRA_DEBUG=1 -check_mpi /home/a.c.padilha/bin/vasp.teste.O0.debug.x > log

is in the attached file log.txt. Even when I redirect the output to a file, I still get this message

ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.

for each of the MPI processes. I looked for this libVTmc.so and found that it is a debugging library, so I believe it is unrelated to the original problem.
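
The message probably still reaches the terminal because "> log" redirects only standard output, while the loader writes to standard error. To check whether libVTmc.so is visible at all, one could search the directories on LD_LIBRARY_PATH. The sketch below assumes a POSIX shell; find_on_library_path is a hypothetical helper, and the loader also consults other locations, so an empty result is a hint rather than proof.

```shell
# Print every directory on LD_LIBRARY_PATH that contains the given file.
# Returns 0 if at least one directory matches, 1 otherwise.
# Note: ld.so also searches other places (e.g. its cache), so this
# covers only the LD_LIBRARY_PATH part of the lookup.
find_on_library_path() {
    # $1: library file name, e.g. libVTmc.so
    lib=$1
    status=1
    old_ifs=$IFS
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir"
            status=0
        fi
    done
    IFS=$old_ifs
    return $status
}

# Usage, ideally on each compute node:
#   find_on_library_path libVTmc.so || echo "not found on LD_LIBRARY_PATH"
```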

Thanks for your reply Iván, but I could not get the same error message you got in your posts, even though I used exactly the same flags in the mpirun call.

Regards,

Claudio

Attachment: log.txt (94.03 KB)
Claudio Padilha