mpirun error "APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)"


Hello, I am using Intel MPI's mpirun (Intel MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824) to run a program that I compiled on our cluster. We use the PBS queue system (PBSPro_11.1.0.111761).

When I run

$ mpirun -n 8 -machinefile $PBS_NODEFILE -verbose /home/a.c.padilha/bin/vasp.teste.O0.debug.x 

I end up getting these error messages:

[proxy:0:1@n022] got crush from 5, 0
[proxy:0:2@n023] got crush from 5, 0
[proxy:0:2@n023] got crush from 4, 0
[proxy:0:0@n009] got crush from 6, 0
[proxy:0:0@n009] got crush from 9, 0
[proxy:0:0@n009] got crush from 17, 0
[proxy:0:1@n022] got crush from 4, 0
[proxy:0:0@n009] got crush from 10, 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

I have tried calling mpirun with -check_mpi and -env I_MPI_DEBUG=5, but so far I have no clue what is going on. This happens only when I use more than one compute node.
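With many ranks, the proxy messages are easier to read as a per-host tally. Below is a minimal sketch, assuming the message format shown above ([proxy:...@host] got crush from N, M). The thread doesn't explain what N means, so the sketch keeps it as an opaque id. The function name summarize_crush is hypothetical.

```shell
# Tally "got crush" messages from the Hydra proxies, grouped by host.
# Assumes lines shaped like: [proxy:0:1@n022] got crush from 5, 0
# The first number is collected as an opaque id; its meaning is not assumed.
summarize_crush() {
  awk '/got crush from/ {
         host = $1
         at = index(host, "@")
         # keep only the text between "@" and the closing "]"
         host = substr(host, at + 1, length(host) - at - 1)
         id = $5
         sub(/,$/, "", id)          # "5," -> "5"
         count[host]++
         ids[host] = ids[host] " " id
       }
       END { for (h in count) print h, count[h] ":" ids[h] }' "$1" | sort
}
```

Running summarize_crush on a file holding the messages from the first post prints one line per host: "n009 4: 6 9 17 10", "n022 2: 5 4" and "n023 2: 5 4". This shows that every node in the allocation reported failures, not just one.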

Any help would be appreciated.

Claudio Padilha

Hi Claudio,

Could you please provide the full output of your MPI run with the “-genv I_MPI_HYDRA_DEBUG=1” setting? Also, please provide the output of “cat $PBS_NODEFILE” after the resources have been allocated.
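The raw node file can be complemented with a compact summary. Below is a minimal sketch, assuming $PBS_NODEFILE lists one hostname per line, typically repeated once per allocated slot. The function names summarize_nodefile and count_hosts are hypothetical.

```shell
# Print each host in a node file with the number of times it appears (its slots).
# Blank lines are ignored; output is sorted by hostname.
summarize_nodefile() {
  awk 'NF { slots[$1]++ } END { for (h in slots) print h, slots[h] }' "$1" | sort
}

# Print how many distinct hosts the node file spans.
count_hosts() {
  awk 'NF { seen[$1] = 1 } END { n = 0; for (h in seen) n++; print n }' "$1"
}
```

Usage inside a job: summarize_nodefile "$PBS_NODEFILE"; count_hosts "$PBS_NODEFILE". A host count above 1 confirms that the run really spans several nodes, which is the case where the failure appears.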

Regards,

Michael

Hi,

I'm also experiencing the same error, but in my case it happens on a single node (I haven't tried running on multiple nodes).

I am using the following MPI version:

$ mpirun -V

Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824

Copyright (C) 2003-2011, Intel Corporation. All rights reserved.

I don't use a queuing system; I run my job from the command line as follows:

$ mpirun -verbose -check-mpi -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -np 40 ~/bin/vasp5O0g > out 2>&1 &

Then the job ended with

[proxy:0:0@ebn13] got crush from 35, 0

[proxy:0:0@ebn13] got crush from 26, 0

[snip]

[proxy:0:0@ebn13] got crush from 41, 0

APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

The executable is compiled with mpiifort using ifort version 12.1.2.273 Build 20111128 and is statically linked against the MKL library.

The file containing standard output and standard error is attached. If you need more information, please let me know.

Any kind of advice would be appreciated. Thank you.

Sincerely,

MM

Attachments: out.6e.gz (6.8 KB)

Dear Claudio,

I also had problems when trying to use more than one compute node with Intel MPI. These are my previous posts, in case you find useful information in them:

http://software.intel.com/en-us/forums/topic/329053

http://software.intel.com/en-us/forums/topic/370967

Regards,

Ivan

Iván Santos Tejido, Dpto. Electricidad y Electrónica, Universidad de Valladolid, Spain

Hi Michael,

The output of

$ mpirun -n 16 -machinefile $PBS_NODEFILE -verbose -genv I_MPI_HYDRA_DEBUG=1 -check_mpi /home/a.c.padilha/bin/vasp.teste.O0.debug.x > log

is in the attached file log.txt. Even when I redirect my output to a file, I get this message

ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.

for each of the MPI processes. I looked up libVTmc.so and found that it is a debugging library, so I believe it is not related to the original problem.
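A hedged note on that warning: -check_mpi asks mpirun to preload libVTmc.so, the message-checking library of the Intel Trace Collector. The ld.so message therefore suggests that the checker itself was probably not loaded in this run.

One way to check whether the library is visible is to scan each directory on LD_LIBRARY_PATH. This assumes the library is meant to be found that way, for example after sourcing the Trace Collector's environment script. The function name find_in_ld_library_path is hypothetical. The dynamic loader also searches its cache and default directories, so an empty result is a hint, not proof. Run the check on each compute node, not only on the login node.

```shell
# Print every LD_LIBRARY_PATH entry that contains the given file name.
# Returns 0 if at least one match was found, 1 otherwise.
# Empty entries are skipped for simplicity.
find_in_ld_library_path() {
  local lib="$1" rest dir status=1
  # Append ":" so every entry, including the last one, ends with a separator.
  rest="${LD_LIBRARY_PATH:-}:"
  while [ -n "$rest" ]; do
    dir=${rest%%:*}   # text before the first ":"
    rest=${rest#*:}   # drop that entry and its ":"
    if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir/$lib"
      status=0
    fi
  done
  return "$status"
}
```

Usage: find_in_ld_library_path libVTmc.so || echo "libVTmc.so not on LD_LIBRARY_PATH"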

Thanks for your reply, Iván, but I could not reproduce the error messages from your posts, even though I used exactly the same flags in the mpirun call.

Regards,

Claudio

Attachments: log.txt (94.03 KB)
Claudio Padilha
