Suppressing INTERNAL_ERROR when an MPI message send does not complete

Hi,

I'm implementing fault-tolerance techniques using Intel MPI.

I have the following scenario: two hosts (A and B) communicate via MPI messages.
Host B then fails (a crash or a loss of link, for example). Host A, which was trying to send a message to host B, cannot complete the send. About 15 minutes later the application is terminated because an INTERNAL_ERROR is raised. This happens once the TCP layer has exhausted its retransmission attempts (the number of attempts is set by tcp_retries2).
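
The delay of roughly 15 minutes is consistent with the Linux default of tcp_retries2 = 15. Below is a minimal sketch of the arithmetic, assuming the retransmission timeout starts at 200 ms, doubles after every unanswered attempt, and is capped at 120 s. These bounds are assumptions about the usual Linux defaults; the real timeout also depends on the measured round-trip time, so treat the result as an estimate. The function name is made up for illustration.

```python
def tcp_retries2_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Estimate the seconds before TCP gives up on an unacknowledged segment.

    Assumes exponential backoff: the timeout starts at rto_min, doubles
    after each unanswered attempt, and never exceeds rto_max.
    """
    total = 0.0
    rto = rto_min
    # One wait for the original transmission plus one per retransmission.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    seconds = tcp_retries2_timeout(15)
    print(f"{seconds:.1f} s (about {seconds / 60:.1f} min)")
```

Under these assumptions, lowering tcp_retries2 shortens the wait before the failure is reported, but it does not by itself stop the MPI library from treating the failed send as fatal.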

The same procedure with Open MPI does not end the same way, i.e. the application is not terminated.

Is there any way to disable this error in Intel MPI?
More precisely, can I prevent INTERNAL_ERROR from being raised when a send does not complete even after the TCP layer has made all the attempts allowed by tcp_retries2?

Thanks,
Alexandre D.Gonalves

Hi Alexandre,

You can try to set I_MPI_FAULT_CONTINUE=on:
$ mpiexec -env I_MPI_FAULT_CONTINUE on -n 2 ./test

Regards!
Dmitry
