I'm implementing fault-tolerance techniques using Intel MPI.
I have the following scenario: two hosts (A and B) communicate via MPI messages.
Host B fails (it crashes or loses its network link, for example). Host A, which was trying to send a message to host B, cannot complete the send. About 15 minutes later the application is terminated because an internal error (INTERNAL_ERRO) is raised. This happens because all of the TCP retransmission attempts fail (the number of attempts is set by tcp_retries2).
The same procedure run with Open MPI does not end the same way, i.e., the application is not terminated.
Is there any way to prevent Intel MPI from raising this error?
More precisely, can I disable the generation of INTERNAL_ERRO when a send does not complete even after the TCP layer has exhausted the retries defined by tcp_retries2?