I recently made several changes to a small workstation running four MIC (Xeon Phi) cards. My use is largely MPI, so I need all of the cards to be able to communicate with one another. This was working fine before the changes, but now when I try to run test code I hit the following problem.
A simple example, just trying to run MPI test code on only one MIC card:
mpirun -n 60 -hosts mic0 ./testMPI+openMP
which generates the following error:
[proxy:0:0@Axial-mic0.localdomain] HYDU_sock_connect (./utils/sock/sock.c:264): unable to connect from "Axial-mic0.localdomain" to "10.50.6.239" (Network is unreachable)
[proxy:0:0@Axial-mic0.localdomain] main (./pm/pmiserv/pmip.c:396): unable to connect to server 10.50.6.239 at port 50415 (check for firewalls!)
The internal bridge (br0) is on 10.10.10.x, and the host's address on it is 10.10.10.254. I can ssh to the cards from the host, and from the cards back to 10.10.10.254, but the cards cannot reach the host's eth0 IP (10.50.6.239). What I don't understand is why mic0 is trying to connect to 10.50.6.239 instead of 10.10.10.254.
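For reference, here is a rough sketch of the checks that seem relevant. It rests on two assumptions I have not confirmed on this install: that the launcher advertises whatever address the host's own name resolves to, and that this mpirun is a Hydra launcher accepting the -iface option (as MPICH and Intel MPI document it).

```shell
# Which address does the host's own name resolve to?
# If this prints 10.50.6.239, that may be the address handed to the proxy on mic0.
getent hosts "$(hostname)"

# Confirm the address actually configured on the internal bridge.
ip -4 -o addr show dev br0

# Check whether the card has any route towards the 10.50.6.x network
# (assumes the card's userland provides the ip command).
ssh mic0 ip route

# Retry with the launcher pinned to the bridge interface.
# -iface is assumed to be supported by this mpirun; drop it if it is rejected.
mpirun -iface br0 -n 60 -hosts mic0 ./testMPI+openMP
```

If the first command returns the eth0 address, that would at least be consistent with the error above, although it would not by itself explain why the setup worked before the changes.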