Issue with MPI communication between two MIC cards and a Xeon processor

Hello,

I am running an MPI application with 5 ranks. It runs smoothly when all ranks are on the Xeon processor, but when I put two ranks on MIC0 and MIC1 I hit the following issue: the program hangs and then gives me a segmentation fault.

setup:-

using blocking MPI send and non-blocking MPI recv

rank0, rank1 on MIC0,MIC1

rank2,rank3,rank4 on xeon

issue:-

rank1-->sends 100 packets and reaches finalize() 

rank2-->only receives 60 packets and then hangs

some things i tried:-

I added a sleep(1) before rank1 sends its packets, and this solved the issue: rank2 received all the packets.

But for a larger number of packets (>100), adding the sleep doesn't solve the issue and the system still hangs.

Any suggestions?

thank you  

 


Hi Vikrant,

I assume you already configured peer-to-peer before running the program:

# sudo /sbin/sysctl -w net.ipv4.ip_forward=1

Also, you may want to try running rank 0 on the host and ranks 1 and 2 on mic0 and mic1 respectively, to see if the problem still occurs. What versions of the compiler and the Intel MPI library do you have?

hi loc-nguyen

The version I am using is 4.1.1.036 for mpiicc.

I did try interchanging the ranks but faced the same issue.

No, I had not run the command "sudo /sbin/sysctl -w net.ipv4.ip_forward=1", but I could ssh between the two cards and run simple programs with the same hybrid structure, so I thought I had that part covered.

But even after I ran the command on the host, I get the same error.

 

thanks

 

Hi Vikrant,

Could you post the source code so I can take a closer look at the issue? Thank you.

Hi loc-nguyen  

Sorry, I cannot post the code here. I am currently at Intel in Santa Clara; if you are in Santa Clara too, could I meet you in person?

thanks
