Running MPI over a Heterogeneous InfiniBand Network

Hello,

I have an InfiniBand network setup where we are testing the performance of FDR adapters. There are two FDR HCAs on the sender and one FDR HCA on each of the two receivers. The idea is to run parallel sends from the sender, one over each FDR, and receive them at the receivers. We tried MVAPICH, but the MVAPICH developers stated clearly that they don't support such a network.

I was wondering whether Intel MPI supports such a network, and whether we can do something like:

mpirun -n 2 -hosts Sender,Receiver1 -env MV2_IBA_HCA=mlx4_0 ./exec : -n 2 -hosts Sender,Receiver2 -env MV2_IBA_HCA=mlx4_1 ./exec

where mlx4_0 and mlx4_1 are the IDs of the FDR cards. So I am trying to run ./exec in parallel over the different FDR cards to send data to both receivers at the same time.

Is this possible using Intel MPI? If someone has a similar setup, please let me know.

 

Thanks,

Santak


Hi Santak,

This is not a supported method.  The best suggestion I have is to try using the OFED* multiple adapter capability.  To do this, set

I_MPI_FABRICS=shm:ofa

on all of the ranks.  On the ranks on Sender, set

I_MPI_OFA_NUM_ADAPTERS=2

and on the ranks on each receiver, set

I_MPI_OFA_NUM_ADAPTERS=1

I don't know if this will work, and I don't have a system to test it on.  I'm asking our developers for any additional information they may have.
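To make that suggestion concrete, a complete invocation could be assembled roughly as follows. This is only an untested sketch: the one-argument-set-per-host layout, the rank counts, and the per-set -env placement are assumptions based on the setup described above, not something confirmed in this thread.

# Untested sketch: shm:ofa fabric everywhere, two OFA adapters on Sender, one on each receiver
mpirun -genv I_MPI_FABRICS shm:ofa \
    -n 2 -host Sender -env I_MPI_OFA_NUM_ADAPTERS 2 ./exec : \
    -n 1 -host Receiver1 -env I_MPI_OFA_NUM_ADAPTERS 1 ./exec : \
    -n 1 -host Receiver2 -env I_MPI_OFA_NUM_ADAPTERS 1 ./exec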

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Hi Santak,

OK, I stand corrected: this is a supported model.  What you'll need to do is set

I_MPI_CHECK_DAPL_PROVIDER_MISMATCH=0

and run:

mpirun -n 2 -hosts Sender,Receiver1 -env I_MPI_DAPL_PROVIDER=mlx4_0 ./exec : -n 2 -hosts Sender,Receiver2 -env I_MPI_DAPL_PROVIDER=mlx4_1 ./exec
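If the adapter or provider names need to be double-checked first, the standard OFED tools can help. The device names here are illustrative; on DAPL-based installations the provider names Intel MPI accepts come from /etc/dat.conf and may be longer strings (for example ofa-v2-mlx4_0-1) rather than the bare device names, so I_MPI_DAPL_PROVIDER may need to match those entries instead.

# List the HCAs present on a node (run on Sender and on each receiver)
ibv_devinfo | grep hca_id
# List the DAPL provider names configured on the node
grep -v '^#' /etc/dat.conf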

James.

Thanks, James, for your updates. I was also trying a few combinations and figured out that this command works:

mpirun -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:ofa -n 1 -host Sender -env I_MPI_OFA_ADAPTER_NAME mlx4_0 ./exec : -n 1 -host Sender -env I_MPI_OFA_ADAPTER_NAME mlx4_1 ./exec : -n 1 -host Receiver1 -env I_MPI_OFA_ADAPTER_NAME mlx4_0 ./exec : -n 1 -host Receiver2 -env I_MPI_OFA_ADAPTER_NAME mlx4_0 ./exec

As you mentioned in your comment, Intel MPI doesn't behave exactly like MVAPICH: "-n 2 -hosts Sender,Receiver1" starts 2 processes on Sender, and I see no executable running on Receiver1. If you look at the command above, I have one argument set for each node; that is how I was able to run it.

 

The easier way to get that behavior is to use -ppn 1.  This puts one process per node.
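For example, the run above could be collapsed along these lines. This is an untested sketch: ranks that need different I_MPI_OFA_ADAPTER_NAME values are kept in separate argument sets, and how -ppn 1 interacts with the colon-separated sets is an assumption rather than something verified in this thread.

# Untested sketch: -ppn 1 spreads each argument set's ranks one per host
mpirun -genv I_MPI_FABRICS shm:ofa -ppn 1 \
    -n 2 -hosts Sender,Receiver1 -env I_MPI_OFA_ADAPTER_NAME mlx4_0 ./exec : \
    -n 1 -host Sender -env I_MPI_OFA_ADAPTER_NAME mlx4_1 ./exec : \
    -n 1 -host Receiver2 -env I_MPI_OFA_ADAPTER_NAME mlx4_0 ./exec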
