Dual Rail Performance

Hello,

I have a dual rail FDR system with SandyBridge nodes. I have 16 processes assigned to each of four nodes and I am testing the all-to-all. I have enabled dual rail with

    export I_MPI_OFA_NUM_ADAPTERS=2
    export I_MPI_OFA_RAIL_SCHEDULER=ROUND_ROBIN

The ibstat shows

    CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.10.2370
        Hardware version: 0
        Node GUID: 0x001e6703003dd888
        System image GUID: 0x001e6703003dd88b
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 56
            Base lid: 78
            LMC: 0
            SM lid: 3
            Capability mask: 0x02514868
            Port GUID: 0x001e6703003dd889
            Link layer: InfiniBand
    CA 'mlx4_1'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.10.700
        Hardware version: 0
        Node GUID: 0x0002c90300333e90
        System image GUID: 0x0002c90300333e93
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 56
            Base lid: 779
            LMC: 0
            SM lid: 618
            Capability mask: 0x02514868
            Port GUID: 0x0002c90300333e91
            Link layer: InfiniBand

Where does Intel MPI pick up the name of the adapters to use for dual rail mode?

Thanks
David Race

Hi David,
Intel MPI calls an IB verbs function to get a list of available IB devices. So Intel MPI doesn't work with the names of the adapters and doesn't try to read any configuration file. That is done by the libibverbs library.

Regards!
Dmitry
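To see which devices the verbs layer reports on a node, something like the following sketch may help. It assumes the `ibv_devices` utility that ships with the libibverbs tools is installed, and falls back to the kernel's `/sys/class/infiniband` directory otherwise. The function name `list_ib_devices` is only illustrative and is not part of Intel MPI.

```shell
#!/bin/sh
# Sketch: list the InfiniBand devices visible through the verbs layer.
# list_ib_devices is a hypothetical helper name used for illustration.
list_ib_devices() {
    if command -v ibv_devices >/dev/null 2>&1; then
        # ibv_devices prints the device names and node GUIDs known to libibverbs
        ibv_devices
    elif [ -d /sys/class/infiniband ]; then
        # Fallback: the kernel exposes one directory per registered device
        ls /sys/class/infiniband
    else
        echo "no verbs devices visible on this host"
    fi
}

list_ib_devices
```

On the system described above, both mlx4_0 and mlx4_1 would be expected to appear in the list.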

I set I_MPI_DEBUG=10, then received

    [16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [16] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [28] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [28] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [32] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [32] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [35] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [35] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [36] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [36] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [37] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [37] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [38] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [40] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [40] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [41] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [41] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [42] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [42] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [43] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [43] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [48] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [48] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [16] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [28] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [28] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [32] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [32] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [35] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [35] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [36] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [36] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [37] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [37] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [38] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [40] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [40] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [41] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [41] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [42] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [42] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [43] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [43] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [48] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [48] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.

From this and the Intel MPI documentation, does this mean I need an entry for ofa-v2-mlx4_1-1 in the /etc/dat.conf file? I only have an entry for the first IB device in the /etc/dat.conf file. Is this correct?

Thanks
David

Hi David,

afaik, you cannot use dapl for multirail. You'll have to use the shm:ofa fabric instead.

cheers.

By default Intel MPI uses shm:dapl fabric.
To enable OFA fabric you need to either set environment variable I_MPI_FABRICS=shm:ofa or add an option '-genv I_MPI_FABRICS shm:ofa' to your mpirun command.

The multi-rail feature is available only with the ofa fabric, and only with versions 4.0.x.

Regards!
Dmitry
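Putting the replies together with the settings from the original question, a launch might look roughly like the sketch below. The environment variables are the ones named in this thread. The binary name `./alltoall_test` is hypothetical, and the rank counts simply mirror the 4 nodes x 16 processes described in the question; the host list is assumed to come from a scheduler or hostfile.

```shell
#!/bin/sh
# Sketch: enable dual rail over the OFA fabric (not available over DAPL).
export I_MPI_FABRICS=shm:ofa                  # switch from the default shm:dapl
export I_MPI_OFA_NUM_ADAPTERS=2               # use both HCAs (mlx4_0 and mlx4_1)
export I_MPI_OFA_RAIL_SCHEDULER=ROUND_ROBIN   # scheduler from the original post
export I_MPI_DEBUG=10                         # print startup details to confirm the fabric

# Hypothetical benchmark binary; 4 nodes x 16 ranks per node = 64 ranks.
APP=./alltoall_test
if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    mpirun -n 64 -ppn 16 "$APP"
else
    echo "would run: mpirun -n 64 -ppn 16 $APP"
fi
```

With I_MPI_DEBUG still set, the startup output should no longer show the DAPL provider lines if the OFA fabric was actually selected.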
