Dual Rail Performance

David Race

Hello,

I have a dual rail FDR system with SandyBridge nodes. I have 16 processes assigned to each of four nodes and I am testing the all-to-all. I have enabled dual rail with

    export I_MPI_OFA_NUM_ADAPTERS=2
    export I_MPI_OFA_RAIL_SCHEDULER=ROUND_ROBIN

The ibstat shows

    CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.10.2370
        Hardware version: 0
        Node GUID: 0x001e6703003dd888
        System image GUID: 0x001e6703003dd88b
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 56
            Base lid: 78
            LMC: 0
            SM lid: 3
            Capability mask: 0x02514868
            Port GUID: 0x001e6703003dd889
            Link layer: InfiniBand
    CA 'mlx4_1'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.10.700
        Hardware version: 0
        Node GUID: 0x0002c90300333e90
        System image GUID: 0x0002c90300333e93
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 56
            Base lid: 779
            LMC: 0
            SM lid: 618
            Capability mask: 0x02514868
            Port GUID: 0x0002c90300333e91
            Link layer: InfiniBand

Where does Intel MPI pick up the name of the adapters to use for dual rail mode?

Thanks
David Race

Dmitry Kuzmin (Intel)

Hi David,
Intel MPI calls an IB verbs function to get a list of available IB devices. So Intel MPI doesn't work with the names of the adapters and doesn't try to read any configuration file. Device discovery is done by the libibverbs library.

Regards!
Dmitry
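
The reply above says that device discovery goes through the verbs library rather than through a configuration file. A minimal sketch for checking what the verbs stack exposes on a node follows. It assumes the `ibv_devices` utility from the libibverbs tools is installed and falls back to the kernel's `/sys/class/infiniband` directory otherwise. The function name `list_ib_devices` is made up for this illustration, and the sketch only lists devices; it does not show which ones Intel MPI will actually select.

```shell
#!/bin/sh
# List the InfiniBand devices visible through the verbs stack on this node.
# Assumption: ibv_devices (libibverbs utilities) may or may not be installed,
# so fall back to the sysfs directory where the kernel registers IB devices.
list_ib_devices() {
    if command -v ibv_devices >/dev/null 2>&1; then
        ibv_devices
    elif [ -d /sys/class/infiniband ]; then
        ls /sys/class/infiniband
    else
        echo "no InfiniBand devices visible on this node"
    fi
}

list_ib_devices
```

On the system described in the first post, one would expect both mlx4_0 and mlx4_1 to appear in this list on every node.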

David Race

I set I_MPI_DEBUG=10, then received:

    [16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [16] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [28] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [28] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [32] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [32] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [35] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [35] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [36] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [36] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [37] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [37] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [38] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [40] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [40] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [41] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [41] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [42] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [42] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [43] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [43] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
    [48] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
    [48] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2

From this and the Intel MPI documentation, does this mean I need an entry for ofa-v2-mlx4_1-1 in the /etc/dat.conf file? I only have an entry for the first IB device in /etc/dat.conf. Is this correct?

Thanks
David

karl_lehnberger

Hi David,

afaik, you cannot use dapl for multirail. You'll have to use the shm:ofa fabric instead.

cheers.

Dmitry Kuzmin (Intel)

By default, Intel MPI uses the shm:dapl fabric.
To enable the OFA fabric, either set the environment variable I_MPI_FABRICS=shm:ofa or add the option '-genv I_MPI_FABRICS shm:ofa' to your mpirun command.

The multi-rail feature is available only with the ofa fabric, and only with versions 4.0.x.

Regards!
Dmitry
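
Putting the settings from this thread together, a dual-rail run might be set up roughly as in the sketch below. The four `I_MPI_*` variables and their values are taken from the posts above. The process count of 64 follows from the 16 processes on each of four nodes mentioned in the question. The binary name `./alltoall_test` is a placeholder, and host selection is left out because it depends on the local launcher setup. The script only prints the launch line so it can be inspected before being run by hand.

```shell
#!/bin/sh
# Select the OFA fabric, which the thread says is required for multi-rail.
export I_MPI_FABRICS=shm:ofa

# Dual-rail settings from the original question.
export I_MPI_OFA_NUM_ADAPTERS=2
export I_MPI_OFA_RAIL_SCHEDULER=ROUND_ROBIN

# Debug level used earlier in the thread to see which provider is opened.
export I_MPI_DEBUG=10

# Hypothetical launch line: 4 nodes x 16 processes = 64 ranks.
# ./alltoall_test is a placeholder for the actual benchmark binary.
LAUNCH_CMD="mpirun -n 64 ./alltoall_test"
echo "$LAUNCH_CMD"
```

With I_MPI_DEBUG set, the startup output should no longer show the DAPL provider lines from the earlier post if the OFA fabric was actually selected.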
