"Hello world" slow with DAPL over IB

Posted by tamstorf

I'm trying to get Intel MPI v. 4.1.1.036 up and running over InfiniBand. Unfortunately, I'm getting odd slowdowns, or in most cases complete hangs, when using DAPL. I'm fairly new to all of this, so it is most likely a configuration error, but I'm not sure where it is or how to find it. I'm hoping someone here can help.

To test things I'm using the MPI hello world example from http://mpitutorial.com/mpi-hello-world. I can run this over the Ethernet interface without any problems:

drfe105:mpi_hello_world[327]% date ; mpirun -n 2 -rr -machinefile ./twohosts_eth -genv I_MPI_FABRICS shm:tcp ./mpi_hello_world; date
Tue Aug 20 20:16:37 PDT 2013
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
Hello world from processor drfe105, rank 0 out of 2 processors
Hello world from processor drfe106, rank 1 out of 2 processors
Tue Aug 20 20:16:37 PDT 2013

However, if I run the same test over the InfiniBand interface, things take a lot longer:

drfe105:mpi_hello_world[326]% date; mpirun -n 2 -rr -machinefile ./twohosts_ib -genv I_MPI_FABRICS dapl:dapl ./mpi_hello_world; date
Tue Aug 20 20:16:12 PDT 2013
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-ib0
[0] MPI startup(): DAPL provider ofa-v2-ib0
[0] MPI startup(): dapl data transfer mode
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-ib0
[1] MPI startup(): DAPL provider ofa-v2-ib0
[1] MPI startup(): dapl data transfer mode
Hello world from processor drfe105, rank 0 out of 2 processors
Hello world from processor drfe106, rank 1 out of 2 processors
Tue Aug 20 20:16:21 PDT 2013
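
The gap can be read straight off the two date stamps above; a quick sketch (GNU date assumed):

```shell
# Elapsed wall time of the DAPL run, from the start/end `date` stamps above
start=$(date -d 'Tue Aug 20 20:16:12 PDT 2013' +%s)
end=$(date -d 'Tue Aug 20 20:16:21 PDT 2013' +%s)
echo "$((end - start)) seconds"
```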

To be more specific, the Ethernet run takes less than a second, while the InfiniBand run takes 9 seconds (!). To get more information I've run it with a higher debug level, and I can see that most of the time is spent after the following output:

drfe105:mpi_hello_world[329]% date ; mpirun -n 2 -rr -genv I_MPI_DEBUG 50 -machinefile ./twohosts_ib -genv I_MPI_FABRICS dapl ./mpi_hello_world ; date
Tue Aug 20 20:19:34 PDT 2013
[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 1  Build 20130522
[0] MPI startup(): Copyright (C) 2003-2013 Intel Corporation.  All rights reserved.
[0] my_dlopen(): trying to dlopen: libdat2.so.2
[1] my_dlopen(): trying to dlopen: libdat2.so.2
[0] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[0] my_dlopen(): trying to dlopen: libdat2.so.2
[1] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[1] my_dlopen(): trying to dlopen: libdat2.so.2
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-ib0
[1] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[1] my_dlopen(): trying to dlopen: libdat2.so.2
[1] MPI startup(): DAPL provider ofa-v2-ib0
[1] RTC():  setup malloc hooks
[1] MPI startup(): dapl data transfer mode
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-ib0
[0] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[0] my_dlopen(): trying to dlopen: libdat2.so.2
[0] MPI startup(): DAPL provider ofa-v2-ib0
[0] RTC():  setup malloc hooks
[0] MPI startup(): dapl data transfer mode
[0] MPI startup(): static connections storm algo

Eventually it picks up and finishes the simple hello world example, but a more complicated application just seems to hang forever. I have tested the DAPL layer with 'dapltest' (from the dapl-utils 2.0.34 RPM package) and all of those tests succeed. I'm including some additional information about my setup below. If any other information would be helpful, please let me know.

Rasmus

[root@drfe105 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 6.4 (Santiago)

drfe105:mpi_hello_world[333]% /usr/sbin/ibstat
CA 'qib0'
    CA type: InfiniPath_QLE7342
    Number of ports: 2
    Firmware version:
    Hardware version: 2
    Node GUID: 0x001175000077d952
    System image GUID: 0x001175000077d952
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 6
        LMC: 0
        SM lid: 3
        Capability mask: 0x0761086a
        Port GUID: 0x001175000077d952
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 65535
        LMC: 0
        SM lid: 65535
        Capability mask: 0x07610868
        Port GUID: 0x001175000077d953
        Link layer: InfiniBand

[root@drfe105 ~]# ibstatus
Infiniband device 'qib0' port 1 status:
    default gid:     fe80:0000:0000:0000:0011:7500:0077:d952
    base lid:     0x6
    sm lid:         0x3
    state:         4: ACTIVE
    phys state:     5: LinkUp
    rate:         20 Gb/sec (4X DDR)
    link_layer:     InfiniBand

Infiniband device 'qib0' port 2 status:
    default gid:     fe80:0000:0000:0000:0011:7500:0077:d953
    base lid:     0xffff
    sm lid:         0xffff
    state:         1: DOWN
    phys state:     3: Disabled
    rate:         10 Gb/sec (4X)
    link_layer:     InfiniBand

drfe105:mpi_hello_world[334]% mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130522
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.

drfe105:mpi_hello_world[335]% cat /etc/rdma/dat.conf
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
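
(As an aside: the provider names Intel MPI can request are just the first field of each dat.conf entry, and the backing DAPL library is the fifth. A small sketch to list which names map to a given library, run here against a two-line sample copied from the file above; on a real host you would feed it /etc/rdma/dat.conf instead:)

```shell
# list_providers [LIB] — print dat.conf provider names (field 1), optionally
# only those backed by a particular DAPL library (field 5).
list_providers() {
  awk -v lib="$1" '$1 !~ /^#/ && NF >= 5 && (lib == "" || $5 == lib) { print $1 }'
}

# Sample input; on a real host: list_providers libdaplofa.so.2 < /etc/rdma/dat.conf
list_providers libdaplofa.so.2 <<'EOF'
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
EOF
```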

[root@drfe105 ~]# rpm -q libibcm libibverbs libibverbs-utils librdmacm librdmacm-utils rdma
libibcm-1.0.5-3.el6.x86_64
libibverbs-1.1.6-5.el6.x86_64
libibverbs-utils-1.1.6-5.el6.x86_64
librdmacm-1.0.17-0.git4b5c1aa.el6.x86_64
librdmacm-utils-1.0.17-0.git4b5c1aa.el6.x86_64
rdma-3.6-1.el6.noarch
[root@drfe105 ~]# rpm -q dapl ibacm ibsim ibutils libcxgb3 libibmad libibumad libipathverbs libmlx4 libmthca libnes rds-tools
dapl-2.0.34-1.el6.x86_64
ibacm-1.0.8-0.git7a3adb7.el6.x86_64
ibsim-0.5-7.el6.x86_64
ibutils-1.5.7-7.el6.x86_64
libcxgb3-1.3.1-1.el6.x86_64
libibmad-1.3.9-1.el6.x86_64
libibumad-1.3.8-1.el6.x86_64
libipathverbs-1.2-4.el6.x86_64
libmlx4-1.0.4-1.el6.x86_64
libmthca-1.0.6-3.el6.x86_64
libnes-1.1.3-1.el6.x86_64
rds-tools-2.0.6-3.el6.x86_64

Posted by James Tullos (Intel)

Why are you using different host files (machine files, in your case) for sockets vs. InfiniBand*? You should use the same host file for either fabric. The ranks are initially launched over sockets using SSH; once the ranks are started, they will connect over the selected fabric. Try using the same host file and see if that changes anything.
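
For example, a single machine file listing just the hostnames (the names below mirror the transcripts above) serves both cases, with the fabric chosen per invocation via -genv I_MPI_FABRICS:

```text
drfe105
drfe106
```

The same -machinefile argument then works for both the shm:tcp and the dapl runs.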

I'm also curious why there is no indication that the ofa-v2-mlx4_0-1 provider was tried. Try setting I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u and send the debug output again. For now, setting I_MPI_DEBUG=5 is sufficient for the information needed. If necessary, we'll try a higher debug level later, but it's rare that anything over 5 is needed.
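
Putting the two suggestions together, the retry would look something like this (./hosts is a placeholder machine file name; the other flags all appear earlier in the thread):

```text
mpirun -n 2 -rr -machinefile ./hosts \
    -genv I_MPI_DEBUG 5 \
    -genv I_MPI_FABRICS dapl \
    -genv I_MPI_DAPL_PROVIDER ofa-v2-mlx4_0-1u \
    ./mpi_hello_world
```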

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
