Performance issues with Omni Path

Hi all,

I installed two Omni-Path Fabric cards in two Xeon servers, following the instructions on this page: https://software.intel.com/en-us/articles/using-intel-omni-path-architec...

The performance tests at that link show the network reaching 100 Gb/s (4194304           10       360.39       360.39       360.39     23276.25).

On the network I deployed, I achieved half of that performance (     4194304           10       661.40       661.40       661.40     12683.17):

Is there some configuration needed to achieve 100 Gb/s with Omni-Path?

 

Here is the complete output of the benchmark execution:

[silvio@phi03 ~]$ mpirun -PSM2 -host 10.0.0.3 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.1 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Fri Feb  2 11:14:01 2018
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-693.17.1.el7.x86_64
# Version               : #1 SMP Thu Jan 25 20:13:58 UTC 2018
# MPI Version           : 3.1
# MPI Thread Environment: 

# Calling sequence was: 

# /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Sendrecv

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv 
# #processes = 2 
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         1.92         1.92         1.92         0.00
            1         1000         1.85         1.85         1.85         1.08
            2         1000         1.84         1.84         1.84         2.17
            4         1000         1.84         1.84         1.84         4.35
            8         1000         1.76         1.76         1.76         9.10
           16         1000         2.07         2.07         2.07        15.44
           32         1000         2.06         2.07         2.07        30.98
           64         1000         2.02         2.02         2.02        63.46
          128         1000         2.08         2.08         2.08       123.26
          256         1000         2.11         2.11         2.11       242.41
          512         1000         2.25         2.25         2.25       454.30
         1024         1000         3.56         3.56         3.56       575.46
         2048         1000         4.19         4.19         4.19       976.91
         4096         1000         5.16         5.16         5.16      1586.69
         8192         1000         7.15         7.15         7.15      2290.80
        16384         1000        14.32        14.32        14.32      2288.44
        32768         1000        20.77        20.77        20.77      3154.69
        65536          640        26.08        26.09        26.09      5024.04
       131072          320        34.77        34.77        34.77      7538.32
       262144          160        53.03        53.03        53.03      9886.58
       524288           80        93.55        93.55        93.55     11208.78
      1048576           40       172.25       172.28       172.26     12173.26
      2097152           20       355.15       355.21       355.18     11808.02
      4194304           10       661.40       661.40       661.40     12683.17

# All processes entering MPI_Finalize

 

 

Thanks in advance!

Silvio


Hello Silvio,

From your results it looks like you are using Xeon Phi nodes, not Xeon.
In general, to improve Omni-Path bandwidth numbers on Xeon Phi, you need to use more than one core, or apply the BKMs (best known methods) described in the OPA tuning guide (https://www.intel.com/content/dam/support/us/en/documents/network-and-i-..., for example section "9.1 Mapping from MPI Processes to SDMA Engines").

The Sendrecv benchmark does an isend and an irecv on each iteration, so it is a bidirectional benchmark, and the bidirectional bandwidth limit for Omni-Path is 25 Gbytes/sec. To get closer to that number, I would suggest using the new thread-split model, which is available with IMPI 2019 (https://software.intel.com/en-us/articles/intel-mpi-library-2019-technic...). For more information about thread-split mode, read section "4. Multiple Endpoints Support" in the Developer Reference (it should be at <install_path>/compilers_and_libraries_2018.1.163/linux/mpi_2019/doc/Developer_Reference.pdf).
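To see how that 25 Gbytes/sec figure relates to the numbers above: IMB reports Sendrecv throughput as traffic in both directions, i.e. 2 × message size / time. A quick check of the two 4 MiB rows (a sketch for illustration, not IMB code):

```python
def imb_sendrecv_mbytes_per_sec(nbytes, t_usec):
    """IMB Sendrecv throughput: 2 * message size / time, since the
    benchmark sends and receives simultaneously. bytes/usec is
    numerically equal to Mbytes/sec (1 Mbyte = 10^6 bytes)."""
    return 2 * nbytes / t_usec

print(round(imb_sendrecv_mbytes_per_sec(4194304, 360.39)))  # 23276 (reference run)
print(round(imb_sendrecv_mbytes_per_sec(4194304, 661.40)))  # 12683 (Silvio's run)
```

So the reference run is already at ~23.3 of the ~25 Gbytes/sec bidirectional limit, while a single Xeon Phi core reaches about half of it.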

Here is an example of usage with IMB-MT (a suite of multi-threaded benchmarks that can employ the thread-split model):

source <install_path>/compilers_and_libraries_2018.1.163/linux/mpi_2019/intel64/bin/mpivars.sh release_mt

I_MPI_THREAD_RUNTIME=openmp OMP_NUM_THREADS=4 I_MPI_THREAD_SPLIT=1 mpiexec.hydra -n 2 -ppn 1 -hosts host1,host2 IMB-MT sendrecvmt -count 1000000 -thread_level multiple

#-----------------------------------------------------------------------------
# Benchmarking SendRecvMT
# #processes = 2 (threads: 4)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
     16000000         1000      1231.05      1344.16      1298.92     24635.81
 

Dear Mikhail Shiryaev,

Thanks a lot for answering this!

Yes, I am performing the tests on two Xeon Phi nodes.

The execution of IMB-MT finished with errors on my cluster.

[silvio@phi05 mpi-benchmarks]$ source  /opt/intel/parallel_studio_xe_2018/compilers_and_libraries_2018/linux/mpi_2019/intel64/bin/mpivars.sh release_mt

[silvio@phi05 mpi-benchmarks]$ I_MPI_THREAD_RUNTIME=openmp OMP_NUM_THREADS=4 I_MPI_THREAD_SPLIT=1 mpiexec.hydra -n 2 -ppn 1 -hosts 10.0.0.5,10.0.0.6 IMB-MT sendrecvmt -count 1000000 -thread_level multiple

IMB-MT: /usr/lib64/libfabric.so.1: version `FABRIC_1.1' not found (required by /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi_2019/intel64/lib/release_mt/libmpi.so.12)
[mpiexec@phi05] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@phi05] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:253): unable to write data to proxy
IMB-MT: /usr/lib64/libfabric.so.1: version `FABRIC_1.1' not found (required by /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi_2019/intel64/lib/release_mt/libmpi.so.12)

 

The operating system I am using is CentOS 7, which provides libfabric version 4.

Do I need to downgrade libfabric.so?

I tried to install version 1.1, but it requires infiniband/driver.h, which I could not find in any package.

Hi Silvio,

You can check your libfabric version with "fi_info --version".
If you have an old libfabric, you can download and build it manually (on a node with the OPA software stack, for example a worker node):

  1. git clone https://github.com/ofiwg/libfabric.git
  2. cd ./libfabric
  3. ./autogen.sh
  4. ./configure --prefix=<libfabric_install_path> --enable-psm2
  5. make clean && make all && make install
  6. source <...>/mpivars.sh release_mt
  7. export LD_LIBRARY_PATH=<libfabric_install_path>/lib/:${LD_LIBRARY_PATH}
  8. mpiexec.hydra ...
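The numbered steps above, collected into a single script for convenience (a sketch: `$PREFIX` stands in for `<libfabric_install_path>`, `--enable-psm2` assumes the OPA/PSM2 headers are present on the build node, and the final `bash -n` only checks the script's syntax without running it):

```shell
# Write the build recipe from steps 1-5 to a script, plus a sanity check.
cat > build-libfabric.sh <<'EOF'
#!/bin/bash
set -e
PREFIX=${1:?usage: build-libfabric.sh <libfabric_install_path>}
git clone https://github.com/ofiwg/libfabric.git
cd libfabric
./autogen.sh
./configure --prefix="$PREFIX" --enable-psm2
make clean && make all && make install
# Sanity check: the new library should export the FABRIC_1.1 version
# tag that the IMPI 2019 libmpi.so.12 requires.
readelf -V "$PREFIX"/lib/libfabric.so | grep FABRIC_1.1
EOF
bash -n build-libfabric.sh && echo "script syntax OK"
```

Run it as `bash build-libfabric.sh <libfabric_install_path>`, then continue with steps 6-8 (source mpivars.sh release_mt and prepend the new lib directory to LD_LIBRARY_PATH).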

Hi Mikhail,

   Adding to this thread rather than starting a new one, because it seems most germane to my issue. I am trying to enable Intel MPI 2019 on a system without libfabric and without any Omni-Path packages (it's a Mellanox-based network). So I built the dependencies (because I get unresolved symbols from psm and psm2 at runtime despite disabling them during the libfabric build) from fresh clones of intel/psm and intel/opa-psm2, using the CentOS 7.4 GCC (v4.8.5). I then built libfabric 1.8.0 with

./configure --prefix=/nopt/nrel/apps/centos/7.4 --enable-psm=no --enable-psm2=no --enable-sockets=yes --enable-verbs=yes --enable-mlx=/opt/mellanox/mxm --enable-udp=yes --enable-tcp=yes --enable-rxm=no --enable-mrail=no --enable-rxd=no --enable-bgq=no --enable-shm=yes --enable-rstream=no --enable-perf=no

The build goes fine, but when I try a small MPI test, I get:

srun --nodes=2 --ntasks=4 --time=5:00 --account=hpcapps ./test

/home/cchang/tests/IMPI/./test: /nopt/nrel/apps/centos/7.4/lib64/libfabric.so.1: version `FABRIC_1.1' not found (required by /nopt/nrel/apps/compilers/2018-11-19/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2019.1-nn7isnm3kmcdwixuelyzaedqcyisum4j/impi/2019.1.144/intel64/lib/release/libmpi.so.12)

(the same error is printed once per rank; four ranks total)

Sure enough, I only see 1.0 in the libfabric binary:

[cchang@el2 IMPI]$ readelf -a /nopt/nrel/apps/centos/7.4/lib64/libfabric.so.1.2.3 | grep FABRIC
   272: 0000000000011da0   330 FUNC    GLOBAL DEFAULT   12 fi_log@@FABRIC_1.0
   273: 0000000000011d60    62 FUNC    GLOBAL DEFAULT   12 fi_log_enabled@@FABRIC_1.0
   274: 0000000000012350   763 FUNC    GLOBAL DEFAULT   12 fi_param_get@@FABRIC_1.0
   275: 000000000000f070   166 FUNC    GLOBAL DEFAULT   12 fi_freeinfo@@FABRIC_1.0
   276: 000000000000f120  1031 FUNC    GLOBAL DEFAULT   12 fi_getinfo@@FABRIC_1.0
   277: 0000000000011f20   299 FUNC    GLOBAL DEFAULT   12 fi_getparams@@FABRIC_1.0
   278: 0000000000012110   563 FUNC    GLOBAL DEFAULT   12 fi_param_define@@FABRIC_1.0
   279: 000000000000fa20   127 FUNC    GLOBAL DEFAULT   12 fi_fabric@@FABRIC_1.0
   280: 00000000000111c0  2518 FUNC    GLOBAL DEFAULT   12 fi_tostr@@FABRIC_1.0
   281: 000000000000fab0    53 FUNC    GLOBAL DEFAULT   12 fi_strerror@@FABRIC_1.0
   282: 0000000000012050    83 FUNC    GLOBAL DEFAULT   12 fi_freeparams@@FABRIC_1.0
   283: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS FABRIC_1.0
   284: 000000000000f530  1254 FUNC    GLOBAL DEFAULT   12 fi_dupinfo@@FABRIC_1.0
   285: 000000000000faa0     6 FUNC    GLOBAL DEFAULT   12 fi_version@@FABRIC_1.0
  110:   2 (FABRIC_1.0)    2 (FABRIC_1.0)    2 (FABRIC_1.0)    2 (FABRIC_1.0) 
  114:   2 (FABRIC_1.0)    2 (FABRIC_1.0)    2 (FABRIC_1.0)    2 (FABRIC_1.0) 
  118:   2 (FABRIC_1.0)    2 (FABRIC_1.0)    2 (FABRIC_1.0)    2 (FABRIC_1.0) 
  11c:   2 (FABRIC_1.0)    2 (FABRIC_1.0) 
  0x001c: Rev: 1  Flags: none  Index: 2  Cnt: 1  Name: FABRIC_1.0

This is the latest libfabric release, so how would I go about getting symbols compatible with Intel MPI 2019?

Thanks; Chris

OK, never mind, I was looking at an older file installed from a CentOS RPM.
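For anyone who hits the same symptom: whether a given libfabric copy is new enough for IMPI 2019 comes down to whether it defines the FABRIC_1.1 symbol-version tag. A small sketch that pulls the tags out of `readelf -a` text (the sample excerpt is illustrative, mimicking the old library above):

```python
import re

def fabric_version_tags(readelf_text):
    """Collect the FABRIC_x.y symbol-version tags present in
    `readelf -a libfabric.so` output."""
    return set(re.findall(r"FABRIC_\d+\.\d+", readelf_text))

# Illustrative excerpt from an old library:
sample = """
  283: 0000000000000000   0 OBJECT GLOBAL DEFAULT ABS FABRIC_1.0
  284: 000000000000f530 1254 FUNC  GLOBAL DEFAULT  12 fi_dupinfo@@FABRIC_1.0
"""
tags = fabric_version_tags(sample)
print(tags)                      # {'FABRIC_1.0'}
print("FABRIC_1.1" in tags)      # False -> too old for Intel MPI 2019
```

Running the same check against each installed libfabric.so quickly shows which copy the loader must not pick up.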
