Introducing Intel® MPI Benchmarks

The Intel® MPI Benchmarks perform a set of MPI performance measurements for point-to-point and global communication operations for a range of message sizes. The generated benchmark data fully characterizes:

  • Performance of a cluster system, including node performance, network latency, and throughput
  • Efficiency of the MPI implementation used

Key Features

The Intel® MPI Benchmarks package consists of the following components:

  • IMB-MPI1 - benchmarks for MPI-1 functions
  • Two components covering MPI-2 functionality:
    • IMB-EXT - one-sided communications benchmarks
    • IMB-IO - input/output (I/O) benchmarks
  • Two components covering MPI-3 functionality:
    • IMB-NBC - non-blocking collectives benchmarks that provide measurements of the computation/communication overlap and of the pure communication time
    • IMB-RMA - Remote Memory Access (RMA) benchmarks that use passive target communication to measure one-sided communication
  • IMB-MT - multi-threaded benchmarks that run MPI-1 operations with multiple threads per MPI rank, eliminating most of the cross-thread synchronization points in the MPI workflow. Available starting with Intel® MPI Benchmarks 2019.

Each component corresponds to a separate executable file. You can run all of the supported benchmarks, or specify a single executable file in the command line to get results for a specific subset of benchmarks.
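For example, assuming the executables have already been built and are on your path, a command along the following lines runs only the PingPong benchmark from the MPI-1 component (the launcher and paths depend on your MPI installation):

host$ mpirun -np 2 IMB-MPI1 PingPong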

Read the Intel® MPI Benchmarks User's Guide for more information on all runtime options.

Prerequisites

Memory and Disk Space Requirements

In standard (default) mode, the memory required to run the Intel MPI Benchmarks depends on the number of active processes; in optional mode, it depends on the user-defined maximum MPI message size. The requirements range from static values of 80 MB or less up to dynamic values of 8 MB multiplied by the number of active processes (for example, up to 8 MB * 16 = 128 MB with 16 active processes). The Intel MPI Benchmarks User Guide gives the exact memory requirements for each benchmark.

Software Requirements

To run the Intel® MPI Benchmarks, you need:

  • cpp, an ANSI C compiler, and gmake on Linux* OS or Unix* OS
  • The enclosed Microsoft Visual C++* solutions as the basis for Microsoft Windows* OS
  • An MPI installation, including a startup mechanism for parallel MPI programs

Launch the Intel MPI Benchmarks

Installing the benchmarks

The benchmarks are available for download at the Intel MPI Benchmarks GitHub repository.

They are also installed as part of:

  • Intel® MPI Library
  • Intel® Parallel Studio XE Cluster Edition

For more information on the installed files, see:

<install_dir>/imb/<version>/ReadMe_IMB.txt

where,

  • <install_dir> is the Intel Parallel Studio XE installation directory, by default C:\Program Files (x86)\IntelSWTools on Windows, and /opt/intel/ on Linux
  • <version> is the Intel MPI Benchmarks version

Building the benchmarks

Building the benchmarks on Linux* OS

To build the benchmarks on Linux, do the following:

host$ source <path to Intel Compilers directory>/bin/compilervars.sh intel64
host$ source <path to Intel MPI Library directory>/intel64/bin/mpivars.sh
host$ cd <path to Intel MPI Benchmarks directory>/src
host$ make -f make_ict
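If you build against an MPI implementation other than the Intel MPI Library, the src directory also ships alternative makefiles such as make_mpich; treat the exact makefile and variable names in this sketch as assumptions to adapt to your installation:

host$ cd <path to Intel MPI Benchmarks directory>/src
host$ make -f make_mpich MPI_HOME=<path to your MPI installation>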

Building the benchmarks on Windows* OS

Use the enclosed solution files located in the component-specific subdirectories under the WINDOWS/ directory. Open the respective .vcproj or .vcxproj project file and use the Microsoft Visual Studio* menus to build the associated benchmark executable, as follows:

  1. Check that the Include, Lib, and Path environment variables are set as follows:
    • %I_MPI_ROOT%\intel64\include
    • %I_MPI_ROOT%\intel64\lib
    • %I_MPI_ROOT%\intel64\bin
    The %I_MPI_ROOT% environment variable is set to the Intel MPI Library installation directory. The default installation directory is C:\Program Files (x86)\IntelSWTools\mpi\<version>, where <version> is the product version.
  2. Open the .vcproj or .vcxproj file for the component you would like to build. From the Visual Studio Project panel:
    1. Change the Solution Platforms dialog box to x64
    2. Change the Solution Configurations dialog box to Release
    3. Check other settings as required, for example:
      • General > Project Defaults
        • Set Character Set to Use Multi-Byte Character Set
      • C/C++ > General
        • Set Additional Include Directories to $(I_MPI_ROOT)\intel64\include
        • Set Warning Level to Level 1 (/W1)
      • C/C++ > Preprocessor
        • In the Preprocessor Definitions field, add the conditional compilation macros WIN_IMB and _CRT_SECURE_NO_DEPRECATE. Depending on the component you intend to build, add one or more of the following macros: MPI1, EXT, MPIIO, NBC, RMA (see the example after this list).
      • Linker > Input
        • Set Additional Dependencies to "$(I_MPI_ROOT)\intel64\lib\impi.lib", including the quotation marks.
  3. Use F7 or Build > Build Solution to create an executable
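For reference, after step 2 the Preprocessor Definitions field for the IMB-MPI1 project might look like the following (a sketch only; swap the last macro for the component you are actually building):

WIN_IMB;_CRT_SECURE_NO_DEPRECATE;MPI1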

Running the benchmarks

To run the Intel® MPI Benchmarks, use the following command-line syntax:

host$ mpirun -np <P> IMB-<component> [arguments]

where

  • P is the number of processes to launch. Only the single-transfer IMB-IO benchmarks run on a single active process (P=1); all other benchmarks require at least two processes.
  • <component> is the component-specific suffix, which can take the values MPI1, EXT, IO, NBC, or RMA.

By default, the benchmarks run on Q active processes, where Q = [1,] 2, 4, 8, ..., largest 2^x < P, P (that is, powers of two up to the total number of processes, followed by P itself).
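For example, an invocation along the following lines runs only the Allreduce and Exchange benchmarks from IMB-MPI1 on four processes, with all four active from the start via the -npmin control option (see the User's Guide for the full list of runtime options):

host$ mpirun -np 4 IMB-MPI1 -npmin 4 Allreduce Exchange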

Links

Intel MPI Benchmarks GitHub repository - download the benchmarks

Intel® MPI Benchmarks User Guide 2018 - more information on runtime options

Creating Custom Benchmarks for Intel® MPI Benchmarks 2019



Comments

Raghu R.:

Hi Gergana,

I have a couple of quick questions.

Under the section titled "Running the benchmarks", the first bullet in that section mentions P=1. Was that intended, or am I misinterpreting that statement? Obviously you don't mean we should run the benchmark with 1 MPI rank.

IMB-NBC only measures non-blocking collectives (as the name suggests, of course). Do you have a benchmark that measures communication/computation overlap with just non-blocking send and receive routines?

Is it necessary to set MPICH_ASYNC_PROGRESS=1 to get asynchronous message progress for the NBC benchmark?

Is it necessary to set this environment variable for non blocking send/receive routines to get maximum overlap?

Thanks!

Raghu

Scott W.:

Is the IMB only for the Intel MPI Library, or does it work with MPICH2 as well? I currently use MPICH2 with the GNU compilers.

Håkon B.:

The data-check compile-time option seems poorly debugged.

To reproduce:

1. Compile IMB-MPI1 with data-check enabled (-DCHECK)

2. Create a msg_lengths file (for L in `seq 0 100`; do echo $L >> msg_len.txt; done)

3. Run with your favorite MPI implementation using two processes, the simplest possible way, with the following arguments to IMB-MPI1:

   -msglen msg_len.txt -iter 1 Exchange

and terrible things happen.

For example, with Open MPI and the command line:

mpirun -np 2 --mca btl self,sm ./IMB-MPI1 -msglen msg_len.txt -iter 1 Exchange

I get:

mpirun noticed that process rank 1 with PID 17473 on node x.y.z exited on signal 11 (Segmentation fault).

Rashawn K.:

I have built IMB 4.0 with GCC 4.4.7 and Open MPI 1.8.1 on a CentOS 6.4 Linux machine. I have been testing the benchmark suite on this machine, running only two MPI processes. All the benchmarks run, but I encounter a segmentation fault in the aggregate mode of Get_accumulate in IMB-RMA. The text of the segmentation fault error is:

[gino:00612] Failing at address: (nil)
[gino:00612] [ 0] /lib64/libpthread.so.0[0x3ad0c0f710]
[gino:00612] [ 1] /usr/lib64/openmpi-1.8.1/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_progress_pending_acc+0x79)[0x7f0065cdb599]
[gino:00612] [ 2] /usr/lib64/openmpi-1.8.1/lib/openmpi/mca_osc_rdma.so(+0xac90)[0x7f0065cdbc90]
[gino:00612] [ 3] /usr/lib64/openmpi-1.8.1/lib/openmpi/mca_pml_ob1.so(+0xf9d1)[0x7f00667239d1]
[gino:00612] [ 4] /usr/lib64/openmpi-1.8.1/lib/openmpi/mca_pml_ob1.so(+0x11362)[0x7f0066725362]
[gino:00612] [ 5] /usr/lib64/openmpi-1.8.1/lib/openmpi/mca_btl_vader.so(+0x2c66)[0x7f0066d53c66]
[gino:00612] [ 6] /usr/lib64/openmpi-1.8.1/lib/libopen-pal.so.6(opal_progress+0x4a)[0x372c2283da]
[gino:00612] [ 7] /usr/lib64/openmpi-1.8.1/lib/libmpi.so.1(ompi_request_default_wait_all+0x23d)[0x372ca4039d]
[gino:00612] [ 8] /usr/lib64/openmpi-1.8.1/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual+0x116)[0x7f006568ee36]
[gino:00612] [ 9] /usr/lib64/openmpi-1.8.1/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_two_procs+0x50)[0x7f00656988d0]
[gino:00612] [10] /usr/lib64/openmpi-1.8.1/lib/libmpi.so.1(MPI_Barrier+0x72)[0x372ca52742]
[gino:00612] [11] IMB-RMA[0x40dd6d]
[gino:00612] [12] IMB-RMA[0x4059d3]
[gino:00612] [13] IMB-RMA[0x4023b0]
[gino:00612] [14] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3ad041ed1d]
[gino:00612] [15] IMB-RMA[0x401ed9]

I have added /usr/lib64/openmpi-1.8.1/lib and /usr/lib64/openmpi-1.8.1/lib/openmpi paths to my LD_LIBRARY_PATH environment variable.

Has anyone else encountered an error like this?  And how did you resolve it?

Thank you,

Rashawn K.

Gergana S. (Intel):

Hey Mike,

I don't have access to an Ubuntu machine, but I just tested this on a Windows 8 machine and the unzip worked fine. I was using a local copy of WinZip rather than pkunzip, which is what you have. If you have a link to where I can download the pkunzip software, let me know and I'll give that a try.

Regards,
~Gergana

Michael S.:

The file imb-3.2.4-updated.tgz seems to be corrupt. I opened it under Ubuntu 12.04 and Windows 8 using pkunzip, and both were unhappy. Would someone have a look at that and let me know what was found?

Thanks

Mike S

phonlawat k.:

Although I edited it as you recommended, the problem still occurs.

developer:

Hi guys,

You have an integer overflow bug in IMB for buffers over 2 GB, in IMB_alloc_buf. For example, in this code:


s_len = (size_t) init_size;
r_len = (size_t) c_info->num_procs*init_size;

c_info->num_procs and init_size are both ints, therefore the multiplication is 32-bit. This is because the * operator has precedence over the type-casting operator.
You wanted to do: r_len = c_info->num_procs*(size_t)init_size;
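To make the failure mode concrete, here is a minimal standalone sketch (not the actual IMB source; the variable names are only borrowed from the snippet above) showing how an int-by-int product can overflow even when the result is stored in a size_t, and how casting an operand to size_t before the multiplication avoids it:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num_procs = 64;                        /* hypothetical process count */
    int init_size = 64 * 1024 * 1024;          /* 64 MB per process */

    /* Both operands are int, so the product is computed in 32-bit arithmetic
       and overflows (undefined behavior) before it is widened to size_t. */
    size_t r_len_bad = (size_t)(num_procs * init_size);

    /* Casting an operand to size_t first forces the multiplication to happen
       in size_t width (64-bit on a 64-bit platform). */
    size_t r_len_good = (size_t)num_procs * (size_t)init_size;

    printf("bad:  %zu bytes\n", r_len_bad);    /* wrapped, meaningless value */
    printf("good: %zu bytes\n", r_len_good);   /* 4294967296 bytes = 4 GB */
    return EXIT_SUCCESS;
}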

anonymous:

Looking for MPI programs that have heavy use of non-determinism.

