Introducing Intel® MPI Benchmarks

The Intel® MPI Benchmarks perform a set of MPI performance measurements for point-to-point and global communication operations for a range of message sizes. The generated benchmark data fully characterizes:

  • Performance of a cluster system, including node performance, network latency, and throughput
  • Efficiency of the MPI implementation used

Key Features

The Intel® MPI Benchmarks package consists of the following components:

  • Two components covering MPI-1 functionality:
    • IMB-MPI1 - benchmarks for MPI-1 functions
    • IMB-P2P - shared memory transport-oriented benchmarks for MPI-1 point-to-point communications
  • Two components covering MPI-2 functionality:
    • IMB-EXT - one-sided communications benchmarks
    • IMB-IO - input/output (I/O) benchmarks
  • Two components covering MPI-3 functionality:
    • IMB-NBC - non-blocking collectives benchmarks that provide measurements of the computation/communication overlap and of the pure communication time
    • IMB-RMA - Remote Memory Access (RMA) benchmarks that use passive target communication to measure one-sided communication
  • IMB-MT - eliminates most of the cross-thread synchronization points in the MPI workflow. Available for Intel® MPI Benchmarks 2019 only.

Each component corresponds to a separate executable file. You can run all of the supported benchmarks, or specify a single executable file in the command line to get results for a specific subset of benchmarks.

Read the Intel® MPI Benchmarks User's Guide for more information on all runtime options.


Memory and Disk Space Requirements

The memory required to run the Intel® MPI Benchmarks is determined by the number of active processes with the default settings (standard mode) or by the maximum size of the MPI message with user-defined settings (optional mode). The requirements vary from static values of 80 MB or less to dynamic values of up to 8 MB multiplied by the number of active processes. The Intel® MPI Benchmarks User Guide describes the memory requirements of each benchmark in full.
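As a rough illustration of the dynamic case, the sketch below only multiplies the 8 MB per-process figure quoted above by a process count. The function name is hypothetical, and which benchmarks fall under the static or the dynamic case is not modeled here; the User Guide remains the reference for that.

```python
def dynamic_memory_upper_bound_mb(active_processes: int, per_process_mb: int = 8) -> int:
    """Return the upper bound in MB for the dynamic case: 8 MB per active process.

    The 8 MB default comes from the text above; everything else is illustrative.
    """
    if active_processes < 1:
        raise ValueError("active_processes must be at least 1")
    return per_process_mb * active_processes


if __name__ == "__main__":
    for procs in (2, 16, 128):
        bound = dynamic_memory_upper_bound_mb(procs)
        print(f"{procs} active processes: up to {bound} MB")
```

With 16 active processes, for instance, the bound works out to 128 MB, which already exceeds the 80 MB static figure.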

Software Requirements

To run the Intel® MPI Benchmarks, you need:

  • cpp, ANSI C compiler, libstdc++-devel, gmake on Linux* OS or Unix* OS
  • Enclosed Microsoft Visual* C++ solutions as the basis for Microsoft Windows* OS
  • MPI installation, including a startup mechanism for parallel MPI programs

Launch the Intel MPI Benchmarks

Installing the benchmarks

The benchmarks are available for download at the Intel MPI Benchmarks GitHub repository.

They are also installed as part of:

  • Intel® MPI Library
  • Intel® Parallel Studio XE Cluster Edition

For more information on the installed files, see:



  • <install_dir> is the Intel Parallel Studio XE installation directory, by default C:\Program Files (x86)\IntelSWTools on Windows, and /opt/intel/ on Linux
  • <version> is the Intel MPI Benchmarks version

Building the benchmarks

Building the benchmarks on Linux* OS

To build the benchmarks on Linux, do the following:

host$ source <path to Intel Compilers directory>/bin/ intel64
host$ source <path to Intel MPI Library directory>/intel64/bin/
host$ cd <path to Intel MPI Benchmarks directory>/src
host$ make -f make_ict

Building the benchmarks on Windows* OS

Use the enclosed solution files located in the component-specific subdirectories under the WINDOWS/ directory. Click on the respective .vcproj or .vcxproj project file and use the Microsoft* Visual Studio* menu to run the associated benchmark application.

  1. Check that the Include, Lib, and Path environment variables are set as follows:
    • %I_MPI_ROOT%\intel64\include
    • %I_MPI_ROOT%\intel64\lib
    • %I_MPI_ROOT%\intel64\bin
    The %I_MPI_ROOT% environment variable is set to the Intel MPI Library installation directory. The default installation directory is C:\Program Files (x86)\IntelSWTools\mpi\<version>, where <version> is the product version.
  2. Open the .vcproj or .vcxproj file for the component you would like to build. From the Visual Studio Project panel:
    1. Change the Solution Platforms dialog box to x64
    2. Change the Solution Configurations dialog box to Release
    3. Check other settings as required, for example:
      • General > Project Defaults
        • Set Character Set to Use Multi-Byte Character Set
      • C/C++ > General
        • Set Additional Include Directories to $(I_MPI_ROOT)\intel64\include
        • Set Warning Level to Level 1 (/W1)
      • C/C++ > Preprocessor
        • For the Preprocessor definitions within the Visual Studio projects, add the conditional compilation macros WIN_IMB and _CRT_SECURE_NO_DEPRECATE. Depending on the components you intend to use, add one or more of the following macros: MPI1, EXT, MPIIO, NBC, RMA.
      • Linker > Input
        • Set Additional Dependencies to $(I_MPI_ROOT)\intel64\lib\impi.lib. Make sure to add quotes.
  3. Use F7 or Build > Build Solution to create an executable

Running the benchmarks

To run the Intel® MPI Benchmarks, use the following command-line syntax:

host$ mpirun -np <P> IMB-<component> [arguments]


  • P is the number of processes. P=1 is recommended for all I/O and message passing benchmarks except the single transfer ones.
  • <component> is the component-specific suffix that can take MPI1, EXT, IO, NBC, and RMA values
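The syntax above can be assembled programmatically, for example when scripting several runs. The following is a minimal sketch, not part of the benchmark package: the helper name is hypothetical, the component list is the one given in the bullet above, and the sample arguments (-msglen, -iter, Exchange) are taken from a reader comment on this page rather than from the User Guide.

```python
import shlex

# Component suffixes as listed in the syntax description above.
KNOWN_COMPONENTS = ("MPI1", "EXT", "IO", "NBC", "RMA")


def build_imb_command(num_procs, component, arguments=()):
    """Return the argv list for: mpirun -np <P> IMB-<component> [arguments]."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if component not in KNOWN_COMPONENTS:
        raise ValueError(f"unknown component suffix: {component!r}")
    return ["mpirun", "-np", str(num_procs), f"IMB-{component}", *arguments]


if __name__ == "__main__":
    cmd = build_imb_command(
        2, "MPI1", ["-msglen", "msg_len.txt", "-iter", "1", "Exchange"]
    )
    # Print the command instead of running it, so the sketch works
    # on machines without an MPI installation.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the result safe to pass to subprocess.run without shell quoting issues.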

By default, all benchmarks run on Q active processes defined as follows: Q=[1,] 2, 4, 8, ..., largest 2^x.
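To make the sequence concrete, here is a small sketch that lists the values of Q for a given P, reading the last term as the largest power of two that does not exceed P. The function name and the include_one flag (standing in for the optional "[1,]") are assumptions. The line above does not say whether P itself is added when it is not a power of two, so this sketch leaves it out.

```python
def active_process_counts(total_procs, include_one=False):
    """List Q = [1,] 2, 4, 8, ... up to the largest power of two <= total_procs."""
    if total_procs < 1:
        raise ValueError("total_procs must be at least 1")
    counts = [1] if include_one else []
    q = 2
    while q <= total_procs:
        counts.append(q)
        q *= 2
    return counts


if __name__ == "__main__":
    for procs in (2, 8, 12):
        print(procs, active_process_counts(procs))
```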


Intel MPI Benchmarks GitHub repository - download the benchmarks

Intel® MPI Benchmarks User Guide - more information on runtime options

Creating Custom Benchmarks for Intel® MPI Benchmarks 2019




Hi Gergana,

I have a couple of quick questions.

Under the section titled "Running the benchmarks", the first bullet says that P=1 is recommended. Was that intended, or am I misinterpreting that statement? Surely you don't mean we should run the benchmark with one MPI rank.

IMB-NBC only measures non-blocking collectives (as the name suggests, of course). Do you have a benchmark that measures communication overlap with just non-blocking send and receive routines?

Is it necessary to set MPICH_ASYNC_PROGRESS=1 to get asynchronous message progress for the NBC benchmark?

Is it necessary to set this environment variable for non-blocking send/receive routines to get maximum overlap?



Is IMB only for Intel MPI, or does it work with MPICH2 as well? I currently use MPICH2 with a GNU compiler.

The data-check compile-time option seems poorly debugged.

To reproduce:

1. Compile IMB-MPI1 with data-check enabled (-DCHECK)

2. Create a msg_lengths file (for L in `seq 0 100`; do echo $L >> msg_len.txt; done)

3. Run with your favorite MPI implementation using two processes, in the simplest possible way, with the following arguments to IMB-MPI1:

   -msglen msg_len.txt -iter 1 Exchange

and terrible things happen.

For example, with Open MPI and the command line:

mpirun -np 2 --mca btl self,sm ./IMB-MPI1 -msglen msg_len.txt -iter 1 Exchange

I get:

mpirun noticed that process rank 1 with PID 17473 on node x.y.z  exited on signal 11 (Segmentation fault).
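For anyone reproducing this without a Unix shell, the message-length file from step 2 can be generated as sketched below. The function names are hypothetical. Unlike the shell loop, which appends with >>, this version overwrites the file, so rerunning it does not duplicate entries.

```python
from pathlib import Path


def msg_lengths_text(max_length=100):
    """Return the file content: one message length per line, 0..max_length."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    return "".join(f"{length}\n" for length in range(max_length + 1))


def write_msg_lengths(path, max_length=100):
    """Write the message lengths to path, replacing any existing file."""
    Path(path).write_text(msg_lengths_text(max_length))


if __name__ == "__main__":
    # Same file name as in step 2 of the reproduction steps.
    write_msg_lengths("msg_len.txt")
```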




The integer overflow bug has been fixed in the latest IMB 4.0.

Rashawn, I would recommend you post your question to the Intel Clusters and HPC Technology forum. That's a better location for support questions. The first thing I would check is whether OpenMPI 1.8.1 supports the new one-sided MPI-3 routines (implemented in IMB-RMA).


I have built IMB 4.0 with GCC 4.4.7 and OpenMPI 1.8.1 on a CentOS 6.4 Linux machine. I have been testing the benchmark suite on this machine, running only two MPI processes. All the benchmarks run, but I encounter a segmentation fault in the aggregate mode of Get_accumulate in IMB-RMA. The text of the segmentation fault error is:

[gino:00612] Failing at address: (nil)
[gino:00612] [ 0] /lib64/[0x3ad0c0f710]
[gino:00612] [ 1] /usr/lib64/openmpi-1.8.1/lib/openmpi/[0x7f0065cdb599]
[gino:00612] [ 2] /usr/lib64/openmpi-1.8.1/lib/openmpi/[0x7f0065cdbc90]
[gino:00612] [ 3] /usr/lib64/openmpi-1.8.1/lib/openmpi/[0x7f00667239d1]
[gino:00612] [ 4] /usr/lib64/openmpi-1.8.1/lib/openmpi/[0x7f0066725362]
[gino:00612] [ 5] /usr/lib64/openmpi-1.8.1/lib/openmpi/[0x7f0066d53c66]
[gino:00612] [ 6] /usr/lib64/openmpi-1.8.1/lib/[0x372c2283da]
[gino:00612] [ 7] /usr/lib64/openmpi-1.8.1/lib/[0x372ca4039d]
[gino:00612] [ 8] /usr/lib64/openmpi-1.8.1/lib/openmpi/[0x7f006568ee36]
[gino:00612] [ 9] /usr/lib64/openmpi-1.8.1/lib/openmpi/[0x7f00656988d0]
[gino:00612] [10] /usr/lib64/openmpi-1.8.1/lib/[0x372ca52742]
[gino:00612] [11] IMB-RMA[0x40dd6d]
[gino:00612] [12] IMB-RMA[0x4059d3]
[gino:00612] [13] IMB-RMA[0x4023b0]
[gino:00612] [14] /lib64/[0x3ad041ed1d]
[gino:00612] [15] IMB-RMA[0x401ed9]

I have added /usr/lib64/openmpi-1.8.1/lib and /usr/lib64/openmpi-1.8.1/lib/openmpi paths to my LD_LIBRARY_PATH environment variable.

Has anyone else encountered an error like this?  And how did you resolve it?

Thank you,

Rashawn K.



Hey Mike,

I don't have access to an Ubuntu machine but I just tested this on a Windows 8 machine and the unzip worked fine.  I was using a local copy of WinZip rather than pkunzip, which is what you have.  If you have a link of where I can download the pkunzip software, let me know and I'll give that a try.


The file imb-3.2.4-updated.tgz seems to be corrupt. I opened it under Ubuntu 12.04 and Windows 8 using pkunzip and both were unhappy. Could someone have a look at that and let me know what you find?



Mike S

Although I made the edit you recommended, the problem still occurs.

Hi guys,

You have an integer overflow bug in IMB for buffers over 2 GB in IMB_alloc_buf, for example in this code:

s_len = (size_t) init_size;

r_len = (size_t) c_info->num_procs*init_size;

c_info->num_procs and init_size are both ints. therefore the multiplication is 32 bit.
This is because the * operator has precedence over type-casting operator.
you wanted to do: r_len = c_info->num_procs*(size_t)init_size;
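The size of the effect can be sketched by emulating 32-bit arithmetic, here in Python with hypothetical helper names. One caveat: in C a cast binds more tightly than the binary * operator, so whether the quoted line really multiplies in 32 bits depends on how the expression is parenthesized in the actual IMB source. The sketch only contrasts a product truncated to 32 bits with one computed after widening. Signed overflow is undefined behavior in C; two's-complement wraparound is assumed here as the typical outcome.

```python
def wrap_int32(value):
    """Truncate an integer to a signed 32-bit value (two's-complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def buffer_length_32bit(num_procs, init_size):
    """Product as it would look if computed entirely in 32-bit int arithmetic."""
    return wrap_int32(num_procs * init_size)


def buffer_length_widened(num_procs, init_size):
    """Product computed after widening; Python ints do not overflow."""
    return num_procs * init_size


if __name__ == "__main__":
    one_gib = 1 << 30
    for procs in (1, 2, 3, 4):
        print(procs,
              buffer_length_32bit(procs, one_gib),
              buffer_length_widened(procs, one_gib))
```

With 1 GiB per process, the 32-bit result is already negative at two processes and wraps to zero at four, while the widened result keeps growing as expected.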

Looking for MPI programs that have heavy use of non-determinism.
