Creating Custom Benchmarks for Intel® MPI Benchmarks 2019

By Nataliya Komleva

Published: 04/12/2018   Last Updated: 09/17/2018

This article guides you through creation of new benchmarks and benchmark suites within the Intel® MPI Benchmarks 2019 infrastructure.

A benchmark suite is a logically connected group of benchmarks. For each suite, you can declare command line arguments and share data structures.

Initial Setup

To create a new benchmark suite:

  1. Choose a name for your new benchmark suite and create a new subdirectory in the src directory of the Intel® MPI Benchmarks directory using this name. For example, if the benchmark suite name is new_bench, the source code sub-tree will be the following:
  2. Create a Makefile. A simple Makefile may look as follows:
    BECHMARK_SUITE_SRC += new_bench/new_bench_1.cpp
    CPPFLAGS += -Inew_bench
    In this example, the Makefile rules add the new benchmark source file to the full list of files to build and add the new_bench subdirectory to the include search path via the -I compiler flag.
  3. Save the Makefile with the .mk extension in the benchmark suite subdirectory:


You can find a benchmark suite example in the example subdirectory of your Intel® MPI Benchmarks distribution.


This file contains the bare minimum required to introduce a new benchmark suite and a new benchmark to the benchmarking infrastructure. Two main entities must be correctly specified: a benchmark suite class and a benchmark class.

Custom benchmark suite class

In this example, the new benchmark suite class is specified by the DECLARE_BENCHMARK_SUITE_STUFF macro, which specializes the BenchmarkSuite<> template class with the BS_GENERIC enum value. Using the macro is recommended for simple cases like this.

Please note that there is a side effect of using the BenchmarkSuite<BS_GENERIC> template: multiple instantiations of this class in different parts of the source code tree cause linker errors. To avoid this, use a unique namespace for all custom benchmark suites, and custom benchmark data structures and functions. In the example, the example_suite1 namespace is used exactly for this purpose.

Custom benchmark class

A new benchmark class must inherit from the Benchmark base class and must override at least one virtual function: void run(const scope_item &item). This function is the core of any benchmark. Two helper macros, DEFINE_INHERITED and DECLARE_INHERITED, define all static variables required for the automatic runtime registration of any benchmark in the source tree.
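The idea of registering a benchmark automatically through static variables can be illustrated with a small standalone sketch. It does not use the Intel® MPI Benchmarks headers: BenchmarkBase, ScopeItem, Registrar, registry() and list_benchmarks() are hypothetical names invented for this illustration, and the actual DEFINE_INHERITED and DECLARE_INHERITED macros may work differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the
// Intel MPI Benchmarks classes or macros.
struct ScopeItem { int len; };

class BenchmarkBase {
public:
    virtual ~BenchmarkBase() = default;
    virtual void run(const ScopeItem &item) = 0;  // the core of a benchmark
};

using Factory = std::function<std::unique_ptr<BenchmarkBase>()>;

// Function-local static: the map exists before any registrar touches it.
std::map<std::string, Factory> &registry() {
    static std::map<std::string, Factory> instance;
    return instance;
}

// Constructing a Registrar stores a factory under the benchmark name.
struct Registrar {
    Registrar(const std::string &name, Factory factory) {
        assert(registry().count(name) == 0);  // names are expected to be unique
        registry()[name] = std::move(factory);
    }
};

class Example1 : public BenchmarkBase {
public:
    void run(const ScopeItem &item) override { last_len = item.len; }
    int last_len = -1;
};

// A static object: its constructor runs during program start-up,
// so the benchmark is registered without any explicit call.
static Registrar example1_registrar("example1", [] {
    return std::unique_ptr<BenchmarkBase>(new Example1);
});

// Roughly what a "list" option could print: all registered names.
std::vector<std::string> list_benchmarks() {
    std::vector<std::string> names;
    for (const auto &entry : registry()) names.push_back(entry.first);
    return names;
}
```

In this sketch, calling list_benchmarks() at run time yields the name example1 even though no code registers it explicitly, which is the effect the helper macros are described as providing.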

You can check at run time that the example1 benchmark appears in the benchmark list of the Intel® MPI Benchmarks by using the -list option. The option output also shows that the benchmark belongs to the example_suite1 suite.


The example_suite1 can be successfully integrated into the Intel® MPI Benchmarks infrastructure, but to actually run the benchmark, you need to define another main entity of the infrastructure for it.

Custom benchmarking scope

The Intel® MPI Benchmarks infrastructure automatically registers new benchmarks and benchmark suites in the source tree, but the infrastructure must also know how many times each benchmark should be run and which parameters to pass on each run. This is done by creating an object of a class derived from the abstract class Scope. Each benchmark object holds a smart pointer to this object as a member, and the object is expected to be created in the benchmark's definition of the init() virtual function.

The example example_benchmark2.cpp introduces the void init() member function, which initializes the scope member of the base class Benchmark. The VarLenScope class, which is derived from the abstract base class Scope, creates a benchmarking scope of all message or problem lengths from the set: 2^0, 2^1, …, 2^22. The Intel® MPI Benchmarks infrastructure uses the scope initialized this way to run the benchmark by calling the void run(const scope_item &item) virtual function for each scope item. In this example, each scope item represents a single message length.
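The relationship between init(), the scope and the repeated run() calls can be sketched without the infrastructure headers. The names below (ScopeItem, make_pow2_scope, SketchBenchmark, run_over_scope) are hypothetical and only mimic the described behavior, assuming the scope holds one item per power-of-two length from 2^0 to 2^22; the real VarLenScope and scope_item types are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scope item: one message length per item.
struct ScopeItem { std::size_t len; };

// Builds the lengths 2^min_pow, 2^(min_pow+1), ..., 2^max_pow.
std::vector<ScopeItem> make_pow2_scope(unsigned min_pow, unsigned max_pow) {
    assert(min_pow <= max_pow && max_pow < 31);
    std::vector<ScopeItem> items;
    for (unsigned p = min_pow; p <= max_pow; ++p)
        items.push_back(ScopeItem{static_cast<std::size_t>(1) << p});
    return items;
}

// Hypothetical benchmark: init() creates the scope, run() handles one item.
class SketchBenchmark {
public:
    void init() { scope = make_pow2_scope(0, 22); }
    void run(const ScopeItem &item) {
        ++calls;                 // count how often the driver called us
        total_len += item.len;   // placeholder for the real measurement
    }
    std::vector<ScopeItem> scope;
    std::size_t calls = 0;
    unsigned long long total_len = 0;
};

// What the driver conceptually does: one run() call per scope item.
void run_over_scope(SketchBenchmark &bench) {
    bench.init();
    for (const ScopeItem &item : bench.scope) bench.run(item);
}
```

With these assumptions, run_over_scope() calls run() 23 times, once for each length from 1 to 4194304.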


The third example extends example_benchmark2.cpp into a simple, close-to-real-world MPI benchmark that implements the well-known ping-pong pattern. The void init() virtual function additionally allocates the send and receive buffers. The void finalize() virtual function prints the summary results. The virtual destructor deallocates the buffers.


The fourth example adds command line parameter handling to the previous ping-pong example. There are three command line parameters:

  • -len takes a comma-separated list of message lengths to run the benchmark with
  • -datatype allows you to select the datatype used in MPI messages: MPI_CHAR or MPI_INT
  • -ncycles defines the number of benchmark iterations to execute during each run() call

To describe the expected command line arguments, the bool declare_args() function of the BenchmarkSuite<BS_GENERIC> template class is specialized. It uses the args_parser class API to declare the names of the options to be parsed and the types of their arguments. For example, the following API call:

parser.add<int>("ncycles", 1000);

instructs the command line parser to expect the -ncycles option with an integer argument, the default argument value being 1000. The call:

parser.add_vector<int>("len", "1,2,4,8").

instructs the command line parser to expect the -len option with a comma-separated list of integers as an argument. The number of integers in the list is arbitrary. The default list consists of 4 integers: 1, 2, 4 and 8, and the chained set_mode() call makes the parser apply the defaults only when the option is missing from the launch command line.
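The behavior described for these two declarations (an integer option with a default, and a comma-separated integer list whose default applies only when the option is missing) can be approximated with a short standalone sketch. It does not use the args_parser class: parse_int_list and option_or_default are hypothetical helpers, and the assumption that options arrive as "-name value" pairs is made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated string such as "1,2,4,8" into integers.
std::vector<int> parse_int_list(const std::string &text) {
    std::vector<int> values;
    std::stringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ','))
        if (!token.empty()) values.push_back(std::stoi(token));
    return values;
}

// Returns the value following "-name" in args, or the fallback when the
// option is absent (assumed "-name value" syntax, for illustration only).
std::string option_or_default(const std::vector<std::string> &args,
                              const std::string &name,
                              const std::string &fallback) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i)
        if (args[i] == "-" + name) return args[i + 1];
    return fallback;
}
```

For instance, with the arguments -len 16,32 the sketch yields the list 16, 32, while an absent -ncycles option falls back to 1000 and an absent -len option falls back to 1, 2, 4, 8.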

In this example, the bool prepare() function handles the options and transfers the data given by the user on the command line to internal data structures with the corresponding parser.get<>() calls. In particular, the vector<int> len variable stores the list of desired message lengths received from the command line parser, MPI_Datatype datatype stores the chosen data type, and int ncycles stores the given number of iterations.

The get_parameter() function specialization implements an interface to pass pointers to data structures from the benchmark suite class to the benchmark class. Any benchmark in this suite may call the get_parameter() function to get a smart pointer to a particular variable. The benchmark suite passes the pointer to the variable via the type erasure template class any. In this example, both the run() and init() virtual functions of the benchmark class use this interface to get pointers to the len, datatype and ncycles values. The HANDLE_PARAMETER and GET_PARAMETER macros make passing the pointers more convenient.
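The mechanism of handing out type-erased smart pointers by name can be sketched as follows. This is not the infrastructure's any class or its macros: AnyPtr, Suite and total_bytes are hypothetical names, and the sketch covers only the len and ncycles values so that it stays independent of MPI.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Minimal type-erased holder of a shared pointer (hypothetical; it only
// stands in for the idea behind the infrastructure's `any` class).
class AnyPtr {
public:
    AnyPtr() : type_(typeid(void)) {}
    template <typename T>
    explicit AnyPtr(std::shared_ptr<T> p)
        : ptr_(std::move(p)), type_(typeid(T)) {}

    // Returns the typed pointer, or nullptr when T is not the stored type.
    template <typename T>
    std::shared_ptr<T> as() const {
        if (type_ != std::type_index(typeid(T))) return nullptr;
        return std::static_pointer_cast<T>(ptr_);
    }

private:
    std::shared_ptr<void> ptr_;
    std::type_index type_;
};

// Hypothetical suite: owns the parameters and exposes them by name.
class Suite {
public:
    Suite()
        : len_(std::make_shared<std::vector<int>>()),
          ncycles_(std::make_shared<int>(1000)) {
        *len_ = {1, 2, 4, 8};
    }
    AnyPtr get_parameter(const std::string &key) const {
        if (key == "len") return AnyPtr(len_);
        if (key == "ncycles") return AnyPtr(ncycles_);
        return AnyPtr();  // unknown key: empty holder
    }

private:
    std::shared_ptr<std::vector<int>> len_;
    std::shared_ptr<int> ncycles_;
};

// Benchmark side: fetch the pointers by name and use the shared values.
long total_bytes(const Suite &suite) {
    auto len = suite.get_parameter("len").as<std::vector<int>>();
    auto ncycles = suite.get_parameter("ncycles").as<int>();
    assert(len && ncycles);
    long total = 0;
    for (int l : *len) total += static_cast<long>(l) * *ncycles;
    return total;
}
```

Because the benchmark receives pointers rather than copies, a value changed in the suite after parsing is visible to every benchmark that requested it. Asking for the wrong type returns a null pointer in this sketch instead of reinterpreting the data.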

Now the benchmark parameters can be controlled at run time from the command line. When this example is compiled into the benchmark infrastructure, the command line option parser recognizes the -len, -datatype and -ncycles options. The help output automatically includes information on these options.


This example implements the same functionality as example_benchmark1.cpp, but with minimal use of predefined macros and template classes.
