This article guides you through the creation of new benchmarks and benchmark suites within the Intel® MPI Benchmarks 2019 infrastructure.
A benchmark suite is a logically connected group of benchmarks. For each suite, you can declare command line arguments and share data structures.
To create a new benchmark suite:
src directory of the Intel® MPI Benchmarks directory using this name. For example, if the benchmark suite name is
new_bench, the source code sub-tree will be the following:
Makefile. A simple Makefile may look as follows:
BECHMARK_SUITE_SRC += new_bench/new_bench_1.cpp
CPPFLAGS += -Inew_bench
In this example, the Makefile rules add the new benchmark source code into the full list of files to build and add the new_bench subdirectory to the search list of the included directories via the CPPFLAGS variable.
.mk extension in the benchmark suite subdirectory:
You can find a benchmark suite
example in the example subdirectory of your Intel® MPI Benchmarks distribution.
This file contains the bare minimum required to introduce a new benchmark suite and a new benchmark to the benchmarking infrastructure. Two main entities must be correctly specified: a benchmark suite class and a benchmark class.
In this example, the new benchmark suite class is specified by the
DECLARE_BENCHMARK_SUITE_STUFF macro, which specializes the
BenchmarkSuite<> template class with the
BS_GENERIC enum value. Using the macro is recommended for simple cases like this.
Please note that there is a side effect of using the
BenchmarkSuite<BS_GENERIC> template: multiple instantiations of this class in different parts of the source code tree cause linker errors. To avoid this, use a unique namespace for all custom benchmark suites, and custom benchmark data structures and functions. In the example, the
example_suite1 namespace is used exactly for this purpose.
A new benchmark class must be inherited from the
Benchmark base class and must override at least one virtual function:
void run(const scope_item &item). This is the core of any benchmark. There are two helper macros
DECLARE_INHERITED that define all static variables for the automatic runtime registration of any benchmark in the source tree.
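To illustrate what automatic runtime registration means in general, the following is a minimal sketch of the self-registration pattern. It is not the Intel® MPI Benchmarks code: SketchBenchmark, registry(), Registrar, and the sketch_suite1 namespace are hypothetical names introduced only for this illustration, and the real macros may be implemented differently.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a benchmark base class (not the IMB Benchmark class).
struct SketchBenchmark {
    virtual ~SketchBenchmark() {}
    virtual void run(int item) = 0;  // called once per scope item
};

typedef std::function<std::unique_ptr<SketchBenchmark>()> Factory;

// Name-to-factory registry. The function-local static is constructed on
// first use, which avoids static initialization order problems.
inline std::map<std::string, Factory> &registry() {
    static std::map<std::string, Factory> r;
    return r;
}

// Constructing a Registrar at namespace scope records the benchmark
// before main() starts. Registration macros commonly wrap an object like this.
template <class T>
struct Registrar {
    explicit Registrar(const std::string &name) {
        assert(!name.empty());
        registry()[name] = [] { return std::unique_ptr<SketchBenchmark>(new T()); };
    }
};

// A unique namespace keeps the suite's names from clashing with other suites.
namespace sketch_suite1 {
struct HelloBenchmark : SketchBenchmark {
    void run(int item) override { std::cout << "Hello from item " << item << "\n"; }
};
static Registrar<HelloBenchmark> hello_registrar("hello1");
}
```

A driver can then iterate over registry() to list the available benchmark names, or look one up by name and call run() on the object the factory returns.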
You can check at runtime that the
example1 benchmark appears in the benchmark list of the Intel® MPI Benchmarks with the
-list option. The option output also shows that it belongs to our
example_suite1 can be successfully integrated into the Intel® MPI Benchmarks infrastructure, but to actually run the benchmark, you need to define another main entity of the infrastructure for it.
The Intel® MPI Benchmarks infrastructure automatically registers new benchmarks and benchmark suites in the source tree, but the infrastructure must also know how many times each benchmark should be run and which parameters should be passed to it on each run. This is done by creating an object of a class derived from the abstract class
Scope. Each benchmark object holds a smart pointer to this object as a member, and the object is expected to be created in the benchmark's
init() virtual function.
The example example_benchmark2.cpp introduces the
void init() member function, which initializes the scope member of the base class
VarLenScope class, which is derived from the abstract base class Scope, creates a benchmarking scope of all message or problem lengths from the set: 2^0, 2^1, ..., 2^22. The Intel® MPI Benchmarks infrastructure uses the scope initialized this way to run the benchmark by calling the
void run(const scope_item &item) virtual function for each scope item. In this example, each scope item represents a single message length.
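The idea of a scope as a list of items, each of which triggers one run() call, can be sketched independently of the infrastructure. The helpers below, power_of_two_lengths() and for_each_scope_item(), are hypothetical names for this illustration and are not part of the Intel® MPI Benchmarks API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the lengths 2^min_exp, ..., 2^max_exp.
std::vector<unsigned long long> power_of_two_lengths(unsigned min_exp, unsigned max_exp) {
    assert(min_exp <= max_exp && max_exp < 64);
    std::vector<unsigned long long> lengths;
    for (unsigned e = min_exp; e <= max_exp; ++e)
        lengths.push_back(1ULL << e);
    return lengths;
}

// Calls run(len) once per scope item, in order, the way a driver
// would walk a scope and invoke the benchmark for each item.
template <class Run>
void for_each_scope_item(const std::vector<unsigned long long> &scope, Run run) {
    for (unsigned long long len : scope)
        run(len);
}
```

Calling power_of_two_lengths(0, 22) yields 23 items, from 1 up to 4194304, which corresponds to the set of lengths described above.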
The third example extends
example_benchmark2.cpp with a simple, close-to-real-world example of an MPI benchmark that implements the well-known ping-pong pattern. The
void init() virtual function adds allocation of the receive and send buffers. The
void finalize() virtual function implements the summary results output. The virtual destructor takes care of buffer deallocation.
The fourth example adds command line parameter handling to the previous ping-pong example. There are three command line parameters:
-len takes a comma-separated list of message lengths to run the benchmark with
-datatype allows you to select the datatype used in MPI messages:
-ncycles defines the number of benchmark iterations to execute during each
To set up the descriptions of expected command line arguments, the
bool declare_args() function of the
BenchmarkSuite<BS_GENERIC> template class is specified. It uses the
args_parser class API to declare the option names to be parsed and the arguments each option expects. For example, the following API call:
instructs the command line parser to expect the
-ncycles option with an integer argument whose default value is 1000. The call:
parser.add_vector<int>("len", "1,2,4,8").set_mode(args_parser::option::APPLY_DEFAULTS_ONLY_WHEN_MISSING);
instructs the command line parser to expect the
-len option with a comma-separated list of integers as an argument. The number of integers in the list is arbitrary. The default list consists of 4 integers: 1, 2, 4 and 8, and the chained
set_mode() call makes the parser apply defaults only when the option is missing from the launch command line.
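The behavior described here, a comma-separated integer list whose defaults apply only when the option is absent, can be sketched in plain C++ as follows. This is not the args_parser implementation: parse_int_list() and get_int_list() are hypothetical helpers, and the option map stands in for whatever the real parser stores internally.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Splits "1,2,4,8" into {1, 2, 4, 8}. Empty fields are skipped;
// std::stoi throws std::invalid_argument on non-numeric fields.
std::vector<int> parse_int_list(const std::string &text) {
    std::vector<int> values;
    std::stringstream ss(text);
    std::string field;
    while (std::getline(ss, field, ','))
        if (!field.empty())
            values.push_back(std::stoi(field));
    return values;
}

// Hypothetical lookup: the default list is used only when the option
// is missing from the parsed command line (represented here as a map).
std::vector<int> get_int_list(const std::map<std::string, std::string> &opts,
                              const std::string &name,
                              const std::string &defaults) {
    assert(!name.empty());
    std::map<std::string, std::string>::const_iterator it = opts.find(name);
    return parse_int_list(it == opts.end() ? defaults : it->second);
}
```

With an empty option map, get_int_list(opts, "len", "1,2,4,8") returns the four default lengths; once the map contains a "len" entry, only the user-supplied list is returned.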
In this example the
bool prepare() function is used to handle the options and transfer data, given by the user on the command line, to internal data structures with corresponding
parser.get<>() calls. In particular, the
vector<int> len variable stores the list of desired message lengths received from the command line parser,
MPI_Datatype datatype stores the chosen data type,
int ncycles stores the given number of iterations.
The get_parameter() function specialization implements an interface to pass pointers to data structures from the benchmark suite class to the benchmark class. Any benchmark in this suite may call the
get_parameter() function to get a smart pointer to a particular variable. The benchmark suite passes the pointer to the variable via the type erasure template class
any. In this example, both the
init() virtual functions of the benchmark class use this interface to get pointers to
ncycles values. The
GET_PARAMETER macros make passing the pointers more convenient.
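As a rough illustration of passing typed pointers through a type-erased container, the sketch below uses std::shared_ptr<void> as a simple substitute for the infrastructure's any class. ParameterStore, publish(), and get() are hypothetical names introduced for this example only, and unlike a full any implementation this sketch performs no runtime type check.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical suite-side storage: named parameters held as type-erased pointers.
class ParameterStore {
public:
    // Converting shared_ptr<T> to shared_ptr<void> erases the static type
    // while keeping the shared ownership and the correct deleter.
    template <class T>
    void publish(const std::string &name, std::shared_ptr<T> value) {
        assert(value != nullptr);
        params_[name] = value;
    }

    // Returns an empty pointer if the name is unknown. The caller must ask
    // for the same T that was published; this sketch does not verify it.
    template <class T>
    std::shared_ptr<T> get(const std::string &name) const {
        auto it = params_.find(name);
        if (it == params_.end())
            return std::shared_ptr<T>();
        return std::static_pointer_cast<T>(it->second);
    }

private:
    std::map<std::string, std::shared_ptr<void>> params_;
};
```

Because the benchmark receives a pointer rather than a copy, a value the suite updates after parsing the command line is visible to every benchmark that fetched the pointer earlier.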
Now the benchmark parameters may be controlled at runtime on the command line. When this example is compiled into a benchmark infrastructure, the command line option parser recognizes the
-ncycles options. The help output contains information on these options, which is integrated automatically.
This example implements the same functionality as the
example_benchmark1.cpp but with minimum usage of predefined macros and template classes.