User Guide

Analyzing MPI Workloads

Intel® Advisor allows you to analyze parallel tasks running on a cluster, so you can examine the performance of your MPI application. Use the Intel® MPI Library gtool option with mpiexec or mpirun to invoke the advixe-cl command and spawn MPI processes across the cluster.
You can analyze MPI applications only through the command line interface, but you can view the results through the standalone GUI as well as the command line.

Tips

Consider the following when running collections for an MPI application:
  • Analysis data can be saved to a shared partition or to local directories on the cluster.
  • Only one process's data can be viewed at a time.
  • Intel® Advisor saves collection results into a subdirectory under the Intel Advisor project directory. If you want to collect and then view (in a separate session) data for more than one process, specify a new project directory when running a new collection.
  • Specify the same project directory when running various Intel Advisor collections for the selected process.
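For example, a minimal sketch of this convention, using the Intel MPI Library gtool syntax described later in this section (the project directory names, rank numbers, and application name are illustrative): run different collections for the same rank into the same project directory, and use a new project directory for a new collection session:
$ mpiexec -gtool "advixe-cl --collect=survey --project-dir=./advi_rank0:0" -n 4 ./myApplication
$ mpiexec -gtool "advixe-cl --collect=tripcounts --project-dir=./advi_rank0:0" -n 4 ./myApplication
$ mpiexec -gtool "advixe-cl --collect=survey --project-dir=./advi_rank1:1" -n 4 ./myApplication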

MPI Implementations Support

You can use Intel Advisor with the Intel® MPI Library and other MPI implementations, but be aware of the following details:
  • You may need to adjust the command examples in this section to work for non-Intel MPI implementations. For example, adjust commands provided for process ranks to limit the number of processes in the job.
  • An MPI implementation must be able to operate when the Intel Advisor process (advixe-cl) sits between the launcher process (mpiexec) and the application process. This means that the communication information should be passed using environment variables, as most MPI implementations do. Intel Advisor does not work with an MPI implementation that tries to pass communication information from its immediate parent process.

Get Intel® MPI Library Command

You can use Intel Advisor to generate the command line for collecting results on multiple MPI ranks. To do that:
  1. In the Intel Advisor user interface, go to the Project Properties > Analysis Target tab and select the analysis you want to generate the command line for. For example, go to Survey Analysis Types > Survey Hotspots Analysis to generate the command line for the Survey analysis.
  2. Set properties to configure the analysis, if required.
  3. Select the Use MPI Launcher checkbox.
  4. Specify the MPI run parameters and, if required, the ranks to profile (for the Intel MPI Library only), then copy the command line from the Get command line text box to your clipboard.
To generate command lines for modeling your MPI application performance with Offload Advisor, run the collect.py script with the --dry-run option:
$ advixe-python <APM>/collect.py <project-dir> [--config <config-file>] --dry-run -- <application-name> [myApplication-options]
where:
  • <APM> is an environment variable for the path to the Offload Advisor scripts. On Linux* OS, replace it with $APM; on Windows* OS, replace it with %APM%.
  • <project-dir> is the path/name of the project directory. If the project directory does not exist, Intel Advisor will create it.
  • <config-file> (optional) is a pre-defined TOML file and/or a path to a custom TOML configuration file with hardware parameters for performance modeling. For details about parameters for MPI, see Model MPI Application Performance to GPU.
The generated commands do not include the MPI-specific syntax, so you need to add it manually before running them.
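For example, on Linux* OS a dry-run invocation might look as follows (the project directory and application name are illustrative); it prints the advixe-cl collection commands that you then wrap in your MPI launcher syntax:
$ advixe-python $APM/collect.py ./advi_results --dry-run -- ./myApplication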

Intel® MPI Library Command Syntax

Use the -gtool option of mpiexec with the Intel® MPI Library 5.0.2 and higher:
$ mpiexec -gtool "advixe-cl --collect=<analysis-type> --project-dir=<project-dir>:<ranks-set>" -n <N> <application-name> [myApplication-options]
where:
  • <analysis-type> is one of the Intel Advisor analyses:
    • survey runs the target process and collects basic information about the hotspots.
    • tripcounts collects data on the loop trip counts.
    • dependencies collects information about possible dependencies in your application. It requires one of the following:
      • Loop ID(s) as an additional parameter (-mark-up-list=<loop-ID>). Find the loop ID in the Survey report (--report=survey) or using the Command Line link in the Intel® Advisor GUI Workflow tab (right under the button).
      • Loop source location(s) in the format file1:line1.
      • Annotations in the source code.
    • map collects information about memory access patterns for the selected loops. It also requires loop IDs or source locations for the analysis.
    • suitability checks the suitability of the parallel site that you want to insert into your target application. It requires annotations to be added to the source code of your application and also requires recompilation in Debug mode.
  • <ranks-set> is the set of MPI ranks to run the analysis for. Separate ranks with a comma, or use a dash "-" to set a range of ranks. Use all to analyze all the ranks.
  • <N> is the number of MPI processes to launch.
The -gtool option of mpiexec lets you select the MPI ranks to run analyses for, which can decrease overhead.
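For example, the following command runs the Survey analysis for ranks 0 through 3 of a 16-process job (the project directory and application name are illustrative):
$ mpiexec -gtool "advixe-cl --collect=survey --project-dir=./advi_results:0-3" -n 16 ./myApplication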

Generic MPI Command Syntax

Use mpiexec with the advixe-cl command to spawn processes across the cluster and collect data about the application.
Each process has a rank associated with it. This rank is used to identify the result data.
To collect performance or dependencies data for an MPI program with Intel Advisor, the general form of the mpiexec command is:
$ mpiexec -n <N> "advixe-cl --collect=<analysis-type> --project-dir=<project-dir> --search-dir src:r=<source-dir>" myApplication [myApplication-options]
where:
  • <N> is the number of MPI processes to launch.
  • <project-dir> specifies the path/name of the project directory.
  • <analysis-type> is survey, tripcounts, map, suitability, or dependencies.
  • <source-dir> is the path to the directory where annotated sources are stored.
This command profiles all MPI ranks.
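A minimal example (the rank count, directory names, and application name are illustrative) that collects Survey data for all four ranks:
$ mpiexec -n 4 "advixe-cl --collect=survey --project-dir=./advi_results --search-dir src:r=./src" ./myApplication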

Modeling MPI Application Offload to GPU

Use Offload Advisor to model your MPI application performance on an accelerator to determine whether it can benefit from offloading to a target device.
For MPI applications, you can collect data only with advixe-cl.
  1. Collect metrics for your application running on a host device with the advixe-cl command line interface. For example, using the Intel® MPI Library gtool with mpiexec:
    $ mpiexec -gtool "advixe-cl --collect=<analysis-type> --project-dir=<project-dir>:<ranks-set>" -n <N> <application-name> [myApplication-options]
  2. Model the performance of your application on a target device for a single rank:
    $ advixe-python <APM>/analyze.py <project-dir> --mpi-rank <n> [--options]
    where:
    • <APM> is an environment variable for the path to the Offload Advisor scripts. On Linux* OS, replace it with $APM; on Windows* OS, replace it with %APM%.
    • <project-dir> specifies the path/name of the project directory.
    • <n> is the number of the rank to model performance for. Instead of --mpi-rank=<n>, you can specify the path to the rank folder in the project directory. For example:
      $ advixe-python <APM>/analyze.py <project-dir>/rank.<n> [--options]
    Consider using the --config=<config-file> option to set a pre-defined TOML file and/or a path to a custom TOML configuration file if you want to use custom hardware parameters for performance modeling and/or model performance for a multi-rank MPI application. By default, Offload Advisor models performance for a single-rank MPI application on an integrated Intel® Processor Graphics Gen11.
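For example (the directory name, rank, and application name are illustrative), you might collect Survey and Trip Counts data for rank 0 of a four-rank job and then model that rank on the default target; the exact set of collections Offload Advisor needs can be generated with the collect.py --dry-run option described above:
$ mpiexec -gtool "advixe-cl --collect=survey --project-dir=./advi_results:0" -n 4 ./myApplication
$ mpiexec -gtool "advixe-cl --collect=tripcounts --project-dir=./advi_results:0" -n 4 ./myApplication
$ advixe-python $APM/analyze.py ./advi_results --mpi-rank 0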
Configure Performance Modeling for Multi-Rank MPI
By default, Offload Advisor is optimized to model performance for a single-rank MPI application. For multi-rank MPI applications, do one of the following:
Scale Target Device Parameters
By default, Offload Advisor assumes that one MPI process is mapped to one GPU tile. You can configure the performance model and map MPI ranks to a target device configuration.
  1. Create a new TOML file, for example, my_config.toml. Specify the Tiles_per_process parameter:
    Tiles_per_process = <float>
    where <float> is a fraction of a GPU tile that corresponds to a single MPI process. It accepts values from 0.01 to 0.6. This parameter automatically adjusts:
    • the number of execution units (EU)
    • SLM, L1, L3 sizes and bandwidth
    • memory bandwidth
    • PCIe* bandwidth
  2. Save and close the file.
  3. Re-run the performance modeling with the custom TOML file:
    $ advixe-python <APM>/analyze.py <project-dir> --config my_config.toml --mpi-rank <n> [--options]
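A minimal sketch of this workflow (the fraction, file contents, project directory, and rank are illustrative): mapping four MPI ranks to one GPU tile corresponds to a quarter of a tile per process, so the custom file sets Tiles_per_process to 0.25 before the modeling is re-run:
$ cat > my_config.toml <<EOF
Tiles_per_process = 0.25
EOF
$ advixe-python $APM/analyze.py ./advi_results --config my_config.toml --mpi-rank 0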
Ignore MPI Time
For multi-rank MPI workloads, time spent in MPI runtime can differ from rank to rank and cause differences in the whole application time and Offload Advisor projections. If MPI time is significant and you see the differences between ranks, you can exclude time spent in MPI routines from the analysis.
  1. Go to <install-dir>/perfmodels/accelerators/gen/configs.
  2. Open the performance_model.toml file for editing.
  3. Set the ignore_mpi_time parameter to 1.
  4. Save and close the file.
  5. Re-run the performance modeling with the default TOML file:
    $ advixe-python <APM>/analyze.py <project-dir> --mpi-rank <n> [--options]
In the generated report, all per-application performance modeling metrics are recalculated based on the application self time, excluding the time spent in MPI calls from the analysis. This should improve modeling across ranks.
This parameter affects only metrics for the whole program in the Summary tab. Metrics for individual regions are not recalculated.

Viewing Results

As a result of collection, Intel Advisor creates a number of result directories in the directory specified with --project-dir. The nested result directories are named rank.0, rank.1, ... rank.n, where the numeric suffix n corresponds to the MPI process rank.
To view the performance or dependencies results collected for a specific rank, you can either open a result project file (*.advixeproj) that resides in the --project-dir via the Intel Advisor GUI, or run the Intel Advisor CLI report:
$ advixe-cl --report=<analysis-type> --project-dir=<project-dir>:<ranks-set>
You can view only one rank's results at a time.
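For example (the analysis type, project directory, and rank are illustrative), to open the Survey report for rank 2 from the command line:
$ advixe-cl --report=survey --project-dir=./advi_results:2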
For the Offload Advisor, the modeling results are located in the respective rank directory at <project-dir>/rank.n/perf_models/mNNNN. You can view an HTML report with your preferred browser.
For a result overview, see Performance Predictor Output Overview.

Additional MPI Resources

For more details on analyzing MPI applications, see the Intel MPI Library documentation and online MPI documentation on the Intel® Developer Zone at https://software.intel.com/content/www/us/en/develop/tools/mpi-library/get-started.html
Other Intel® Developer Zone online resources that discuss usage of the Intel® Parallel Studio XE Cluster Edition with the Intel MPI Library:
  • Hybrid applications: Intel MPI Library and OpenMP* on the Intel Developer Zone at https://software.intel.com/content/www/us/en/develop/articles/hybrid-applications-intelmpi-openmp.html
