User Guide

Analyze MPI Workloads

With Intel® Advisor, you can analyze parallel tasks running on a cluster to examine the performance of your MPI application. Use the Intel® MPI Library gtool option with mpiexec or mpirun to invoke the advisor command and spawn MPI processes across the cluster.
You can analyze MPI applications only through the command line interface, but you can view the results in the standalone GUI as well as on the command line.

Tips

Consider the following when running collections for an MPI application:
  • Analysis data can be saved to a shared partition or to local directories on the cluster.
  • Only one process's data can be viewed at a time.
  • Intel® Advisor saves collection results into a subdirectory under the Intel Advisor project directory. If you wish to collect and then view (in a separate session) data for more than one process, specify a new project directory for each new collection.
  • Specify the same project directory when running different Intel Advisor collections for the selected process.
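For example, following the tips above and using the Intel MPI Library syntax described later in this section, you could collect data for rank 0 and rank 1 into separate project directories so that each can be opened in its own session later (the directory names, ranks, and process count are illustrative):
$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_rank0:0" -n 4 ./myApplication
$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_rank1:1" -n 4 ./myApplication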

MPI Implementations Support

You can use Intel Advisor with the Intel® MPI Library and other MPI implementations, but be aware of the following details:
  • You may need to adjust the command examples in this section to work for non-Intel MPI implementations. For example, adjust commands provided for process ranks to limit the number of processes in the job.
  • An MPI implementation must be able to operate when the Intel Advisor process (advisor) sits between the launcher process (mpiexec) and the application process. This means that communication information should be passed using environment variables, as most MPI implementations do. Intel Advisor does not work with an MPI implementation that tries to pass communication information from its immediate parent process.

Get Intel® MPI Library Commands

You can use Intel Advisor to generate the command line for collecting results on multiple MPI ranks. To do that:
  1. In the Intel Advisor user interface, go to the Project Properties > Analysis Target tab and select the analysis you want to generate the command line for. For example, go to Survey Analysis Types > Survey Hotspots Analysis to generate the command line for the Survey analysis.
  2. Set properties to configure the analysis, if required.
  3. Select the Use MPI Launcher checkbox.
  4. Specify the MPI run parameters and, if required, the ranks to profile (for the Intel MPI Library only), then copy the command line from the Get command line text box to your clipboard.
You can also generate command lines for modeling your MPI application performance with the Offload Modeling scripts. Run the collect.py script with the --dry-run option:
advisor-python <APM>/collect.py <project-dir> [--config <config-file>] --dry-run -- <application-name> [myApplication-options]
where:
  • <APM> is an environment variable for the path to the Offload Modeling scripts. On Linux* OS, replace it with $APM; on Windows* OS, replace it with %APM%.
  • <project-dir> is the path/name of the project directory. If the project directory does not exist, Intel Advisor creates it.
  • <config-file> (optional) is a pre-defined TOML file and/or a path to a custom TOML configuration file with hardware parameters for performance modeling. For details about parameters for MPI, see Model MPI Application Offload to GPU.
The generated commands do not include the MPI-specific syntax; you need to add it manually before running them.
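For example, assuming a project directory named ./advi_results (illustrative), a dry run prints the plain advisor commands without executing them:
$ advisor-python $APM/collect.py ./advi_results --dry-run -- ./myApplication
You can then wrap each printed command with the MPI launcher syntax described below, for example with the Intel MPI Library gtool option (the rank set and process count are illustrative):
$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_results:0-3" -n 4 ./myApplication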

Intel® MPI Library Command Syntax

Use the -gtool option of mpiexec with Intel® MPI Library 5.0.2 and higher:
$ mpiexec -gtool "advisor --collect=<analysis-type> --project-dir=<project-dir>:<ranks-set>" -n <N> <application-name> [myApplication-options]
where:
  • <analysis-type> is one of the Intel Advisor analyses:
    • survey runs the target process and collects basic information about the hotspots.
    • tripcounts collects data on the loop trip counts.
    • dependencies collects information about possible dependencies in your application. It requires one of the following:
      • Loop ID(s) as an additional parameter (-mark-up-list=<loop-ID>). Find the loop ID in the Survey report (--report=survey) or using the Command Line link in the Intel® Advisor GUI Workflow tab.
      • Loop source location(s) in the format file1:line1.
      • Annotations in the source code.
    • map collects information about memory access patterns for the selected loops. It also requires loop IDs or source locations for the analysis.
    • suitability checks the suitability of the parallel site that you want to insert into your target application. It requires annotations to be added to the source code of your application and recompilation in Debug mode.
    • projection models your application performance on an accelerator.
  • <ranks-set> is the set of MPI ranks to run the analysis for. Separate ranks with a comma, or use a dash "-" to set a range of ranks. Use all to analyze all the ranks.
  • <N> is the number of MPI processes to launch.
The -gtool option of mpiexec allows you to select the MPI ranks to run analyses for, which can decrease collection overhead.
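For example, the following command (the project directory name, rank set, and process count are illustrative) runs the Survey analysis only on ranks 0 through 3 of a 16-process job:
$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_results:0-3" -n 16 ./myApplication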

Generic MPI Command Syntax

Use mpiexec with the advisor command to spawn processes across the cluster and collect data about the application. Each process has a rank associated with it; this rank is used to identify the result data.
To collect performance or dependencies data for an MPI program with Intel Advisor, the general form of the mpiexec command is:
$ mpiexec -n <N> "advisor --collect=<analysis-type> --project-dir=<project-dir> --search-dir src:r=<source-dir>" myApplication [myApplication-options]
where:
  • <N> is the number of MPI processes to launch.
  • <project-dir> specifies the path/name of the project directory.
  • <analysis-type> is survey, tripcounts, map, suitability, dependencies, or projection.
  • <source-dir> is the path to the directory where annotated sources are stored.
This command profiles all MPI ranks.
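For example, the following command (the directory names are illustrative) runs the Survey analysis for all ranks of a four-process job:
$ mpiexec -n 4 "advisor --collect=survey --project-dir=./advi_results --search-dir src:r=./src" ./myApplication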

Model MPI Application Offload to GPU

You can model your MPI application performance on an accelerator to determine whether it can benefit from offloading to a target device.
For MPI applications, you can collect data only with the advisor command line interface.
You can run the performance modeling using only the advisor command line interface or a combination of advisor and the analyze.py script. For example, to use advisor and analyze.py:
  1. Collect metrics for your application running on a host device with the advisor command line interface. For example, using the Intel® MPI Library gtool option with mpiexec:
    $ mpiexec -gtool "advisor --collect=<analysis-type> --project-dir=<project-dir>:<ranks-set>" -n <N> <application-name> [myApplication-options]
  2. Model performance of your application on a target device for a single rank:
    $ advisor-python <APM>/analyze.py <project-dir> --mpi-rank <n> [--options]
    where:
    • <APM> is an environment variable for the path to the Offload Modeling scripts. On Linux* OS, replace it with $APM; on Windows* OS, replace it with %APM%.
    • <project-dir> specifies the path/name of the project directory.
    • <n> is the rank number to model performance for.
    Instead of --mpi-rank=<n>, you can specify the path to a rank folder in the project directory. For example:
    $ advisor-python <APM>/analyze.py <project-dir>/rank.<n> [--options]
    Consider using the --config=<config-file> option to set a pre-defined TOML file and/or a path to a custom TOML configuration file if you want to use custom hardware parameters for performance modeling and/or model performance for a multi-rank MPI application. By default, Offload Modeling models performance for a single-rank MPI application on a gen11_icl target configuration.
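For example, the following sequence (the project directory, rank set, and process count are illustrative) collects Survey and Trip Counts data for ranks 0 through 3 of a 16-process job and then models rank 0 on the default target. The exact collection options for your Advisor version can be generated with the GUI or with collect.py --dry-run, as described above:
$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_results:0-3" -n 16 ./myApplication
$ mpiexec -gtool "advisor --collect=tripcounts --flop --project-dir=./advi_results:0-3" -n 16 ./myApplication
$ advisor-python $APM/analyze.py ./advi_results --mpi-rank 0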
Configure Performance Modeling for Multi-Rank MPI
By default, Offload Modeling is optimized to model performance for a single-rank MPI application. For multi-rank MPI applications, do one of the following:
Scale Target Device Parameters
By default, Offload Modeling assumes that one MPI process is mapped to one GPU tile. You can configure the performance model and map MPI ranks to a target device configuration.
  1. Create a new TOML file, for example, my_config.toml. Specify the Tiles_per_process parameter as follows:
    [scale]
    Tiles_per_process = <float>
    where <float> is a fraction of a GPU tile that corresponds to a single MPI process. It accepts values from 0.01 to 0.6. This parameter automatically adjusts:
    • the number of execution units (EU)
    • SLM, L1, L3 sizes and bandwidth
    • memory bandwidth
    • PCIe* bandwidth
  2. Save and close the file.
  3. Re-run the performance modeling with the custom TOML file:
    $ advisor-python <APM>/analyze.py <project-dir> --config my_config.toml --mpi-rank <n> [--options]
    If you run performance modeling with advisor, use the --custom-config=<path> option to specify a custom configuration file.
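For example, to model eight MPI ranks sharing one GPU tile, my_config.toml could contain the following (the file name and the value 0.125 are illustrative; the value must stay within the 0.01 to 0.6 range):
[scale]
Tiles_per_process = 0.125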
Ignore MPI Time
For multi-rank MPI workloads, the time spent in the MPI runtime can differ from rank to rank and cause differences in the whole application time and in the Offload Modeling projections. If MPI time is significant and you see such differences between ranks, you can exclude the time spent in MPI routines from the analysis:
  1. Go to <install-dir>/perfmodels/accelerators/gen/configs.
  2. Open the performance_model.toml file for editing.
  3. Set the ignore_mpi_time parameter to 1.
  4. Save and close the file.
  5. Re-run the performance modeling with the default TOML file:
    $ advisor-python <APM>/analyze.py <project-dir> --mpi-rank <n> [--options]
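The edited parameter in performance_model.toml should look similar to the following line:
ignore_mpi_time = 1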
In the generated report, all per-application performance modeling metrics are recalculated based on the application self time, excluding the time spent in MPI calls from the analysis. This should improve modeling across ranks.
This parameter affects only the metrics for the whole program in the Summary tab. Metrics for individual regions are not recalculated.

View Results

As a result of the collection, Intel Advisor creates a number of result directories in the directory specified with --project-dir. The nested result directories are named rank.0, rank.1, ..., rank.n, where the numeric suffix n corresponds to the MPI process rank.
To view the performance or dependency results collected for a specific rank, you can either open a result project file (*.advixeproj) that resides in the --project-dir via the Intel Advisor GUI, or run the Intel Advisor CLI report:
$ advisor --report=<analysis-type> --project-dir=<project-dir>:<ranks-set>
You can view only one rank's results at a time.
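For example, to view the Survey report collected for rank 2 (the project directory name is illustrative):
$ advisor --report=survey --project-dir=./advi_results:2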
For Offload Modeling, you do not need to run the --report command: the reports are generated automatically after you run the performance modeling. You can either open a result project file (*.advixeproj) that resides in the <project-dir> using the Intel Advisor GUI, or view an HTML report in the respective rank directory at <project-dir>/rank.<n>/e<NNN>/pp<NNN>/data.0 with your preferred browser.

Additional MPI Resources

For more details on analyzing MPI applications, see the Intel MPI Library and online MPI documentation on the Intel® Developer Zone at https://software.intel.com/content/www/us/en/develop/tools/mpi-library/get-started.html.
For hybrid applications, see Hybrid Applications: Intel MPI Library and OpenMP* on the Intel Developer Zone at https://software.intel.com/content/www/us/en/develop/articles/hybrid-applications-intelmpi-openmp.html.
