User Guide

MPI Workflow Example

This section shows example workflows for analyzing MPI applications with Intel® Advisor. In the commands below:
  • The path to the application executable is <PATH>/mpi-sample/1_mpi_sample_serial.
  • The path to the Intel Advisor project directory is ./advi.
  • Performance is modeled for all MPI ranks. To reduce overhead, you can run performance modeling only for specific MPI ranks using gtool.

Analyze MPI Application Performance

This example shows how to run a Survey analysis to get a basic performance and vectorization report for an MPI application. The analysis is performed for an application that is run in four processes.
  1. Collect survey data for all ranks into the shared ./advi project directory.
    $ mpirun -n 4 "advixe-cl --collect=survey --project-dir=./advi" <PATH>/mpi-sample/1_mpi_sample_serial
    To collect survey data for a single rank (for example, rank 0), you can use the following command:
    $ mpirun -n 4 -gtool "advixe-cl --collect=survey --project-dir=./advi:0" <PATH>/mpi-sample/1_mpi_sample_serial
    If you need to copy the data to the development system, do so now.
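    For example, a minimal sketch of copying the project directory to the development system over SSH (the host name and destination path are hypothetical):
    $ scp -r ./advi user@dev-host:/home/user/advi-results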
  2. Import and finalize your data.
    $ advixe-cl --import-dir=./advi --project-dir=./new-advi --mpi-rank=3 --search-dir src:=<PATH>/mpi_sample
    The --project-dir option should specify a different directory to store the finalized analysis results on the development system.
  3. Open the results in the Intel Advisor GUI.
    $ advixe-gui ./new-advi
You can proceed to run other analyses one by one. After you finish, you need to import and finalize the result for an MPI rank of interest to be able to view it, as sketched below.
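For example, a minimal sketch of this flow, assuming the next analysis you run is Trip Counts and FLOP and that you want to view rank 3 (both the analysis type and the rank number are illustrative):
$ mpirun -n 4 "advixe-cl --collect=tripcounts --flop --project-dir=./advi" <PATH>/mpi-sample/1_mpi_sample_serial
$ advixe-cl --import-dir=./advi --project-dir=./new-advi --mpi-rank=3 --search-dir src:=<PATH>/mpi_sample
$ advixe-gui ./new-advi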
For a full vectorization workflow, see the Analyze Vectorization and Memory Aspects of an MPI Application recipe in the Intel Advisor Cookbook.

Model MPI Application Performance on GPU

This example shows how to run Offload Advisor to get insights about your MPI application performance modeled on a GPU. In this example:
  • The analyses are performed for an application that is run in four processes.
  • Performance is modeled for Intel® HD Graphics 630 (the gen9_gt2 configuration).
To model performance:
  1. Generate command lines for performance collection:
    $ advixe-python $APM/collect.py ./advi --dry-run --config=gen9_gt2 -- <PATH>/mpi-sample/1_mpi_sample_serial
    For Windows* OS, replace $APM with %APM%.
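    For example, a sketch of the same dry run on Windows (assuming backslash paths and the %APM% environment variable; the exact prompt and path separators depend on your shell):
    $ advixe-python %APM%\collect.py .\advi --dry-run --config=gen9_gt2 -- <PATH>\mpi-sample\1_mpi_sample_serial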
  2. Copy the printed commands to the clipboard, add mpirun or mpiexec to each command, and run the commands one by one. The Survey and Trip Counts and FLOP analyses are required; the others are optional. For example, with mpirun:
    1. Collect survey data for all ranks into the shared ./advi project directory.
      $ mpirun -n 4 "advixe-cl --collect=survey --project-dir=./advi --return-app-exitcode --auto-finalize --static-instruction-mix --stackwalk-mode=online" <PATH>/mpi-sample/1_mpi_sample_serial
    2. Mark loops for all ranks for the next analysis using a pre-defined mark-up strategy:
      $ for x in ./advi/rank.*; do advixe-python $APM/collect.py $x --arch gen --markup generic; done
      If you want to model a single rank, you can provide a path to the results for a specific rank and run the mark-up as follows:
      $ advixe-python $APM/collect.py <project-dir>/rank.<n> --arch gen --markup generic
      You then need to specify this path in all other analyses.
    3. Collect trip counts and FLOP data:
      $ mpirun -n 4 "advixe-cl --collect=tripcounts --project-dir=./advi --return-app-exitcode --flop --auto-finalize --ignore-checksums --stacks --enable-data-transfer-analysis --track-memory-objects --profile-jit --cache-sources --track-stack-accesses --enable-cache-simulation --cache-config=3:1w:4k/1:64w:512k/1:16w:8m" <PATH>/mpi-sample/1_mpi_sample_serial
      The cache configuration specified with the --cache-config option is specific to the selected target device. Do not change the option value generated by collect.py --dry-run.
    4. [Optional] Collect Dependencies data:
      $ mpirun -n 4 "advixe-cl --collect=dependencies --project-dir=./advi --return-app-exitcode --filter-reductions --loop-call-count-limit=16 --ignore-checksums" <PATH>/mpi-sample/1_mpi_sample_serial
  3. Run performance modeling for all MPI ranks of the application:
    $ for x in ./advi/rank.*; do advixe-python $APM/analyze.py $x --config=gen9_gt2 -o $x/perf_models; done
    The results are generated per rank in a ./advi/rank.X/perf_models directory. You can transfer them to the development machine and view the report.
    If you want to model a single rank, you can provide a path to the results for a specific rank or use the --mpi-rank option, as sketched below.
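    For example, a minimal sketch of modeling only rank 3 by pointing analyze.py at that rank's result directory (the rank number is illustrative):
    $ advixe-python $APM/analyze.py ./advi/rank.3 --config=gen9_gt2 -o ./advi/rank.3/perf_models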
For all analysis types: When using a shared partition on Windows*, either use network paths to specify the project and executable location, or use the MPI options mapall or map to specify these locations on the network drive.
For example:
$ mpiexec -gwdir \\<host1>\mpi -hosts 2 <host1> 1 <host2> 1 advixe-cl --collect=survey --project-dir=\\<host1>\mpi\advi -- \\<host1>\mpi\mpi_sample.exe
$ advixe-cl --import-dir=\\<host1>\mpi\advi --project-dir=\\<host1>\mpi\new-advi --search-dir src:=\\<host1>\mpi --mpi-rank=1
$ advixe-cl --report=survey --project-dir=\\<host1>\mpi\new-advi
Or:
$ mpiexec -mapall -gwdir z:\ -hosts 2 <host1> 1 <host2> 1 advixe-cl --collect=survey --project-dir=z:\advi -- z:\mpi_sample.exe
Or:
$ mpiexec -map z:\\<host1>\mpi -gwdir z:\ -hosts 2 <host1> 1 <host2> 1 advixe-cl --collect=survey --project-dir=z:\advi -- z:\mpi_sample.exe

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804