Run Offload Modeling Perspective from Command Line

Intel® Advisor provides several methods to run the Offload Modeling perspective. These methods vary in simplicity and flexibility:
  • Run analyses with the advisor command line interface (CLI), which is the most flexible method. You can select what performance data you want to collect for your application and configure and run Intel Advisor analyses and performance modeling separately. This method supports MPI applications.
  • Collect performance data with collect.py and model performance on a target with analyze.py, which is a simple and moderately flexible method. You can customize both data collection and performance modeling.
  • Run run_oa.py, which is the simplest but least flexible method. You can use this batch mode-like script to run collection and modeling analyses with a single command and a limited number of options.
You can run the Python* scripts with Python 3.6 or 3.7 or with the advisor-python command line interface of the Intel Advisor. The Python script methods do not support MPI applications.
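Because only the advisor CLI supports MPI, the usual pattern is to wrap each analysis command in your MPI launcher. A minimal sketch, assuming the Intel® MPI Library mpirun launcher and a four-rank run (launcher syntax may differ for other MPI implementations):
mpirun -n 4 advisor --collect=survey --project-dir=./advi -- myApplication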

Prerequisites

Set Intel Advisor environment variables with the advisor-vars script. The script enables the advisor CLI, the advisor-python command line tool, and the APM environment variable, which points to the directory with the Offload Modeling scripts and simplifies their use.
In the commands below:
  • Replace <APM> with $APM on Linux OS or with %APM% on Windows OS.
  • Options in square brackets ([--<option>]) are recommended if you want to change how data is collected or how application performance is modeled.

Use advisor Command Line Interface

This method is the most flexible and can analyze MPI applications. You can generate command lines for your application and configuration with one of the following:
  • Run collect.py with the --dry-run option from the CLI as follows (see the example after this list):
    advisor-python <APM>/collect.py <project-dir> --dry-run -- <target-application>
  • Generate command lines from the Intel Advisor GUI.
Copy the generated commands to the clipboard and run them one by one from the command line. The generated commands might require you to add certain options and steps (for example, markup) to complete the flow.
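For example, to print the collection commands for the ./advi project and the myApplication binary used in the examples below, without executing them:
advisor-python $APM/collect.py ./advi --dry-run -- myApplication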
Run the perspective as follows:
  1. Run the Survey analysis to collect basic performance metrics:
     advisor --collect=survey --project-dir=<project-dir> --stackwalk-mode=online --static-instruction-mix -- <target-application> [<target-options>]
     where:
     • --stackwalk-mode=online is an option to analyze stacks during collection. Set it to offline to analyze stacks after collection. The online mode is recommended.
     • --static-instruction-mix is an option to collect static instruction mix data. This option is recommended.
  2. Run the Trip Counts and FLOP analysis to analyze loop call counts and model data transfers to the target device:
     advisor --collect=tripcounts --project-dir=<project-dir> --flop --enable-cache-simulation --target-device=<target> [--stacks] [--data-transfer=<mode>] [--profile-jit] -- <target-application> [<target-options>]
     where:
     • --flop is an option to collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms.
     • --enable-cache-simulation is an option to enable modeling of cache behavior.
     • --target-device=<target> is the specific target GPU to model cache for. Make sure to specify the same target device as for --collect=projection --config=<config-file>.
     • --stacks is an option to enable advanced collection of call stack data.
     • --data-transfer=<mode> is an option to enable modeling data transfers between host and target devices. Use off (default) to disable data transfer modeling, light to model only data transfers, or full to model data transfers, attribute memory objects, and track accesses to stack memory. Use this option with --enable-cache-simulation only.
     • --profile-jit is an option to analyze GPU-enabled code regions.
  3. Optional: Check for loop-carried dependencies:
     1. Mark loops for the Dependencies analysis to decrease overhead. Use the generic markup strategy to select only loops profitable for offloading:
        advisor --mark-up-loops --project-dir=<project-dir> --select markup=gpu_generic -- <target-application> [<target-options>]
        For more information about markup options, see Loop Markup to Minimize Overhead. The generic markup strategy is recommended if you have an application that is not GPU-enabled and you want to run the Dependencies analysis for it.
     2. Run the Dependencies analysis for the marked loops:
        advisor --collect=dependencies --project-dir=<project-dir> --loop-call-count-limit=16 [--select=<string>] [--filter-reductions] -- <target-application> [<target-options>]
        where:
        • --loop-call-count-limit=16 sets the maximum number of call instances to analyze, assuming similar runtime properties over different call instances. This value is recommended.
        • --select=<string> selects loops for the analysis by loop IDs, source locations, or criteria such as scalar, has-issue, or markup=<markup-mode>. The recommended argument is --select markup=gpu_generic to select loops that are recommended to run on a target. Use this option if you did not run --mark-up-loops --select=<string> to select loops or if you want to run the Dependencies analysis for a different set of loops.
        • --filter-reductions is an option to mark all potential reductions with a specific diagnostic.
     Information about loop-carried dependencies is important for modeling the performance of scalar loops. See Check How Assumed Dependencies Affect Modeling.
  4. Model application performance with the projection analysis:
     advisor --collect=projection --project-dir=<project-dir> --config=<config> [--no-assume-dependencies] [--data-reuse-analysis] [--assume-hide-taxes] [--jit] [--custom-config=<path>]
     where:
     • --config=<config> is a target GPU configuration to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3.
     • --no-assume-dependencies is an option to assume that a loop does not have dependencies if its dependency type is unknown. The default is --assume-dependencies. Use --no-assume-dependencies if your application contains parallel and/or vectorized loops and you did not run the Dependencies analysis.
     • --data-reuse-analysis is an option to analyze potential data reuse between code regions when offloaded to a target GPU.
     • --assume-hide-taxes is an option to assume that an invocation tax is paid only the first time a kernel is launched.
     • --custom-config=<path> is a path to a custom .toml configuration file with additional modeling parameters. For details, see Advanced Modeling Configurations.
     • --jit is an option to model the performance of GPU-enabled code regions.
Example
Collect performance data, check for dependencies in potentially profitable loops, and model application performance and data transfers on Intel® Iris® Xe MAX graphics (gen12_dg1 configuration):
advisor --collect=survey --project-dir=./advi --stackwalk-mode=online --static-instruction-mix -- myApplication
advisor --collect=tripcounts --project-dir=./advi --flop --enable-cache-simulation --target-device=gen12_dg1 --stacks --data-transfer=light -- myApplication
advisor --mark-up-loops --project-dir=./advi --select markup=gpu_generic -- myApplication
advisor --collect=dependencies --project-dir=./advi --filter-reductions --loop-call-count-limit=16 -- myApplication
advisor --collect=projection --project-dir=./advi --config=gen12_dg1

Run the collect.py and analyze.py Scripts

collect.py automates profiling and allows you to run all analysis steps in one command, while analyze.py models the performance of your application on a target device. This method is simple and moderately flexible, but it does not support MPI applications.
Run the scripts as follows:
  1. Collect application performance metrics with collect.py:
     advisor-python <APM>/collect.py <project-dir> [--collect=<collect-mode>] [--config=<config-file>] [--markup=<markup-mode>] [--data-transfer] [--jit] -- <target> [<target-options>]
     where:
     • --collect=<collect-mode> is an option to specify what data is collected for your application:
       • Use basic to collect only basic Survey and Trip Counts and FLOP data.
       • Use refinement to collect only Dependencies data.
       • Use full (default) to collect Survey, Trip Counts and FLOP, and Dependencies data.
       See Check How Dependencies Affect Modeling for details on when you need to collect dependency data.
     • --config=<config-file> is a target GPU configuration to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. Make sure to specify the same configuration file for collect.py and analyze.py.
     • --markup=<markup-mode> specifies the loops to collect Trip Counts and FLOP and/or Dependencies data for. This option decreases collection overhead. By default, it is set to generic to analyze only loops profitable for offloading.
     • --data-transfer is an option to enable modeling data transfers between host and device when offloaded to a target. Enabled by default.
     • --jit is an option to model the performance of GPU-enabled code regions.
  2. Model the performance of your application on a target GPU device with a selected configuration with analyze.py:
     advisor-python <APM>/analyze.py <project-dir> [--config=<config-file>] [--assume-parallel] [--jit]
     where:
     • --config=<config-file> is a target GPU configuration to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. Make sure to specify the same configuration file for collect.py and analyze.py.
     • --assume-parallel is an option to assume that a loop does not have dependencies if there is no information about the loop dependency type and you did not run the Dependencies analysis (for example, with collect.py --collect=basic). For details, see Check How Dependencies Affect Modeling.
     • --jit is an option to model the performance of GPU-enabled code regions.
See collect.py Script and analyze.py Script reference for a full list of available options.
Example
Collect performance data and model application performance on a target GPU with Intel® Iris® Xe MAX graphics (gen12_dg1 configuration) on Linux OS:
advisor-python $APM/collect.py ./advi --config=gen12_dg1 -- myApplication
advisor-python $APM/analyze.py ./advi --config=gen12_dg1

Run the run_oa.py Script

This method is the simplest but least flexible, and it does not support analysis of MPI applications. You can use it to run all collection and modeling steps with one script.
Run the script as follows:
advisor-python <APM>/run_oa.py <project-dir> [--collect=<collect-mode>] [--config=<config-file>] [--markup=<markup-mode>] [--data-transfer] [--jit] -- <target> [<target-options>]
where:
  • --collect=<collect-mode> is an option to specify what data is collected for your application:
    • Use basic to collect only basic Survey and Trip Counts and FLOP data.
    • Use refinement to collect only Dependencies data.
    • Use full (default) to collect Survey, Trip Counts and FLOP, and Dependencies data.
    See Check How Dependencies Affect Modeling for details on when you need to collect dependency data.
  • --config=<config-file> is a target GPU configuration to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3.
  • --markup=<markup-mode> specifies the loops to collect Trip Counts and FLOP and/or Dependencies data for. This option decreases collection overhead. By default, it is set to generic to analyze only loops profitable for offloading.
  • --data-transfer is an option to enable modeling data transfers between host and device when offloaded to a target. Enabled by default.
  • --jit is an option to model the performance of GPU-enabled code regions.
See run_oa.py Script reference for a full list of available options.
Example
Run the full collection and modeling with the run_oa.py script with the default gen11_icl configuration on Linux OS:
advisor-python $APM/run_oa.py ./advi -- myApplication

View the Results

Intel Advisor provides several ways to work with the Offload Modeling results generated from the command line.
View Results in CLI
After you run Performance Modeling with advisor --collect=projection or analyze.py, the summary results are printed in the terminal or command prompt. This summary report includes:
  • A description of the baseline platform where application performance was measured and the target platform for which the application performance was modeled
  • The executable binary name
  • Top metrics for measured and estimated (accelerated) application performance
  • Top regions recommended for offloading to the target, with performance metrics per region
For example:
Info: Selected accelerator to analyze: Intel Gen9 GT2 Integrated Accelerator 24EU 1150MHz.
Info: Baseline Host: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz, GPU: Intel(R).
Info: Binary Name: 'CFD'.

Measured CPU Time: 44.858s    Accelerated CPU+GPU Time: 15.425s
Speedup for Accelerated Code: 3.8x    Number of Offloads: 5    Fraction of Accelerated Code: 60%

Top Offloaded Regions
------------------------------------------------------------------------------------------------------------------------------------------
 Location                                                 | Time on Baseline | Time on Target | Speedup | Bound by   | Data Transfer
------------------------------------------------------------------------------------------------------------------------------------------
 [loop in compute_flux_ser at euler3d_cpu_ser.cpp:226]    | 36.576s          | 9.103s         | 4.02x   | L3_BW      | 12.091MB
 [loop in time_step_ser at euler3d_cpu_ser.cpp:361]       | 1.404s           | 0.319s         | 4.40x   | L3_BW      | 10.506MB
 [loop in compute_step_factor_ser at euler3d_cpu_ser....  | 0.844s           | 0.158s         | 5.35x   | Compute    | 4.682MB
 [loop in main at euler3d.cpp:848]                        | 1.046s           | 0.906s         | 1.15x   | Dependency | 31.863MB
 [loop in Intel::OpenCL::TaskExecutor::in_order_execu...  | 0.060s           | 0.012s         | 4.98x   | Dependency | 0.303MB
------------------------------------------------------------------------------------------------------------------------------------------
See Accelerator Metrics reference for more information about the metrics reported.
View Results in GUI
When you run Intel Advisor CLI or Python scripts, a project is created automatically in the directory specified with --project-dir. All the collected results and analysis configurations are stored in the .advixeproj project, which you can view in the Intel Advisor GUI.
To open the project in the GUI, run the following command:
advisor-gui <project-dir>
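For example, with the ./advi project directory used in the examples above:
advisor-gui ./advi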
If the report does not open, click Show Result on the Welcome pane.
You first see a Summary report that includes the most important information about measured performance on the baseline platform and modeled performance on the target platform, including:
  • Main metrics for the modeled performance of your program, which indicate whether you should offload your application to a target device.
  • Specific factors that prevent your code from achieving better performance if executed on a target device, in the Offload Bounded by column.
  • The top five offloaded loops/functions that provide the highest benefit and the top five loops/functions that were not offloaded, with the reason why.
Offload Modeling Summary in GUI
View an Interactive HTML Report
When you run Intel Advisor CLI or Python scripts, an additional set of CSV metric reports and an interactive HTML report are generated in the <project-dir>/e<NNN>/pp<NNN>/data.0 directory. These reports are lightweight and do not require the Intel Advisor GUI.
The HTML report is similar to the GUI project, but it reports some additional metrics. The report contains a list of regions profitable for offloading and performance metrics, such as offload data transfer traffic, estimated number of cycles on a target device, estimated speedup, and compute versus memory-bound characterization.
Offload Modeling HTML report
Save a Read-only Snapshot
A snapshot is a read-only copy of a project result, which you can view at any time. To save an active project result as a read-only snapshot:
advisor --snapshot --project-dir=<project-dir> [--cache-sources] [--cache-binaries] -- <snapshot-path>
where:
  • --cache-sources is an option to add application source code to the snapshot.
  • --cache-binaries is an option to add application binaries to the snapshot.
  • <snapshot-path> is a path and a name for the snapshot. For example, if you specify /tmp/new_snapshot, the snapshot is saved in the tmp directory as new_snapshot.advixeexpz. You can skip this argument to save the snapshot to the current directory as snapshotXXX.advixeexpz.
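For example, to save a snapshot of the ./advi project used above, including sources and binaries (the new_snapshot name is illustrative):
advisor --snapshot --project-dir=./advi --cache-sources --cache-binaries -- ./new_snapshot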
To open the result snapshot in the Intel Advisor GUI, run the following command:
advisor-gui <snapshot-path>
You can visually compare the saved snapshot against the current active result or other snapshot results.

Next Steps

See Identify Code Regions to Offload to understand the results. This section is GUI-focused, but you can still use it for result interpretation.
For details about metrics reported, see Accelerator Metrics.
