User Guide

Contents

Command Line Use Cases

Review typical Intel® Advisor scenarios with corresponding command lines to get quick results and grasp the overall idea of how you can run Intel Advisor perspectives using the command line interface (CLI).
The main advantage of using the Intel Advisor CLI instead of the GUI is that you can perform analysis and collect data as part of an automated or background task, and then view the results in a CLI report (or in the GUI) at your convenience.
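For example, a minimal sketch of such an automated run, chaining a Survey collection with a text report (the report type and output paths here are illustrative; check advisor --help report for the options your version supports):
advisor --collect=survey --project-dir=./advi -- ./bin/myApplication
advisor --report=survey --project-dir=./advi --format=text --report-output=./out/survey.txt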
You can use the Intel Advisor GUI to generate command lines for a selected configuration.

Get Vectorization Insights

Run the Vectorization and Code Insights perspective to achieve the best performance using vectorization, add SIMD parallelism to your code, and get code-specific recommendations for how to fix vectorization issues.
The Vectorization and Code Insights perspective includes the following steps:
  1. Run the Survey analysis to find hotspots and get performance data for your application:
    advisor --collect=survey --project-dir=./advi --search-dir src:r=./src -- ./bin/myApplication
    Survey analysis is enough for a basic Vectorization workflow. The analysis steps below are optional, but recommended if you want to get a detailed overview of your application performance.
  2. Determine the number of loop iterations and collect data about floating-point and integer operations:
    advisor --collect=tripcounts --flop --project-dir=./advi --search-dir src:r=./src -- ./bin/myApplication
  3. Mark up loops for deeper analysis:
    advisor --mark-up-loops --select=foo.cpp:34,bar.cpp:192 --project-dir=./advi --search-dir src:r=./src -- ./bin/myApplication
  4. Check for possible dependencies:
    advisor --collect=dependencies --project-dir=./advi --search-dir src:r=./src -- ./bin/myApplication
  5. Check memory access patterns:
    advisor --collect=map --project-dir=./advi --search-dir src:r=./src -- ./bin/myApplication
You can generate a report in different formats or open the project in the Intel Advisor GUI. For example, to generate a report for all collected data to a TXT file:
advisor --report=joined --project-dir=./advi --search-dir src:r=./src --format=text --report-output=./out/myResult.txt
Review the data in the report file, make recommended updates to your application, rebuild, and test.
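You can also request a different output format with the --format option. For example, a sketch of the same joined report written as CSV (the csv value is an assumption; verify the supported formats with advisor --help report):
advisor --report=joined --project-dir=./advi --search-dir src:r=./src --format=csv --report-output=./out/myResult.csv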

Plot a Roofline Chart

Run the CPU / Memory Roofline Insights or the GPU Roofline Insights perspective to visualize actual performance against hardware-imposed performance ceilings and determine the main limiting factor (memory bandwidth or compute capacity). You can choose to generate a CPU Roofline or a GPU Roofline depending on the platform your application executes on.
The Roofline perspective includes the following steps:
  1. Run a Roofline analysis:
    • CPU Roofline:
      advisor --collect=roofline --project-dir=./advi -- myApplication
      You can extend the CPU Roofline report with call stacks data with --stacks or collect data for all memory levels with --enable-cache-simulation, as shown in the sketch after this list.
    • GPU Roofline:
      advisor --collect=roofline --profile-gpu --project-dir=./advi -- myApplication
  2. [Optional] Check memory access patterns to get detailed information about memory usage:
    advisor --collect=map --project-dir=./advi --search-dir src:r=./src -- myApplication
  3. Generate an interactive HTML Roofline report:
    • CPU Roofline:
      advisor --report=roofline --project-dir=./advi --report-output=./out/roofline.html
    • GPU Roofline:
      advisor --report=roofline --gpu --project-dir=./advi --report-output=./out/roofline.html
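For reference, a sketch of the CPU Roofline collection from step 1 with both optional flags combined (the flags are named above; confirm them with advisor --help collect for your version):
advisor --collect=roofline --stacks --enable-cache-simulation --project-dir=./advi -- myApplication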
Review the data in the HTML Roofline report to identify the main limiting factors. Make optimizations, rebuild your application, and test.

Prototype Threading Design

Run the Threading perspective to analyze, design, tune, and check threading design options for your application.
The Threading perspective includes the following steps:
  1. Run the Survey analysis to find hotspots and get performance data for your application:
    advisor --collect=survey --project-dir=./advi --search-dir src:r=./src -- myApplication
  2. [Optional] Determine the number of loop iterations and collect data about floating-point and integer operations:
    advisor --collect=tripcounts --flop --project-dir=./advi --search-dir src:r=./src -- myApplication
  3. Add annotations to the source code and rebuild the application, as shown in the sketch after this list.
  4. Collect suitability data:
    advisor --collect=suitability --project-dir=./advi --search-dir src:r=./src -- myApplication
    Annotations must be present in the source code for this collection to succeed.
  5. Check for possible dependencies for the annotated loops:
    advisor --collect=dependencies --project-dir=./advi --search-dir src:r=./src -- myApplication
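A minimal sketch of the annotations referenced in step 3, using the Intel Advisor annotation macros from advisor-annotate.h (the function and site/task names are hypothetical placeholders; verify the macro names against the advisor-annotate.h shipped with your version, and build with the Advisor include directory on the compiler's include path):

#include "advisor-annotate.h"

void do_work(int i);  // hypothetical per-iteration hotspot work

void process(int n) {
    ANNOTATE_SITE_BEGIN(solve_site);          // mark the proposed parallel region
    for (int i = 0; i < n; i++) {
        ANNOTATE_ITERATION_TASK(solve_task);  // mark each iteration as a candidate task
        do_work(i);
    }
    ANNOTATE_SITE_END();                      // close the proposed parallel region
}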
You can generate a report in different formats or open the project in the Intel Advisor GUI. For example, to generate a report for all collected data to a TXT file:
advisor --report=joined --project-dir=./advi --search-dir src:r=./src --format=text --report-output=./out/myResult.txt
Review the data in the TXT report file and update the application using the chosen parallel coding constructs. Rebuild the application and test.

Model Offloading to Accelerator

Run the Offload Modeling perspective to identify high-impact opportunities to offload your application to a target platform.
There are several methods available to run the Offload Modeling perspective, which vary in terms of simplicity and flexibility. You can run Offload Modeling using the advisor command line interface or dedicated Python* scripts.
In the commands below, <APM> is the Intel Advisor environment variable that points to the directory with the Offload Modeling scripts. Replace it with $APM on Linux* OS or with %APM% on Windows* OS.
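For example, on Linux the generic form advisor-python <APM>/collect.py ./advi -- myApplication becomes:
advisor-python $APM/collect.py ./advi -- myApplication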
Method 1: Use the advisor command line interface
This method is the most flexible and applicable to MPI applications. To use this method:
  1. Run collect.py with the --dry-run option to get command lines appropriate for your configuration:
    advisor-python <APM>/collect.py ./advi --dry-run -- myApplication
    You will get several command lines for advisor collections. You can also use the Intel Advisor GUI to generate command lines.
  2. Run the Survey analysis:
    advisor --collect=survey --stackwalk-mode=online --static-instruction-mix --project-dir=./advi -- myApplication
  3. Run the Trip Counts and FLOP analysis:
    advisor --collect=tripcounts --flop --stacks --enable-cache-simulation --data-transfer=light --target-device=gen11_gt2 --project-dir=./advi -- myApplication
  4. Model application performance on the default gen11_gt2 target device:
    advisor --collect=projection --no-assume-dependencies --project-dir=./advi
    You can also run the performance modeling with the analyze.py script.
This workflow corresponds to the medium accuracy of the Offload Modeling perspective selected in the GUI.
Method 2: Run the run_oa.py script
This is the simplest method that you can use to run all collection steps with one script. It is less flexible and is only available for non-MPI applications. Run the script as follows:
advisor-python <APM>/run_oa.py ./advi -- myApplication
Method 3: Run the collect.py and analyze.py scripts
This method is simple and moderately flexible, but it does not support MPI applications. collect.py automates profiling, while analyze.py implements performance modeling on a target device. For example:
  1. Run collect.py to collect application performance metrics:
    advisor-python <APM>/collect.py ./advi -- myApplication
  2. Run analyze.py to model application performance as if it were run on a target device (for example, a GPU):
    advisor-python <APM>/analyze.py ./advi
To collect the Dependencies data for non-parallel regions, use the --collect=full option with collect.py.
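For example, a sketch of that full collection, combining the option above with the basic collect.py invocation shown earlier:
advisor-python <APM>/collect.py ./advi --collect=full -- myApplication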
Results
For all methods, once you have run the performance modeling, you can open the results in the Intel Advisor GUI or see CSV metric reports and an interactive HTML report generated in <project-dir>/e<NNN>/pp<NNN>/data.0. The HTML report contains a list of regions profitable for offloading and performance metrics, such as offload data transfer traffic, estimated number of cycles on a target device, estimated speedup, and compute versus memory-bound characterization.
