analyze.py Options

This script allows you to run an analysis on profiling data and generate report results.

Usage

advixe-python <APM>/analyze.py <project-dir> [--options]

Replace <APM> with $APM on Linux* OS or %APM% on Windows* OS.
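
For example, assuming an existing Intel® Advisor project in an ./advi directory (the project path here is illustrative), the command on Linux* OS looks like this:

    advixe-python $APM/analyze.py ./advi

On Windows* OS, the equivalent command is:

    advixe-python %APM%\analyze.py .\advi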

Options

You can use the following options with the analyze.py script.

<project-dir>
    Required. Specify the path to the Intel® Advisor Beta project directory.

-h, --help
    Show a help message and exit.

--version
    Display Intel® Advisor Beta version information.

-v <verbose>, --verbose <verbose>
    Specify the output verbosity level:
      • 1 - Show only error messages. This is the least verbose level.
      • 2 - Show warning and error messages.
      • 3 (default) - Show information, warning, and error messages.
      • 4 - Show debug, information, warning, and error messages. This is the most verbose level.
    This option affects the console output and debug log, but does not affect logs and report results.

--no-cachesim
    Disable cache simulation during collection. The model assumes a 100% cache hit rate. Use this option to decrease analysis overhead.

--config <config>
    Specify a configuration file by absolute path or name. If you specify a name, the model configuration directory is searched for the file first, then the current directory. You can specify several configurations by using this option more than once.

-o <output-dir>, --out-dir <output-dir>
    Specify the directory for all generated files. By default, results are saved in <advisor-project>/perf_models/mNNNN. If you specify an existing directory or an absolute path, results are saved in that directory; a new directory is created if it does not exist. If you specify only a directory <name>, results are stored in <advisor-project>/perf_models/<name>.

-p <output-name-prefix>, --out-name-prefix <output-name-prefix>
    Specify a string to prepend to output result filenames.

--assume-parallel
    Assume that a loop is parallel if the loop type is not known.

--no-assume-parallel (default)
    Assume that a loop has a dependency if the loop type is not known.

--set-parallel [<IDs/source-locations>]
    Assume loops are parallel if their IDs or source locations appear in the specified comma-separated list. If the list is empty, assume all loops are parallel. The --set-dependency option takes precedence over --set-parallel, so if a loop is listed in both, it is considered as having a dependency.

--set-dependency [<IDs/source-locations>]
    Assume loops have dependencies if their IDs or source locations appear in the specified comma-separated list. If the list is empty, assume all loops have dependencies. The --set-dependency option takes precedence over --set-parallel, so if a loop is listed in both, it is considered as having a dependency.

--non-accel-time-breakdown
    Provide a detailed breakdown of the non-offloaded parts of offloaded regions.

-l [<file-name>:<line-number>], --select-loops [<file-name>:<line-number>]
    Limit the analysis to the loop nests whose topmost loops are specified. The parameter must be a comma-separated list of source locations in the format <file-name>:<line-number>.

--loop-filter-threshold <threshold>
    Specify the loop filter threshold in seconds. The default is 0.02. Loop nests with a total time less than the threshold are ignored.

--small-node-filter <threshold>
    Specify the total time threshold, in seconds, below which nodes are filtered out of program_tree.dot and program_tree.pdf. The default is 0.0.

--evaluate-min-speedup
    Enable offload fraction estimation that reaches the minimum speedup defined in a configuration file. Disabled by default.

--mpi-rank <mpi-rank>
    Use results for the specified MPI rank if multiple ranks were analyzed.

--model-children (default)
    Analyze child loops of the region head to find whether some of them provide a more profitable offload.

--no-model-children
    Do not analyze child loops of the region head.

--check-profitability (default)
    Check the profitability of offloading regions. Only regions that can benefit from the increased speed are added to the report.

--no-check-profitability
    Add all evaluated regions to the report, regardless of the profitability of offloading specific regions.

--use-collect-configs
    Use configuration files from the collection phase in addition to default and custom configuration files.

--no-use-collect-configs (default)
    Do not use configuration files from the collection phase. Use only default and custom configuration files.

--model-system-calls (default)
    Analyze regions that contain system calls. The presence of system calls inside a region may reduce model accuracy.

--no-model-system-calls
    Do not analyze regions that contain system calls.

--jit
    Enable data collection and analysis for applications with DPC++, OpenMP* target, and OpenCL™ code on a base platform.

--enable-slm
    Enable SLM modeling in the memory hierarchy model. Must be used with both collect.py and analyze.py.

--track-heap-objects (default)
    Attribute heap-allocated objects to the analyzed loops that access them. Enabling this option can increase collection overhead.

--no-track-heap-objects
    Do not attribute heap-allocated objects to the analyzed loops that access them. Disabling this option can decrease collection overhead.

--model-extended-math (default)
    Model calls to math functions such as EXP, LOG, SIN, and COS as extended math instructions, if possible.

--no-model-extended-math
    Do not model calls to math functions such as EXP, LOG, SIN, and COS as extended math instructions.

--search-n-dim (default)
    Enable the search for an optimal N-dimensional offload.

--no-search-n-dim
    Disable the search for an optimal N-dimensional offload.

--force-32bit-arithmetics
    Force all arithmetic operations to be considered single-precision floating-point or int32 operations.

--force-64bit-arithmetics
    Force all arithmetic operations to be considered double-precision floating-point or int64 operations.

--enable-batching
    Enable job batching for top-level offloads: emulate the execution of more than one instance simultaneously. This option is equivalent to --threads=<total-EU-count*threads-per-EU> (see the worked example after this option list).

--disable-batching (default)
    Disable job batching for top-level offloads.

--threads <number-of-threads>
    Specify the number of parallel threads to use for offload heads.

-e, --enforce-offloads
    Skip the profitability check, disable analyzing child loops and functions, and ensure that the rows marked for offload are offloaded even if offloading child rows is more profitable.

--no-enforce-offloads (default)
    Enable the profitability check and the analysis of child loops and functions to find the most profitable offload entries.

--count-memory-instructions (default)
    Project x86 instructions with memory operands to GPU SEND/SENDS instructions.

--no-count-memory-instructions
    Do not project x86 instructions with memory operands to GPU SEND/SENDS instructions.

--assume-ndim-dependency (default)
    When searching for an optimal N-dimensional offload, assume there are dependencies between inner and outer loops.

--no-assume-ndim-dependency
    When searching for an optimal N-dimensional offload, assume there are no dependencies between inner and outer loops.

-m <markup>, --markup <markup>
    Select the markup mode, which determines which regions are marked up for data collection and analysis.

--count-mov-instructions
    Project x86 MOV instructions to GPU MOV instructions.

--no-count-mov-instructions (default)
    Do not project x86 MOV instructions to GPU MOV instructions.

--disable-fp64-math-optimization
    Do not account for optimized traffic for transcendentals on the GPU.

--no-stacks
    Run the data analysis without using call stack data. Use this option to avoid incorrectly attributed call stack data at the expense of some accuracy.

--data-transfer-histogram (default)
    Estimate fine-grained data transfers and latencies for each transferred object and add a memory object histogram to the report. This option requires track-heap-objects and track-stack-accesses to be enabled during collection.

--no-data-transfer-histogram
    Disable fine-grained data transfer estimation.

--assume-hide-taxes
    Use an optimistic approach to estimate invocation taxes: hide all invocation taxes except the first one.

--assume-never-hide-taxes (default)
    Use a pessimistic approach to estimate invocation taxes: do not hide invocation taxes.

--assume-single-data-transfer (default)
    Assume data is transferred once for each offload and all instances share the data.

--no-assume-single-data-transfer
    Assume each data object is transferred for every instance of an offload that uses it. This method assumes no data reuse between calls to the same kernel.
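
As a worked illustration of the --enable-batching description above: that option is equivalent to passing the product of the target's total EU count and threads per EU to --threads. For a hypothetical target with 24 EUs and 7 threads per EU (illustrative device parameters, not a statement about any specific configuration), the two commands below should behave identically, since 24 * 7 = 168:

    advixe-python $APM/analyze.py ./advi --enable-batching
    advixe-python $APM/analyze.py ./advi --threads=168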
Examples
  • Run an analysis with the default configuration on the project in the ./advi directory. The generated output is saved to the default advi/perf_models/mNNNN directory.

    advixe-python $APM/analyze.py ./advi

  • Run an analysis using the Intel® Processor Graphics Gen9 configuration for specific loops of the ./advi project. Add both analyzed loops to the report regardless of their offloading profitability. The generated output is saved to the default advi/perf_models/mNNNN directory.

    advixe-python $APM/analyze.py ./advi --config gen9 --select-loops [foo.cpp:34,bar.cpp:192] --no-check-profitability

  • Run an analysis with a custom configuration on the ./advi project. Mark up regions for analysis and assume a code region is parallel if its type is unknown. Save the generated output to the advi/perf_models/report directory.

    advixe-python $APM/analyze.py ./advi --config ./myConfig.toml --markup --assume-parallel --out-dir report
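
  • A further, hypothetical illustration (the source location bar.cpp:57 and the output directory name are illustrative): run an analysis on the ./advi project, assume loops of unknown type are parallel, force the loop at bar.cpp:57 to be treated as having a dependency, and save the generated output to the advi/perf_models/deps directory.

    advixe-python $APM/analyze.py ./advi --assume-parallel --set-dependency [bar.cpp:57] --out-dir deps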
