User Guide

advisor Command Option Reference

The advisor command currently supports the options shown below.
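An action (such as --collect, --report, or --snapshot) is combined with these options on a single command line. Below is a minimal sketch of the general form and a basic Survey run; the project directory ./advi_results and the target ./myApplication are placeholders. A combined example that chains several options follows the list of option descriptions.

    # General form: one action, its options, then the target application after --
    advisor <--action> [--action-options] [--global-options] [--] <target-application> [target options]

    # Example: run a Survey analysis and store results in ./advi_results
    advisor --collect=survey --project-dir=./advi_results -- ./myApplication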
Option descriptions
Add loops (by file and line number) to the loops selected for deeper analysis.
Specify the directory where the target application runs during analysis, if it is different from the current working directory.
Assume that a loop has dependencies if the loop dependency type is unknown.
Estimate invocation taxes assuming the invocation tax is paid only for the first kernel launch.
When searching for an optimal N-dimensional offload, assume there are dependencies between inner and outer loops.
Assume data is only transferred once for each offload, and all instances share that data.
Finalize Survey and Trip Counts & FLOP analysis data after collection is complete.
Emulate the execution of more than one instance simultaneously for a top-level offload.
Run benchmarks on only one Intel® Advisor instance at a time to avoid concurrency issues with platform limits.
Generate a Survey report in bottom-up view.
Enable binary visibility in a read-only snapshot you can view any time.
Set the cache hierarchy to collect modeling data for CPU cache behavior during Trip Counts & FLOP analysis.
Enable source code visibility in a read-only snapshot you can view any time (with the --snapshot action). Enable keeping source code cache within a project (with the --collect action).
Enable cache simulation for Performance Modeling.
Set the cache associativity for modeling CPU cache behavior during the Memory Access Patterns analysis.
Set the cache line size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis.
Set the focus for modeling CPU cache behavior during Memory Access Patterns analysis.
Set the cache set size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis.
Check the profitability of offload regions and add only profitable regions to a report.
Clear all loops previously selected for deeper analysis.
Specify a device configuration to model your application performance for.
Use the projection of x86 logical instructions to GPU logical instructions.
Project x86 memory instructions to GPU SEND/SENDS instructions.
Count the number of accesses to memory objects created by code regions.
Project x86 MOV instructions to GPU MOV instructions.
Select how to model SEND instruction latency.
Specify a scale factor to approximate a host CPU that is faster than the baseline CPU by this factor.
Set the delimiter for a report in CSV format.
Specify the absolute path or name for a custom TOML configuration file with additional modeling parameters.
Limit the maximum amount (in MB) of raw data collected during Survey analysis.
Analyze potential data reuse between code regions.
Set the level of details for modeling data transfers during Characterization.
Estimate data transfers in detail, including latencies for each transferred object.
Specify memory page size to set the traffic measurement granularity for the data transfer simulator.
Show only floating-point data, only integer data, or data for the sum of both data types in a Roofline interactive HTML report.
Remove previously collected trip counts data when re-running a Survey analysis with changed binaries.
Do not account for optimized traffic for transcendentals on a GPU.
Show a callstack for each loop/function call in a report.
Specify the maximum amount of time (in seconds) an analysis runs.
Show (in a Survey report) how many instructions of a given type actually executed during Trip Counts & FLOP analysis.
Enable job batching for a top-level offload.
Model CPU cache behavior on your target application.
Model data transfer between host memory and device memory.
Enable a simulator to model GRF.
Model SLM in the memory hierarchy model.
Examine specified annotated sites for opportunities to perform task-chunking modeling in a Suitability report.
Emulate data distribution over stacks if stacks collection is disabled.
Estimate region speedup with relaxed constraints.
Consider loops recommended for offloading only if they reach the minimum estimated speedup specified in a configuration file.
Exclude the specified files or directories from annotation scanning during analysis.
Specify an application for analysis that is not the starting application.
Filter data by the specified column name and value in a Survey and Trip Counts & FLOP report.
Enable filtering detected stack variables by scope (warning vs. error) in a Dependencies analysis.
Mark all potential reductions by specific diagnostic during Dependencies analysis.
Enable flexible cache simulation to change cache configuration without re-running collection.
Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms during Trip Counts & FLOP analysis.
Consider all arithmetic operations as single-precision floating-point or int32 operations.
Consider all arithmetic operations as double-precision floating-point or int64 operations.
Set a report output format.
Create a Roofline interactive HTML report for data collected on GPUs.
Collect memory traffic generated by OpenCL™ and Intel® Media SDK programs executed on Intel® Processor Graphics.
Model performance only for code regions running on a GPU.
Specify time interval, in milliseconds, between GPU samples during Survey analysis.
Specify runtimes or libraries to ignore time spent in these regions when calculating per-program speedup.
Ignore mismatched target or application parameter errors before starting analysis.
Ignore mismatched module checksums before starting analysis.
Analyze the Nth child process during Memory Access Patterns and Dependencies analysis.
Model traffic on all levels of the memory hierarchy for a Roofline report.
Set the length of time (in milliseconds) to wait before collecting each sample during Survey analysis.
Collect data for applications with Data Parallel C++, OpenMP* target, and OpenCL™ code running on CPU.
Set the maximum number of top items to show in a report.
Set the maximum number of instances to analyze for all marked loops.
Specify a total time threshold, in milliseconds, to filter out loops that fall below this value.
Select loops (by criteria instead of human input) for deeper analysis.
Enable/disable user selection as a way to control loops/functions identified for deeper analysis.
After running a Survey analysis and identifying loops of interest, select loops (by file and line number or ID) for deeper analysis.
Model specific memory level(s) in a Roofline interactive HTML report, including L1, L2, L3, and DRAM.
Model only load memory operations, store memory operations, or both, in a Roofline interactive HTML report.
Show dynamic or static instruction mix data in a Survey report.
Collect Intel® oneAPI Math Kernel Library (oneMKL) loops and functions data during the Survey analysis.
Analyze child loops of the region head to find whether any of the child loops provide a more profitable offload.
Model calls to math functions such as EXP, LOG, SIN, and COS as extended math instructions, if possible.
Analyze code regions with system calls, assuming they are separated from offload code and executed on a host device.
Specify application (or child application) module(s) to include in or exclude from analysis.
Limit, by inclusion or exclusion, application (or child application) module(s) for analysis.
Specify MPI process data to import.
Set the Microsoft* runtime environment mode for analysis.
When searching for an optimal N-dimensional offload, limit the maximum loop depth that can be converted to one offload.
Specify a text file containing command line arguments.
Enable asynchronous execution to overlap offload overhead with execution time.
Pack a snapshot into an archive.
Analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Processor Graphics.
Show Intel® performance libraries loops and functions in Intel® Advisor reports.
Collect metrics about Just-In-Time (JIT) generated code regions during the Trip Counts and FLOP analysis.
Collect Python* loop and function data during Survey analysis.
Collect metrics for stripped binaries.
Specify the top-level directory where a result is saved if you want to save the collection somewhere other than the current working directory.
Minimize status messages during command execution.
Recalculate total time after filtering a report.
Enable heap allocation tracking to identify heap-allocated variables for which access strides are detected during Memory Access Patterns analysis.
Capture stack frame pointers to identify stack variables for which access strides are detected during Memory Access Patterns analysis.
Examine specified annotated sites for opportunities to reduce lock contention or find deadlocks in a Suitability report.
Examine specified annotated sites for opportunities to reduce lock overhead in a Suitability report.
Examine specified annotated sites for opportunities to reduce site overhead in a Suitability report.
Examine specified annotated sites for opportunities to reduce task overhead in a Suitability report.
Refinalize a Survey result if it was collected with a previous Intel® Advisor version or if you need to correct or update source and binary search paths.
Remove loops (by file and line number) from the loops selected for deeper analysis.
Redirect report output from stdout to another location.
Specify the path/name of a custom report template file.
Specify a directory to identify the running analysis.
Resume collection after the specified number of milliseconds.
Return the target exit code instead of the command line interface exit code.
Specify the location(s) for finding target support files.
Enable searching for an optimal N-dimensional offload.
Select loops (by file and line number, ID, or criteria) for deeper analysis.
Assume loops with specified IDs or source locations have a dependency.
Assume loops with specified IDs or source locations are parallel.
Show data for all available columns in a Survey report.
Show data for all available rows, including data for child loops, in a Survey report.
Show only functions in a report.
Show only loops in a report.
Show not-executed child loops in a Survey report.
Specify the total time threshold, in milliseconds, to filter out nodes that fall below this value from PDF and DOT Offload Modeling reports.
Sort data in ascending order (by specified column name) in a report.
Sort data in descending order (by specified column name) in a report.
Perform register flow analysis to calculate the number of consecutive load/store operations in registers and related memory traffic in bytes during Survey analysis.
Restructure the call flow during Survey analysis to attach stacks to a point introducing a parallel workload.
Perform advanced collection of callstack data during Roofline and Trip Counts & FLOP analysis.
Choose between online and offline modes to analyze stacks during Survey analysis.
Start executing the target application for analysis purposes, but delay data collection.
Statically calculate the number of specific instructions present in the binary during Survey analysis.
Specify processes and/or children for instrumentation during Survey analysis.
Collect a variety of data during Survey analysis for loops that reside in non-executed code paths.
Specify a device configuration to model cache for during Trip Counts collection.
Specify a target GPU to collect data for if you have multiple GPUs connected to your system.
Attach Survey or Trip Counts & FLOP collection to a running process specified by the process ID.
Attach Survey or Trip Counts & FLOP collection to a running process specified by the process name.
Specify the hardware configuration to use for modeling purposes in a Suitability report.
Specify the threading model to use for modeling purposes in a Suitability report.
Specify the number of parallel threads to use for offload heads.
Generate a Survey report in top-down view.
Set how to trace loop iterations during Memory Access Patterns analysis.
Configure collectors to trace MPI code and determine MPI rank IDs for non-Intel® MPI library implementations.
Attribute memory objects to the analyzed loops that accessed the objects.
Track accesses to stack memory.
Enable parallel data sharing analysis for stack variables during Dependencies analysis.
Collect loop trip counts data during Trip Counts & FLOP analysis.
Re-use configuration files as specified at a collection phase in addition to default and user-specified configuration files.
Specify a directory other than the project directory to save analysis results.
Maximize status messages during command execution.
Show call stack data in a Roofline interactive HTML report (if call stack data is collected).
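
For reference, a typical flow chains several of the options described above: collect data first, then generate a report. A minimal sketch, again using the placeholder project directory ./advi_results and target ./myApplication:

    # Survey analysis of the target application
    advisor --collect=survey --project-dir=./advi_results -- ./myApplication

    # Trip Counts & FLOP analysis with floating-point and integer operation data
    advisor --collect=tripcounts --flop --project-dir=./advi_results -- ./myApplication

    # Survey report in CSV format, redirected from stdout to a file
    advisor --report=survey --project-dir=./advi_results --format=csv --report-output=./survey.csv

The same pattern applies to the other analyses and reports: swap the action value and add the relevant options from the descriptions above.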
