Execution Speed/Duration/Scope Properties to Minimize Analysis Overhead

Issue

Running your target application with the Intel Advisor can take substantially longer than running your target application without the Intel Advisor. For example:

  • Survey: 1.1x longer

  • Trip Counts & FLOP: 3 - 8x longer

  • Roofline: 3.1 - 8.1x longer

  • Dependencies: 5 - 100x longer

  • Memory Access Patterns (MAP): 5 - 20x longer

Solutions

Use the following techniques to minimize overhead while collecting Intel Advisor analysis data. The Disable Additional Analysis technique also minimizes finalization overhead.

Minimization techniques and the Intel Advisor analyses they affect:

Technique: Change stackwalk mode from offline (after collection) to online (during collection)

  • Impacted analysis: Survey

  • GUI control: Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Stack unwinding mode > During collection

  • CLI action option: -stackwalk-mode=online

Technique: Disable stacks collection

  • Impacted analyses: Roofline; Trip Counts and FLOP

  • GUI controls: Vectorization Workflow pane > Enable Roofline with Callstacks checkbox; Project Properties > Analysis Target > Trip Counts and FLOP Analysis > Advanced > Collect stacks checkbox

  • CLI action option: -no-stacks (or simply ensure the -stacks action option is omitted from the command line)

Technique: Disable stitch stacks

  • Impacted analysis: Survey

  • GUI control: Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Stitch stacks checkbox

  • CLI action option: -no-stack-stitching

Technique: Increase sampling interval

  • Impacted analysis: Survey

  • GUI control: Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Sampling interval field

  • CLI action option: -interval=<integer>

Technique: Limit collected analysis data

  • Impacted analysis: Survey

  • GUI control: Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Collection data limit, MB field

  • CLI action option: -data-limit=<integer>

Technique: Limit loop call count

  • Impacted analyses: Dependencies; Memory Access Patterns

  • GUI control: Project Properties > Analysis Target > [Name] Analysis > Advanced > Loop Call Count Limit field

  • CLI action option: -loop-call-count-limit=<integer>

Technique: Disable additional analysis

  • Impacted analysis: Survey

  • GUI controls: Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Analyze MKL loops and functions, Analyze Python loops and functions, Analyze loops that reside in non-executed code paths, Enable register spill/fill analysis, and Enable static instruction mix analysis checkboxes

  • CLI action options: -no-mkl-user-mode, -no-profile-python, -no-support-multi-isa-binaries, -no-spill-analysis, -no-static-instruction-mix
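Several of the options above can be combined in a single run. For example, the following Survey command line is an illustrative sketch only (the project directory and target application paths are placeholders):

    advixe-cl -collect survey -project-dir ./myAdvisorProj -stackwalk-mode=online -interval=20 -data-limit=250 -- ./bin/myTargetApplication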

Change Stackwalk Mode from Offline (After Collection) to Online (During Collection)

Minimize collection overhead.

Applicable analysis: Survey.

Set to online/during collection when:

  • Survey analysis runtime overhead exceeds 1.1x.

  • A large quantity of data is allocated on the stack, which is a common case for Fortran applications or applications with a large number of small, parallel, OpenMP* regions.

To implement, do one of the following before/while running a Survey analysis:

  • Set Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Stack unwinding mode > During collection.

  • Use the CLI action option -stackwalk-mode=online. For example:

    advixe-cl -collect survey -project-dir ./myAdvisorProj -stackwalk-mode=online -- ./bin/myTargetApplication

Disable Stacks Collection

Minimize collection overhead.

Applicable analyses: Roofline, Trip Counts and FLOP.

To implement, do one of the following before/while running the analysis:

  • Disable the Vectorization Workflow pane > Enable Roofline with Callstacks checkbox.

  • Disable the Project Properties > Analysis Target > Trip Counts and FLOP Analysis > Advanced > Collect stacks checkbox.

  • Ensure the CLI action option -stacks is omitted from the command line. Alternative: Use the CLI action option -no-stacks.
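For example, the following is an illustrative Trip Counts and FLOP collection without stacks (assuming the -flop modifier is used to add FLOP data to the tripcounts collection; paths are placeholders):

    advixe-cl -collect tripcounts -flop -no-stacks -project-dir ./myAdvisorProj -- ./bin/myTargetApplication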

Disable Stitch Stacks

Minimize collection overhead.

Applicable analysis: Survey.

The stitch stacks option restores a logical call tree for Intel® Threading Building Blocks (Intel® TBB) or OpenMP* applications by catching notifications from the runtime and attaching stacks to a point introducing a parallel workload.

Disable when Survey analysis runtime overhead exceeds 1.1x.

To implement, do one of the following before/while running the analysis:

  • Disable the Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Stitch stacks checkbox.

  • Use the CLI action option -no-stack-stitching. For example:

    advixe-cl -collect survey -project-dir ./myAdvisorProj -no-stack-stitching -- ./bin/myTargetApplication

Note

Disabling stack stitching may decrease the overhead for applications using Intel® TBB.

Increase Sampling Interval

Minimize collection overhead.

Applicable analysis: Survey.

Increase the wait time between each analysis collection sample when your target application runtime is long.

To implement, do one of the following before/while running the analysis:

  • Increase the value in the Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Sampling interval field.

  • Use the CLI action option -interval=<integer> when running a Survey analysis. For example:

    advixe-cl -collect survey -project-dir ./myAdvisorProj -interval=20 -- ./bin/myTargetApplication

Limit Collected Analysis Data

Minimize collection overhead.

Applicable analysis: Survey.

Decrease the amount of raw data collected when exceeding a size threshold could cause issues, for example, when storage space is limited.

To implement, do one of the following before/while running the analysis:

  • Decrease the value in the Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced > Collection data limit, MB field.

  • Decrease the value in the CLI action option -data-limit=<integer>. For example:

    advixe-cl -collect survey -project-dir ./myAdvisorProj -data-limit=250 -- ./bin/myTargetApplication

Limit Loop Call Count

Minimize collection overhead.

Applicable analyses: Dependencies, Memory Access Patterns.

Decrease the maximum number of times each marked loop is analyzed.

To implement, do one of the following before/while running the analysis:

  • Supply a non-zero value in the Project Properties > Analysis Target > [Name] Analysis > Advanced > Loop Call Count Limit field.

  • Supply a non-zero value in the CLI action option -loop-call-count-limit=<integer>. For example:

    advixe-cl -collect dependencies -project-dir ./myAdvisorProj -loop-call-count-limit=10 -- ./bin/myTargetApplication

Disable Additional Analysis

Minimize finalization overhead.

Applicable analysis: Survey.

Implement these techniques when the additional data is not important to you.

Note

The default setting for all the properties/options in the table below is disabled.

All GUI controls below are located under Project Properties > Analysis Target > Survey Hotspots Analysis > Advanced.

GUI control: Disable the Analyze MKL loops and functions checkbox.

CLI action option: -no-mkl-user-mode

Description: Do not show Intel® Math Kernel Library (Intel® MKL) loops and functions in Intel Advisor reports.

GUI control: Disable the Analyze Python loops and functions checkbox.

CLI action option: -no-profile-python

Description: Do not show Python* loops and functions in Intel Advisor reports.

GUI control: Disable the Analyze loops that reside in non-executed code paths checkbox.

CLI action option: -no-support-multi-isa-binaries

Description: Do not collect a variety of data for loops that reside in non-executed code paths, including:

  • Loop assembly code

  • Instruction set architecture (ISA)

  • Vector length

Note

This capability is available only for binaries compiled using the -ax (Linux* OS) / /Qax (Windows* OS) option with an Intel® compiler.

GUI control: Disable the Enable register spill/fill analysis checkbox.

CLI action option: -no-spill-analysis

Description: Do not calculate the number of consecutive load/store operations in registers and related memory traffic.

GUI control: Disable the Enable static instruction mix analysis checkbox.

CLI action option: -no-static-instruction-mix

Description: Do not statically calculate the number of specific instructions present in the binary.
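For example, the following is an illustrative Survey run that explicitly keeps several of these additional analyses off (paths are placeholders):

    advixe-cl -collect survey -project-dir ./myAdvisorProj -no-profile-python -no-spill-analysis -no-static-instruction-mix -- ./bin/myTargetApplication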
