Issue

Running your target application with the Intel Advisor can take substantially longer than running your target application without the Intel Advisor. For example:

Analysis                        Runtime with Intel Advisor compared to runtime without Intel Advisor

Survey                          1.1x longer
Trip Counts & FLOP              3 - 8x longer
Roofline                        3.1 - 8.1x longer
Dependencies                    5 - 100x longer
Memory Access Patterns (MAP)    5 - 20x longer

Solutions

The following techniques may help minimize overhead without limiting collection scope.

Disable cache simulation
  Impacted Intel Advisor analyses: Trip Counts and FLOP, Memory Access Patterns
  GUI controls:
    • Project Properties > Analysis Target > Memory Access Patterns Analysis > Advanced > Enable cache simulation checkbox
    • Project Properties > Analysis Target > Trip Counts and FLOP Analysis > Advanced > Enable cache simulation checkbox
  CLI action option: --no-enable-cache-simulation

Limit reported data
  Impacted Intel Advisor analyses: Memory Access Patterns
  GUI controls:
    • Project Properties > Analysis Target > Memory Access Patterns Analysis > Advanced > Report stack variables checkbox
    • Project Properties > Analysis Target > Memory Access Patterns Analysis > Advanced > Report heap allocated variables checkbox
  CLI action options:
    • --no-record-stack-frame
    • --no-record-mem-allocations

Minimize data set
  Impacted Intel Advisor analyses: All, but especially Dependencies and Memory Access Patterns
  Minimize the number of instructions executed within a loop while thoroughly exercising target application control flow paths.

Temporarily disable finalization
  Impacted Intel Advisor analyses: Roofline, Survey, Trip Counts and FLOP
  GUI control: Vectorization Workflow pane > Cancel current analysis control during finalization
  CLI action option: --no-auto-finalize

Disable Cache Simulation

Minimize collection overhead.

Applicable analyses:

  • Memory Access Patterns (base simulation functionality)

  • Trip Counts and FLOP (enhanced simulation functionality that also requires setting the ADVIXE_EXPERIMENTAL=int_roofline environment variable)

Implement these techniques when cache modeling information is not important to you:

Note

The default setting for all the properties/options listed below is disabled.

Memory Access Patterns Analysis
  GUI control: Project Properties > Analysis Target > Memory Access Patterns Analysis > Advanced > disable the Enable cache simulation checkbox
  CLI action option: --no-enable-cache-simulation
  Description: Do not model cache misses, cache misses and cache line utilization, or cache misses and loop footprint.

Trip Counts and FLOP Analysis
  GUI control: Project Properties > Analysis Target > Trip Counts and FLOP Analysis > Advanced > disable the Enable cache simulation checkbox
  CLI action option: --no-enable-cache-simulation
  Description: Do not:
    • Model multiple levels of cache for data, such as counts of loaded or stored bytes for each loop.
    • Create simulations for specific cache hierarchy configurations.
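
A minimal command-line sketch of the same settings, assuming a placeholder project directory ./myAdvisorProj and a placeholder target application ./bin/myTargetApplication:

    advixe-cl --collect=tripcounts --no-enable-cache-simulation --project-dir=./myAdvisorProj -- ./bin/myTargetApplication
    advixe-cl --collect=map --no-enable-cache-simulation --project-dir=./myAdvisorProj -- ./bin/myTargetApplication

Because cache simulation is disabled by default, passing the option explicitly matters mainly when a project configuration or script has previously enabled it.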

Limit Reported Data

Applicable analysis: Memory Access Patterns.

Implement these techniques when the additional data is not important to you.

Note

The default setting for all the properties/options listed below is enabled.

Report stack variables
  GUI control: Project Properties > Analysis Target > Memory Access Patterns Analysis > Advanced > disable the Report stack variables checkbox
  CLI action option: --no-record-stack-frame
  Description: Do not report stack variables for which memory access strides are detected.

Report heap allocated variables
  GUI control: Project Properties > Analysis Target > Memory Access Patterns Analysis > Advanced > disable the Report heap allocated variables checkbox
  CLI action option: --no-record-mem-allocations
  Description: Do not report heap-allocated variables for which memory access strides are detected.
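
For example, a Memory Access Patterns collection that suppresses both kinds of variable reporting could look like the following sketch, using the same placeholder project directory and target application as above:

    advixe-cl --collect=map --no-record-stack-frame --no-record-mem-allocations --project-dir=./myAdvisorProj -- ./bin/myTargetApplication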

Minimize Data Set

Minimize collection overhead.

Applicable analyses: All, but especially Dependencies, Memory Access Patterns.

When you run an analysis, the Intel Advisor executes the target against the supplied data set. Data set size and workload have a direct impact on target application execution time and analysis speed.

For example, processing a 1000x1000 pixel image takes longer than processing a 100x100 pixel image: a loop may have an iteration space of 1...1000 for the larger image but only 1...100 for the smaller one, even though exactly the same code paths are executed in both cases. The only difference is the number of times those code paths are repeated.

You can control analysis cost without sacrificing completeness by removing this kind of unnecessary repetition from target application execution.

Instead of choosing large, repetitive data sets, choose small, representative data sets that minimize the number of instructions executed within a loop while thoroughly exercising target application control flow paths.

Your objective: in as short a runtime as possible, execute as many code paths as you can afford while reducing the repetitive computation within each task to the bare minimum needed for good code coverage.

Data sets that run in about ten seconds or less are ideal. You can always create additional data sets to ensure all your code is checked.
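
As an illustration only, a Dependencies run against a reduced workload might be launched as below; the final argument (a hypothetical 100x100 input file) stands in for however your application selects its data set:

    advixe-cl --collect=dependencies --project-dir=./myAdvisorProj -- ./bin/myTargetApplication ./data/input_100x100.dat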

Temporarily Disable Finalization

Minimize finalization overhead.

Applicable analyses: Roofline, Survey, Trip Counts and FLOP.

Use this technique when you plan to view collected analysis data on a different machine. It is particularly useful if you collect analysis data on an Intel® Xeon Phi™ machine and plan to view the result on another machine. Finalization occurs automatically when a result is opened in the GUI or a report is generated from the result.

To implement, do one of the following while running an analysis:

  • When the analysis Finalizing data... phase begins, click the associated Cancel button.
    Intel Advisor control: Finalize Cancel button

  • Use the CLI action option --no-auto-finalize when you run the desired analysis. For example:

    advixe-cl --collect=survey --project-dir=./myAdvisorProj --no-auto-finalize -- ./bin/myTargetApplication
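
On the machine where you view the data, finalization then happens automatically the first time you open the result in the GUI or generate a report from it, for example (the project directory is a placeholder):

    advixe-cl --report=survey --project-dir=./myAdvisorProj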