User Guide

Next Steps: After Running Survey Analysis

After you run a Survey analysis:
  1. Sort by the Self-Time and/or Total-Time column to find the top time-consuming loops.
  2. Check whether your target loop or function is vectorized or scalar. Intel Advisor marks each of the following with a distinct icon:
    • vectorized function
    • vectorized loop
    • scalar function
    • scalar loop
  3. Use filters to hide the code that you do not want to tune now.
  4. Decide what loops or functions to investigate:

If Loop/Function is Scalar

If the target loop/function is scalar, you need to understand why the compiler did not vectorize it. Several reasons are possible, as listed below.
See OpenMP* Pragmas Summary in the Intel compiler Developer Guide and Reference for more information on the directives mentioned below.
Possible Reason: Assumed dependency
  To Confirm: Refer to the Why No Vectorization? column. Search for the Vector dependence prevents vectorization issue.
  To Do:
    • If no dependencies are found, force vectorization with the omp simd directive or provide other vectorization recommendations to the compiler.
    • If dependencies are confirmed, resolve them, or move to the next loop.
Possible Reason: Function call in the loop
  To Confirm: Refer to the Why No Vectorization? column. Search for the following issues:
    • Function call present
    • Indirect function call present
    • Serialized user function call present
  To Do: For the Function call present issue, do one of the following:
    • Inline the function into the loop.
    • Vectorize the function with the omp declare simd directive.
    For the Indirect function call present or Serialized user function call present issue, refer to the guidelines in the Recommendations tab.
Possible Reason: Compiler-assumed inefficient vectorization
  To Confirm: Refer to the Why No Vectorization? column. Search for the Loop vectorization possible but seems inefficient issue.
  To Do: Try forcing vectorization with the omp simd directive. If forcing vectorization doesn't provide tangible results, consider experimenting with other directives. To better understand performance implications and potential speed-up, consider running additional analyses:
    • Trip Counts
    • Memory Access Patterns
Possible Reason: Other
  To Confirm: Refer to the following:
    • Why No Vectorization? column
    • Vector Issues column
  To Do: Study the Compiler Diagnostic Details and Advisor Recommendations to resolve the issues.
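The two directives named above can be illustrated with a minimal C sketch. The function names scale_and_offset and apply are hypothetical and are not taken from this guide. The sketch assumes you have already confirmed that the loop carries no real dependency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the guide. "declare simd" asks the compiler
   to also emit a vector variant of the function, so a call from a SIMD loop
   can be vectorized rather than serialized. */
#pragma omp declare simd
static float scale_and_offset(float x, float a, float b)
{
    return a * x + b;
}

/* Hypothetical loop. "restrict" states that out and in do not overlap, and
   "omp simd" tells the compiler to vectorize without proving independence.
   Apply the pragma only after confirming there is no real dependency. */
void apply(float *restrict out, const float *restrict in, int n,
           float a, float b)
{
    assert(out != NULL && in != NULL && n >= 0);
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = scale_and_offset(in[i], a, b);
}
```

The pragmas take effect only when OpenMP SIMD support is enabled at compile time, for example with -qopenmp-simd for Intel compilers or -fopenmp-simd for GCC and Clang. Otherwise the compiler ignores them and the code still runs as a scalar loop. Re-run the Survey analysis afterwards to check whether the loop is now reported as vectorized.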

If Loop/Function is Vectorized

If the target loop is vectorized, ensure that its vector efficiency is above 90%.
If efficiency is below 90%, consider the following:
Possible Reason: ISA
  To Confirm: Refer to the Vectorized Loops/Vector ISA column to check the ISA version used in the application.
  To Do: Change the target ISA by specifying the corresponding compiler flags.
Possible Reason: Inefficient peel/remainder
  To Confirm: Refer to the Vector Issues column. Search for the Inefficient Peel/Remainder issue, or check whether the time spent in the peel/remainder loops is significant.
  To Do: Resolve the issues:
    • Check the Recommendations tab.
    • Run the Trip Counts analysis.
Possible Reason: Possible inefficient memory access
  To Confirm: Refer to the Vector Issues column. Search for the Possible Inefficient Memory Access issue. Refer to the Instruction Set Analysis/Traits column. Search for the following traits:
    • extracts
    • inserts
    • gather
    • scatter
  To Do: Run the Memory Access Patterns analysis.
Possible Reason: Type conversions present
  To Confirm: Refer to the Instruction Set Analysis/Traits column. Search for the Type Conversions metric.
  To Do: Remove redundant type conversions from float to double, which might lead to a smaller vector length and reduced vectorization efficiency.
Possible Reason: Unaligned vector access in loop
  To Confirm: Refer to the Advanced/Vectorization Details column. Search for the Unaligned access in vector loop metric.
  To Do: Align data.
Possible Reason: Register pressure
  To Confirm: Refer to the Vector Issues column. Search for the Vector register spilling possible issue.
  To Do: Resolve the issue by doing one of the following:
    • Decrease the loop unroll factor.
    • Split the loop into smaller parts.
Possible Reason: Potential underutilization of FMA instructions
  To Confirm: Refer to the Vector Issues column. Search for the Potential underutilization of FMA instructions issue.
  To Do: Resolve the issue by doing one of the following:
    • Change the target ISA.
    • Explicitly enable FMA generation and vectorization.
Possible Reason: Other
  To Confirm: Refer to the Vector Issues column.
  To Do: Follow the Intel Advisor recommendations to resolve the issues.
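Two of the fixes above, removing float-to-double conversions and aligning data, can be sketched in C as follows. All function names here are hypothetical, and the 64-byte alignment is an assumption chosen to suit 512-bit vectors. Adjust it to your target ISA.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_ALIGN 64 /* assumed alignment in bytes, not taken from the guide */

/* Hypothetical allocator: returns a 64-byte aligned float buffer.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up. */
float *alloc_aligned_floats(size_t count)
{
    size_t bytes = count * sizeof(float);
    size_t rounded = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (rounded == 0)
        rounded = VEC_ALIGN;
    return aligned_alloc(VEC_ALIGN, rounded);
}

/* Before: the double factor promotes each element to double and converts
   the product back to float, which can reduce the number of lanes per vector. */
void scale_mixed(float *x, int n, double factor)
{
    for (int i = 0; i < n; ++i)
        x[i] = (float)(x[i] * factor);
}

/* After: everything stays in float, and the aligned clause tells the
   compiler that x is 64-byte aligned. The clause is a promise by the caller;
   passing a misaligned pointer is undefined behavior. */
void scale_float(float *x, int n, float factor)
{
    assert((uintptr_t)x % VEC_ALIGN == 0);
    #pragma omp simd aligned(x : 64)
    for (int i = 0; i < n; ++i)
        x[i] *= factor;
}
```

After a change like this, re-run the Survey analysis and compare the Type Conversions and Unaligned access in vector loop metrics to see whether the change had the intended effect.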
