User Guide

Examine Not-Vectorized and Under-Vectorized Loops

Accuracy Level: Low

Enabled Analyses: Survey

Result Interpretation

After running the Vectorization and Code Insights perspective with Low accuracy, you get a basic vectorization report, which shows not-vectorized and under-vectorized loops, and other performance issues.
In the Survey report:
  1. Sort by the Self-Time and/or Total-Time column to find the top time-consuming loops.
  2. Check whether your target loop or function is vector or scalar. Intel Advisor uses a distinct icon for each of the following:
    • vectorized function
    • vectorized loop
    • scalar function
    • scalar loop
  3. Use filters to hide the code that you do not want to tweak now.
  4. Decide what loops or functions to investigate:
    • If loop/function is scalar
    • If loop/function is vectorized

If Loop/Function is Scalar

If the target loop/function is scalar, you need to understand why the compiler did not vectorize it.
Several reasons are possible. Each entry below lists a possible reason, how to confirm it, and what to do.
See OpenMP* Pragmas Summary in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference for more information about the directives mentioned below.
Possible Reason: Assumed dependency
  To Confirm: Refer to the Why No Vectorization? column. Search for the Vector dependence prevents vectorization issue.
  To Do:
    • If no dependencies are found, force vectorization with the omp simd directive or provide other vectorization hints to the compiler.
    • If dependencies are confirmed, resolve them, or move to the next loop.
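As an illustration of forcing vectorization with omp simd, here is a minimal sketch. It assumes a loop that writes to a buffer that never overlaps its inputs. The names scale_add_kernel and scale_add are hypothetical and are not part of Intel Advisor or the compiler. The directive is a promise to the compiler, so apply it only after you have ruled out a real dependency. It takes effect only when OpenMP SIMD support is enabled in the compiler (for example, -qopenmp-simd for the Intel compiler, or -fopenmp-simd for GCC and Clang).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: out[i] = a * x[i] + y[i].
// Through raw pointers the compiler may assume that out overlaps x or y
// and report an assumed dependency instead of vectorizing.
void scale_add_kernel(float a, const float* x, const float* y,
                      float* out, std::size_t n) {
    // Assumption: the caller guarantees that out does not alias x or y.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}

// Convenience wrapper that calls the kernel with a separate output buffer.
std::vector<float> scale_add(float a, const std::vector<float>& x,
                             const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    scale_add_kernel(a, x.data(), y.data(), out.data(), x.size());
    return out;
}
```

After rebuilding, re-run the Survey analysis to check whether the loop is now reported as vectorized.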
Possible Reason: Function call in the loop
  To Confirm: Refer to the Why No Vectorization? column. Search for the following issues:
    • Function call present
    • Indirect function call present
    • Serialized user function call present
  To Do: For the Function call present issue, do one of the following:
    • Inline the function into the loop.
    • Vectorize the function with the omp declare simd directive.
  For the Indirect function call present or Serialized user function call present issue, refer to the guidelines in the Recommendations tab.
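The omp declare simd option can be sketched as follows. The sketch assumes a small, side-effect-free helper that is called once per iteration. The names clamp_unit and clamp_all are hypothetical. As with omp simd, the directive has an effect only when OpenMP SIMD support is enabled in the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper called from the hot loop. The directive asks the
// compiler to generate a vector variant in addition to the scalar one.
#pragma omp declare simd
float clamp_unit(float v) {
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

// Applies clamp_unit to every element of in and stores the result in out.
void clamp_all(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    const float* src = in.data();
    float* dst = out.data();
    const std::size_t n = in.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = clamp_unit(src[i]);
    }
}
```

If the helper is defined in the same translation unit, the compiler may simply inline it, which is the other option listed for this issue.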
Possible Reason: Compiler-assumed inefficient vectorization
  To Confirm: Refer to the Why No Vectorization? column. Search for the Loop vectorization possible but seems inefficient issue.
  To Do: Try forcing vectorization with the omp simd directive. If forcing vectorization doesn't provide tangible results, consider experimenting with other directives. To better understand performance implications and potential speed-up, consider running additional analyses:
    • Trip Counts
    • Memory Access Patterns
Possible Reason: Other
  To Confirm: Refer to the following columns:
    • Why No Vectorization?
    • Vector Issues
  To Do: Study the Compiler Diagnostic Details and Advisor Recommendations to resolve the issues.

If Loop/Function is Vectorized

If the target loop/function is vectorized, ensure vector efficiency is above 90%.
If efficiency is below 90%, consider the following. Each entry lists a possible reason, how to confirm it, and what to do:
Possible Reason: ISA
  To Confirm: Refer to the Vectorized Loops/Vector ISA column to check the ISA version used in the application.
  To Do: Change the target ISA by specifying the corresponding compiler flags.
Possible Reason: Inefficient peel/remainder
  To Confirm: Refer to the Vector Issues column. Search for the Inefficient Peel/Remainder issue. Alternatively, check whether the time spent in peel/remainder loops is significant.
  To Do: Resolve the issues:
    • Check the Recommendations tab.
    • Run the Trip Counts analysis.
Possible Reason: Possible inefficient memory access
  To Confirm:
    • Refer to the Vector Issues column. Search for the Possible Inefficient Memory Access issue.
    • Refer to the Instruction Set Analysis/Traits column. Search for the following traits: extracts, inserts, gather, scatter.
  To Do: Run the Memory Access Patterns analysis.
Possible Reason: Type conversions present
  To Confirm: Refer to the Instruction Set Analysis/Traits column. Search for the Type Conversions metric.
  To Do: Remove redundant type conversions from float to double that might lead to smaller vector length and reduced vectorization efficiency.
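A common source of such conversions is a double literal in float arithmetic. The sketch below contrasts the two forms. The function names halve_mixed and halve_single are hypothetical. A vector register holds half as many double elements as float elements, which is why the mixed form can reduce the effective vector length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mixed precision: 0.5 is a double literal, so each float element is
// converted to double, multiplied, and converted back to float.
void halve_mixed(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<float>(in[i] * 0.5);
    }
}

// Single precision: the 0.5f literal keeps the whole expression in float,
// so no float-to-double conversion is needed.
void halve_single(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * 0.5f;
    }
}
```

Both functions give the same results for this computation, so the conversion in the first one is redundant. In other code, check that the reduced precision is acceptable before removing a conversion.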
Possible Reason: Unaligned vector access in loop
  To Confirm: Refer to the Advanced/Vectorization Details column. Search for the Unaligned access in vector loop metric.
  To Do: Align data.
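One way to align data in C++ is the alignas specifier, sketched below for a fixed-size buffer. The 64-byte value is an assumption chosen to match 512-bit vector registers; pick the alignment that fits your target ISA. The names AlignedBuffer, is_aligned, fill_buffer, and sum are hypothetical. The aligned clause is optional and passes the alignment guarantee to the compiler when OpenMP SIMD support is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64 bytes matches the widest vector registers of interest.
constexpr std::size_t kAlignment = 64;
constexpr std::size_t kCount = 1024;

// alignas places the array at an address that is a multiple of kAlignment.
struct alignas(kAlignment) AlignedBuffer {
    float values[kCount];
};

// Returns true if the address p is a multiple of alignment bytes.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Sets every element of the buffer to value.
void fill_buffer(AlignedBuffer& buffer, float value) {
    for (std::size_t i = 0; i < kCount; ++i) {
        buffer.values[i] = value;
    }
}

// Sums the buffer; the aligned clause states that p is kAlignment-aligned.
float sum(const AlignedBuffer& buffer) {
    const float* p = buffer.values;
    assert(is_aligned(p, kAlignment));
    float total = 0.0f;
    #pragma omp simd aligned(p : kAlignment) reduction(+ : total)
    for (std::size_t i = 0; i < kCount; ++i) {
        total += p[i];
    }
    return total;
}
```

Dynamically allocated data needs an aligned allocation routine instead of alignas on the type alone; which routine to use depends on your compiler and platform.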
Possible Reason: Register pressure
  To Confirm: Refer to the Vector Issues column. Search for the Vector register spilling possible issue.
  To Do: Resolve the issue by doing one of the following:
    • Decrease the loop unroll factor.
    • Split the loop into smaller parts.
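Splitting a loop can be sketched as follows, assuming the loop body contains computations that do not depend on each other. The names update_fused and update_split are hypothetical. This toy body is far too small to cause register spilling; it only shows the shape of the transformation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fused version: one loop body computes two unrelated results, so the
// values needed by both computations are live at the same time.
void update_fused(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& sums, std::vector<float>& products) {
    assert(a.size() == b.size());
    assert(sums.size() == a.size() && products.size() == a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        sums[i] = a[i] + b[i];
        products[i] = a[i] * b[i];
    }
}

// Split version: each loop handles one computation, so each loop body
// keeps fewer values live at once.
void update_split(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& sums, std::vector<float>& products) {
    assert(a.size() == b.size());
    assert(sums.size() == a.size() && products.size() == a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        sums[i] = a[i] + b[i];
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        products[i] = a[i] * b[i];
    }
}
```

Splitting reads the inputs twice, so compare the timing of both versions in the Survey report before keeping the change.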
Possible Reason: Potential underutilization of FMA instructions
  To Confirm: Refer to the Vector Issues column. Search for the Potential underutilization of FMA instructions issue.
  To Do: Resolve the issue by doing one of the following:
    • Change the target ISA.
    • Explicitly enable FMA generation and vectorization.
Possible Reason: Other
  To Confirm: Refer to the Vector Issues column.
  To Do: Follow the Intel Advisor recommendations to resolve the issues.
