Vectorization Advisor FAQ

General questions

What is Vectorization Advisor?

Vectorization Advisor is one of the two major features of the Intel® Advisor XE 2016 product. Intel® Advisor XE comprises Vectorization Advisor and Threading Advisor.

Vectorization Advisor is an analysis tool that helps you identify whether loops use modern SIMD instructions, what prevents vectorization, how efficiently vectorized loops perform, and how to improve them. It presents compiler optimization reports in a user-friendly way and extends them with additional metrics, such as loop trip counts, CPU time, memory access patterns, and recommendations for optimization.

Where can I download Vectorization Advisor?
Intel® Advisor XE 2016, which includes Vectorization Advisor, is available as part of the Intel® Parallel Studio XE 2016 suite. Visit the product web site for more information, including evaluation copies and purchasing options.

What is the difference between “Threading Advisor” and “Vectorization Advisor”?

Intel® Advisor XE 2015 and earlier included only the Threading Advisor workflow. Read more on the product website.

Starting from Intel® Advisor XE 2016, the product includes two major workflows or feature sets:

  • Vectorization Advisor is a vectorization analysis tool that lets you identify loops that will benefit most from vectorization, identify what is blocking effective vectorization, explore the benefit of alternative data reorganizations, and increase the confidence that vectorization is safe.
  • Threading Advisor is a threading design and prototyping tool that lets you analyze, design, tune, and check threading design options without disrupting your normal development.

What compilers and programming languages are supported?

Vectorization Advisor supports the C, C++, and Fortran programming languages.

Vectorization Advisor requires Intel Compiler 15.0 or later to collect the full set of analysis data. However, a subset of metrics is available for binaries built with the GCC* or Microsoft* compilers.

How do I get support and provide feedback?

Visit our product support page.

Vectorization analysis workflow

Where do I start?
Check the prerequisites and build settings in Getting Started with Intel® Advisor XE 2016. Then create a project: specify the executable to analyze and its command-line parameters.
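
As a minimal sketch (assuming Linux and the Intel C++ compiler; the source and binary names are placeholders, and the Getting Started document lists the complete recommended options), build with debug information and optimization enabled so Advisor can map results back to your source:

    icpc -g -O2 -o my_app main.cpp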

Start by running the Survey analysis; it gives you the main statistics about vectorized and scalar loops.

First things to look at in the Survey Report:

  1. Self and Total CPU time – focus on the most time-consuming loops. Use the Top Down tab (in the blue area in the middle of the Advisor window) to explore the call tree.
  2. Find hot scalar loops in the Loop Type column. The “Why No Vectorization” column and the loop summary explain what prevented the compiler from generating SIMD code.
  3. For vectorized loops (marked with a vectorization icon), expand the “Vectorized Loops” and other column groups in the grid. Check the efficiency metrics, instruction set, vector length, and Traits.
  4. Click a “lamp” icon with a digit; it opens the Recommendations tab at the bottom, which may contain optimization hints.

Can I use Vectorization Advisor from a command line?

Yes. Use the “advixe-cl --help” command to learn the syntax and see some examples. Please be aware that the Intel Advisor XE 2016 documentation for command-line syntax may not be up to date, and not all CLI options may be covered. We’re working on addressing this gap.

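For example, a minimal command-line session (a sketch; “./my_proj” and “./my_app” are placeholder names) collects a Survey result and then prints it as a text report:

    advixe-cl -collect survey --project-dir ./my_proj -- ./my_app
    advixe-cl -report survey --project-dir ./my_proj
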
Hint: use the “Command Line” link on the Workflow panel to generate the command line for the selected analysis type and project settings.

Does Vectorization Advisor help in improving already vectorized codes?

Yes, Vectorization Advisor has multiple features for detecting inefficient use of SIMD instructions. Some typical examples (a short code sketch follows the list):

  • The Efficiency metric is significantly lower than the ideal value
  • Use of an instruction set older than the hardware supports (e.g. SSE2 on a machine supporting AVX)
  • Detection of vectorization traits, e.g. use of gather and scatter instructions
  • Non-uniform and unaligned data accesses (use the Memory Access Patterns analysis)
  • Partial loop vectorization, where a scalar peel or remainder loop takes noticeable CPU time
  • Other bottlenecks described in the Recommendations tab
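
For instance, a loop like the one below (a hypothetical C++ sketch, not taken from the product documentation) is typically vectorized, but the indirect read a[idx[i]] turns vector loads into gathers, which Advisor reports in the Traits column and which usually lowers efficiency:

    // Indirect (gathered) reads: the loop can be vectorized, but each vector
    // load of a[idx[i]] becomes a gather instruction or is emulated element by element.
    void gather_scale(float *b, const float *a, const int *idx, float s, int n) {
        for (int i = 0; i < n; ++i)
            b[i] = a[idx[i]] * s;
    }
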
Can I run Vectorization Advisor with an MPI application?

Yes. Use the command-line syntax for analyzing MPI applications; see the documentation for details and examples. Below is an example with mpirun and the “-gtool” option. This command launches the “./your_app” application on 4 ranks, and only ranks 2 and 3 are analyzed by Intel Advisor:

mpirun -n 4 -gtool "advixe-cl -collect survey:2,3" ./your_app

How do I explore results on a cluster node without a GUI?

You can perform an MPI analysis only through the Intel Advisor command line interface; however, there are several ways to view an Intel Advisor result:

  • If you have an Intel Advisor GUI in your cluster environment, open a result in the GUI. For example, a login node may have an X server configured, and you can use a shared directory for storing the Intel Advisor project.
  • If you do not have an Intel Advisor GUI on your cluster node, copy the result directory to another machine with the Intel Advisor GUI and open the result there. You can use a Windows machine to browse results collected on Linux. In this case, you might need to configure search directories in project properties to locate source files.
  • Use the Intel Advisor command-line reports to browse results on a cluster node, e.g. the default Survey report:

    advixe-cl -report survey --project-dir ./my_proj

What data will I get with an application built with GCC* or Microsoft* compilers?

Vectorization Advisor requires the Intel Compiler to collect the full set of analysis data. However, a subset of metrics is available for binaries built with the GCC or Microsoft compilers:

  • CPU time and call tree (Top Down tab)
  • Vector Instruction Set, Vector length, Data types
  • Loop trip counts
  • Dependencies analysis (loop dependencies)
  • Memory Access Patterns analysis

Do I need source code annotations?

No. Vectorization Advisor does not require source code modification. You can select loops for analysis using the checkboxes on the Survey tab.

Source code annotations are needed for Threading Advisor only.

How do I specify which loops to analyze with the Memory Access Patterns or Dependencies features? How do I do it in the command line and in the GUI?

In the GUI, you can select loops for analysis using the checkboxes on the Survey tab.

In the command line, print the Survey report and note the “ID” column before each loop:

advixe-cl -report survey --project-dir ./my_proj
ID Function Call Sites and Loops Self Time Total Time
 69 -[loop at test.cpp:190 ...] 1.06054 1.06054
 83 -[loop at test.cpp:89 ...] 0.841134 0.841134
 51 -[loop at test.cpp:113 ...] 0.799016 0.799016

Then use the “-mark-up-list” option to specify loop IDs for the Dependencies or Memory Access Patterns analysis:

advixe-cl -collect map -mark-up-list=83,51 -project-dir ./my_proj -- my_application
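
The Dependencies analysis is selected the same way (a sketch assuming the same project and loop IDs; “dependencies” is the collection type name):

advixe-cl -collect dependencies -mark-up-list=83,51 -project-dir ./my_proj -- my_application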

Tip: open the result in the GUI, select loops using the checkboxes, and press the “Get Command Line” button. It generates the command line for the Dependencies or Memory Access Patterns analysis automatically.

How can I decrease analysis time?

The Survey analysis in Vectorization Advisor is the least intrusive and should not slow down the application significantly. However, analyses like “Dependencies” and “Memory Access Patterns” have significant overhead. You can mitigate the application slowdown in several ways:

  1. Decrease the workload. How to do this depends on your application: provide smaller data to process or decrease the complexity of the computations.
  2. Use separate settings for Survey and other analysis types. By default, it’s enough to configure the Survey settings only, but if you can control the workload via command-line parameters, you can keep separate command-line settings for different analysis types (see the command-line sketch after this list).
  3. Decrease the number of loops selected for the Dependencies or Memory Access Patterns analysis.
  4. Watch the Refinement Reports tab while the analysis runs. Data is shown as soon as it appears; you don’t have to wait until the application finishes. Press the “Stop” button early when you see that the analysis for all loops of interest has already finished (in the Memory Access Patterns or Dependencies view).
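
For example (a sketch; the workload arguments “large.dat” and “small.dat” are hypothetical), you could run Survey on the full workload and the more expensive analyses on a reduced one:

    advixe-cl -collect survey --project-dir ./my_proj -- ./my_app large.dat
    advixe-cl -collect map -mark-up-list=83 --project-dir ./my_proj -- ./my_app small.dat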

Understanding Vectorization Advisor results

What kind of data does Vectorization Advisor provide? How does it collect information?

Key Vectorization Advisor features include:

  • Correlation of CPU time and vectorization metrics with compiler optimization and vectorization reports
  • Ability to explore relevant loop data all in one place, including CPU time, whether a loop is vectorized, compiler diagnostics about vectorization constraints, instruction set, source code, and assembly code
  • Dependencies analysis that checks for loop-carried dependencies, so you can decide if it is safe to force vectorization with pragmas
  • Memory Access Patterns analysis that identifies non-unit stride array element accesses. Non-unit stride memory accesses can prevent automatic vectorization or hurt performance.
  • Loop trip counts and call counts
  • Recommendations based on the static and dynamic analysis data.

Tip: The Workflow panel helps you navigate through the steps and analysis types.

What data do I get from Survey analysis?

Most statistics are gathered by the Survey analysis. It combines dynamic analysis (CPU sampling), static binary analysis (instruction set, data types, etc.), and compiler diagnostics. All analysis types include binary instrumentation and dynamic analysis; this means that Intel Advisor has to execute the application, even if collecting some of the data doesn’t require actually running it.

The Survey Report provides a wealth of information, including the following:

  • Vectorized loop parts, such as Body, Peeled, and Remainder, which are automatically grouped as a hierarchical row in the top table.
  • Why No Vectorization? – Why a loop was not vectorized
  • Vectorized loops columns:
    • Vector Instruction Set – For example, SSE, SSE2, and AVX
    • Efficiency – available with Intel Compiler 16.0 and later
    • Gain – Advisor’s estimate of the relative loop speed-up achieved due to vectorization
    • Vector length – the number of data elements of the given type processed by one vector instruction (one SIMD register’s worth)
  • Instruction set analysis (compiler-independent SIMD statistics):
    • Traits – Important loop characteristics that can hurt performance. For example, Divisions, Shuffles, and Masked Stores
    • Data Types
  • Optimization info columns:
    • Transformations – How the compiler modified the loop, if it did (for example, loop unrolling)
    • Unroll factor
    • Estimated Achieved Gain – theoretical estimate of the achievable or achieved vectorization gain, provided directly by the compiler
    • Vector width, Vectorization Details and Optimization Details.
  • Tabs on the bottom of Survey report:
    • Top Down – call tree of loops and functions
    • Source and Assembly views with embedded optimization and vectorization info
    • Recommendations – description of typical problems and tips to optimize
    • Compiler Diagnostic Details – detailed description of compiler diagnostics with examples

What data do I get from Trip Counts analysis?

The Trip Counts analysis counts the minimum, maximum, and median trip counts (i.e. the number of times a loop body was executed) and the call counts (the number of times a loop is invoked) for all loops in the application. Because the results are attached to loops found by Survey, run Survey first and then the Trip Counts analysis. NOTE! Do not re-build your binary between running Survey and Trip Counts; it can produce wrong results. Trip Counts results are added to the existing Survey report in a new column group.
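
In the command line (a sketch, assuming a Survey result already exists in ./my_proj), the corresponding collection looks like this:

    advixe-cl -collect tripcounts --project-dir ./my_proj -- ./my_app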

What data do I get from Dependencies analysis?

The Dependencies analysis checks for cross-iteration (“loop-carried”) dependencies. The most common case for using it is when you see an “assumed dependence prevents vectorization” message in the “Why No Vectorization” column. If the Dependencies analysis reports no dependencies, you are safe to force vectorization. If dependencies are detected, you get detailed information about where they are.

NOTE! Dependencies analysis is only applicable to scalar (not vectorized) loops.
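
As an illustration (a hypothetical C++ sketch, not taken from the product documentation): if the compiler assumes a dependence for a loop like the one below because it cannot prove that the pointers do not overlap, and the Dependencies analysis confirms there are no real loop-carried dependencies, you can force vectorization with a pragma such as #pragma ivdep (Intel compiler) or #pragma omp simd:

    // The compiler cannot prove that 'out' and 'in' do not alias, so it may
    // assume a dependence and keep the loop scalar.
    void scaled_add(float *out, const float *in, float a, int n) {
    #pragma ivdep   // only safe once Dependencies analysis reports no dependencies
        for (int i = 0; i < n; ++i)
            out[i] = out[i] + a * in[i];
    }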

What data do I get from Memory Access Patterns analysis?

The Memory Access Patterns (MAP) analysis traces memory access instructions and detects patterns: unit stride, non-unit “constant” stride, and non-unit variable stride (gather/scatter patterns). Operand sizes and non-aligned data accesses are also reported.
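
As a small C++ sketch (hypothetical code, not from the product), the three pattern categories look like this:

    // Assumes b holds at least 4*n elements for the strided loop.
    void stride_examples(float *a, const float *b, const int *idx, int n) {
        for (int i = 0; i < n; ++i)
            a[i] = b[i];        // unit stride: consecutive elements
        for (int i = 0; i < n; ++i)
            a[i] = b[4 * i];    // constant non-unit stride (stride 4)
        for (int i = 0; i < n; ++i)
            a[i] = b[idx[i]];   // variable stride (gather-like indirect access)
    }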

Example results of MAP analysis in source view:

Run Memory Access Patterns (MAP) analysis in the following cases:

  • You have addressed other vectorization problems, but the performance of the vectorized loop is still not satisfactory, and the “Traits” column indicates the presence of Shuffles, Inserts, or Gathers.
  • You want to find and eliminate non-unit-stride memory accesses by refactoring the code, either to improve vectorization or to improve memory and cache usage.

How do I save results?

By default, Intel Advisor stores only the most recent result. That means if you run Survey (or any other analysis) twice, you will see only the last run, with no option to get back to the initial experiment.

You can manually save Intel Advisor experiments using the “Snapshot” button in the Result window or on the product toolbar.

This saves all analysis results (Survey, Trip Counts, Dependencies, and MAP) in a read-only experiment folder. You can browse it at any time; further experiments will not overwrite it. You can access the historical snapshots using the Project Navigator.

How are Survey, Trip Counts and Dependencies results correlated?

Intel Advisor has a layered structure of result versions. There are four analysis types: Survey, Trip Counts, Dependencies, and Memory Access Patterns. All the results are contained in an “experiment” folder, usually called “e000”. The experiment holds the most recent version of each result type. By default, only the latest experiment version is stored; however, you can create “snapshots” – historical copies of the current experiment for future analysis and comparison.

The basic analysis type is Survey. All other analysis types depend on the Survey result, but not on each other:

e000:
Survey <- Trip Counts
Survey <- Dependencies
Survey <- MAP

Results of different analysis types are matched by addresses in the target application binary. That means that when you select loops in Survey for further Dependencies analysis, they are identified by their addresses in the binary. Changing the binary (re-building it) between running Survey and Dependencies breaks this connection, and the results will be wrong. The same applies to the MAP and Trip Counts analyses. So if the binary has changed, run Survey again before running the other analysis types.

For example, you might run Survey five times but run Dependencies only once (say, for Survey result #2). In this case, the most recent Survey will not match the Dependencies report; they may apply to different binary versions. If it is important to keep them matched, make a snapshot before updating the binary and running further analyses.

For more complete information about compiler optimizations, see our Optimization Notice.