Vectorization Advisor

Intel® Advisor offers a vectorization optimization tool and a threading design and prototyping tool to help ensure your Fortran, C and C++ applications realize full performance potential on modern processors, such as Intel® Xeon® and Intel® Xeon Phi™ processors. This topic discusses getting started with the Vectorization Advisor GUI.


Prerequisites

Build Application

To build applications that produce the most accurate and complete Vectorization Advisor analysis results, build an optimized binary of your application in release mode using these settings:

To Do This

Optimal C/C++ Settings

Request full debug information (compiler and linker).

Linux* OS command line: -g

Windows* OS command line:

  • /Zi

  • /DEBUG

Microsoft Visual Studio* IDE:

  • C/C++ > General > Debug Information Format > Program Database (/Zi)

  • Linker > Debugging > Generate Debug Info > Yes (/DEBUG)

Request moderate optimization.

Linux* OS command line: -O2 or higher

Windows* OS command line: /O2 or higher

Visual Studio* IDE: C/C++ > Optimization > Optimization > Maximize Speed (/O2) or higher

Produce compiler diagnostics (necessary for version 15.0 of the Intel compiler; unnecessary for version 16.0 and higher).

Linux* OS command line: -qopt-report=5

Windows* OS command line: /Qopt-report:5

Visual Studio* IDE: C/C++ > Diagnostics [Intel C++] > Optimization Diagnostic Level > Level 5 (/Qopt-report:5)

Enable vectorization

Linux* OS command line: -vec

Windows* OS command line: /Qvec

Enable SIMD directives

Linux* OS command line: -simd

Windows* OS command line: /Qsimd

Enable generation of multi-threaded code based on OpenMP* directives.

Linux* OS command line: -qopenmp

Windows* OS command line: /Qopenmp

Visual Studio* IDE: C/C++ > Language [Intel C++] > OpenMP Support > Generate Parallel Code (/Qopenmp)
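Before you analyze a real application, it can help to confirm your settings on a small, known target. The kernel below is a minimal sketch: the function name scale_add and the file name kernel.c are hypothetical. One possible Linux* OS build command, assembled from the settings above and assuming the Intel C++ compiler (icc) is installed, is: icc -g -O2 -qopt-report=5 -qopenmp -c kernel.c

```c
#include <stddef.h>

/* Hypothetical test kernel: y[i] = a * x[i] + y[i].
 * With OpenMP enabled, the pragma splits the iterations across threads
 * and asks the compiler to vectorize each thread's chunk. Without OpenMP
 * the compiler ignores the pragma and the loop still computes the same
 * values. restrict tells the compiler x and y never overlap. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

For example, calling scale_add(5, 2.0f, x, y) with x = {1, 2, 3, 4, 5} and y set to all ones leaves y = {3, 5, 7, 9, 11}.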

To Do This

Optimal Fortran Settings

Request full debug information (compiler and linker).

Linux* OS command line: -g

Windows* OS command line:

  • /debug=full

  • /DEBUG

Visual Studio* IDE:

  • Fortran > General > Debug Information Format > Full (/debug=full)

  • Linker > Debugging > Generate Debug Info > Yes (/DEBUG)

Request moderate optimization.

Linux* OS command line: -O2 or higher

Windows* OS command line: /O2 or higher

Visual Studio* IDE: Fortran > Optimization > Optimization > Maximize Speed or higher

Produce compiler diagnostics (necessary for version 15.0 of the Intel compiler; unnecessary for version 16.0 and higher).

Linux* OS command line: -qopt-report=5

Windows* OS command line: /Qopt-report:5

Visual Studio* IDE: Fortran > Diagnostics > Optimization Diagnostic Level > Level 5 (/Qopt-report:5)

Enable vectorization

Linux* OS command line: -vec

Windows* OS command line: /Qvec

Enable SIMD directives

Linux* OS command line: -simd

Windows* OS command line: /Qsimd

Enable generation of multi-threaded code based on OpenMP* directives.

Linux* OS command line: -qopenmp

Windows* OS command line: /Qopenmp

Visual Studio* IDE: Fortran > Language > Process OpenMP Directives > Generate Parallel Code (/Qopenmp)

Set Up Environment

Do one of the following to set up your Linux* OS environment.

  • Run one of the following source commands:

    • For csh/tcsh users: source <advisor-install-dir>/advixe-vars.csh

    • For bash users: source <advisor-install-dir>/advixe-vars.sh

    By default, <advisor-install-dir> is under:

    • /opt/intel/ for root users

    • $HOME/intel/ for non-root users

  • Add <advisor-install-dir>/bin32 or <advisor-install-dir>/bin64 to your path.

  • Run the <parallel-studio-install-dir>/psxevars.csh or <parallel-studio-install-dir>/psxevars.sh command. By default, <parallel-studio-install-dir> is under:

    • /opt/intel/ for root users

    • $HOME/intel/ for non-root users

Note

  • Setting up the Windows* OS environment is necessary only if you use the advixe-gui command to launch the Intel Advisor standalone GUI (or the advixe-cl command to run the command line interface). To set up your Intel Advisor environment, run the <advisor-install-dir>\advixe-vars.bat or the <parallel-studio-install-dir>\psxevars.bat command.

  • The default installation path for both <advisor-install-dir> and <parallel-studio-install-dir> is under C:\Program Files (x86)\IntelSWTools\ (on some systems, the directory name is Program Files instead of Program Files (x86)).

In addition:

  • Verify your application runs before trying to analyze it with the Intel Advisor.

  • Make sure you run the Intel Advisor in the same Linux* OS environment as your application.

Workflow Quick Start

Follow these steps (white blocks are optional) to get started using the Vectorization Advisor in the Intel Advisor.
Figure: Vectorization Advisor workflow

  • Survey Report - Offers integrated compiler report data and performance data all in one place. Use the Survey Report to help identify:

    • Where vectorization will pay off the most

    • If vectorized loops are providing benefit, and if not, why not

    • Un-vectorized loops and why they are not vectorized

    • Performance problems

    The Survey Report also provides code-specific recommendations for how to fix vectorization issues, and quick visibility into source code and assembly code.

  • Trip Counts and FLOPs analysis - Dynamically identifies the number of times loops are invoked (sometimes called the call count or loop count) and the number of times they execute (the iteration count), and provides data about floating-point operations, memory traffic, and AVX-512 mask usage. Use this added information in the Survey Report to make better decisions about your vectorization strategy for particular loops and to optimize loops that are already parallel.

  • Roofline chart - Helps visualize actual performance against hardware-imposed performance ceilings, as well as determine the main limiting factor (memory bandwidth or compute capacity), thereby providing an ideal roadmap of potential optimization steps.

    Use the Roofline chart to answer the following questions:

    • What is the maximum achievable performance with your current hardware resources?

    • Does your application work optimally on current hardware resources?

    • If not, what are the best candidates for optimization?

    • Is memory bandwidth or compute capacity limiting performance for each optimization candidate?

  • Dependencies Report - For safety purposes, the compiler is often conservative when assuming data dependencies. Use a Dependencies-focused Refinement Report to check for real data dependencies in loops the compiler did not vectorize because of assumed dependencies. If real dependencies are detected, the analysis can provide additional details to help resolve the dependencies. Your objective: Identify and better characterize real data dependencies that could make forced vectorization unsafe.

  • Memory Access Patterns (MAP) Report - Use a MAP-focused Refinement Report to check for various memory issues, such as non-contiguous memory accesses and unit stride vs. non-unit stride accesses. Your objective: Eliminate issues that could lead to significant vector code execution slowdown or block automatic vectorization by the compiler.
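The minimal sketch below shows why the compiler is conservative about assumed dependencies. The function add_one is hypothetical. Nothing in its signature tells the compiler whether out and in overlap, so the compiler must either insert a run-time overlap check or keep the loop scalar. When the arrays do overlap, the dependency is real. For example, with buf = {0, 0, 0, 0}, the call add_one(buf + 1, buf, 3) must produce {0, 1, 2, 3}, because each iteration reads the value the previous iteration wrote. A Dependencies analysis is how you would confirm whether such overlap actually occurs in your workload.

```c
#include <stddef.h>

/* Hypothetical example: out[i] = in[i] + 1.
 * Without restrict, out and in may point into the same array. If
 * out == in + 1, iteration i reads the element written by iteration
 * i - 1 (a real loop-carried dependency), so blindly vectorizing the
 * loop would read stale values. */
void add_one(float *out, const float *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] + 1.0f;
    }
}
```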

Follow the Workflow

Launch the Intel Advisor

For the Linux* standalone GUI: Run the advixe-gui command.

For the Windows* standalone GUI: Do one of the following:

  • Run the advixe-gui command.

  • From the Microsoft Windows* 7 Start menu, select Intel Parallel Studio XE 201n > Analyzers > Advisor.

  • From the Microsoft Windows* 8/8.1/10 All Apps screen, select Intel Parallel Studio XE 201n > Intel Advisor 201n.

For the Visual Studio* IDE: Open your solution in the Visual Studio* IDE.

Manage Project

For the standalone GUI:

  1. Choose File > New > Project… (or click New Project… in the Welcome page) to open the Create a Project dialog box.

  2. Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.

  3. On the left side of the Analysis Target tab, ensure the Survey Hotspots Analysis type is selected.

  4. Set the appropriate parameters. (Setting the binary/symbol search and source search directories is optional for the Vectorization Advisor.)

  5. Click the OK button to close the Project Properties dialog box.

For the Visual Studio* IDE:

  1. Choose Project > Intel Advisor version Project Properties… to open the Project Properties dialog box.

  2. On the left side of the Analysis Target tab, ensure the Survey Hotspots Analysis type is selected.

  3. Set the appropriate parameters. (Setting the binary/symbol search and source search directories is optional for the Vectorization Advisor.)

  4. Click the OK button to close the Project Properties dialog box.

Tip

  • If you plan to run other vectorization analysis types, set parameters for them now, if possible.

  • If possible, use the Inherit settings from Survey Hotspots Analysis Type checkbox for other analysis types.

  • The Survey Trip Counts Analysis type has similar parameters to the Survey Hotspots Analysis type.

  • The Dependencies Analysis and Memory Access Patterns Analysis types consume more resources than the Survey Hotspots Analysis type. If these Refinement analyses take too long, consider decreasing the workload.

  • Select Track stack variables in the Dependencies Analysis type to detect all possible dependencies.

  • When necessary, click the tab at the top of the Workflow pane to switch between the Vectorization Workflow and Threading Workflow.

Run Survey Analysis

Under Survey Target in the Vectorization Workflow, click the Run analysis control to collect Survey data while your application executes.

Note

If the Workflow is not displayed in the Visual Studio* IDE: Click the Intel Advisor icon on the Intel Advisor toolbar.

After the Intel Advisor collects the data, it displays a Survey Report similar to the following:


Figure: Intel Advisor Survey Report

There are many controls available to help you focus on the data most important to you, including the following:

1

Click the button to save a read-only result snapshot you can view any time.

Intel Advisor stores only the most recent analysis result. Visually comparing one or more snapshots to each other or to the most recent analysis result can be an effective way to judge performance improvement progress.

To open a snapshot, choose File > Open > Result...

2

Click the button to restore default filters.

3

Click the various Filter buttons and drop-down lists to temporarily limit displayed data based on your criteria.

4

Click the button to view loops in non-executed code paths for various instruction set architectures (ISA). Prerequisites:

  • Compile the target application for multiple code paths using the Intel compiler.

  • Enable the Analyze loops in not executed code path checkbox in Project Properties > Analysis Target > Survey Hotspots Analysis.

5

Click the toggle to simplify data representation and automatically select suitable and/or high-impact loops from a SIMD vector performance perspective.

Smart mode uses loop call tree nesting (Loop Height column), fraction of Total CPU Time (which you can adjust using the Loops Above control), and other criteria to automatically filter and sort loops of interest.

6

Click the button to search for specific data.

7

Click the tab to open various Intel Advisor reports or views.

8

Click the toggle to show/hide sets of columns.

9

Click the control to show/hide a chart that helps you visualize actual performance against hardware-imposed performance ceilings, as well as determine the main limiting factor (memory bandwidth or compute capacity), thereby providing an ideal roadmap of potential optimization steps.

10

Click a data row at the top of the Survey Report to display more data specific to that row at the bottom of the Survey Report. Double-click a loop data row to display a Survey Source window.

11

Click a checkbox to mark a loop for deeper analysis.

12

If present, click the image to display code-specific how-can-I-fix-this-issue? information in the Recommendations pane.

13

If present, click the image to view the reason automatic vectorization failed in the Why No Vectorization? pane.

14

Click the control to show/hide the Workflow pane.

Run Trip Counts and/or FLOPs Analysis

This step is optional.

Before running a Trip Counts and/or FLOPs analysis, make sure you set the appropriate Project Properties for the Survey Trip Counts Analysis type. (Use the same application, but a smaller input data set if possible.)

Under Find Trip Counts and FLOPS in the Vectorization Workflow, select the Trip Counts and/or FLOPS checkbox, then click the Run analysis control to collect Trip Counts and/or FLOPS data while your application executes.

After the Intel Advisor collects the data, it adds a Trip Counts and/or FLOPS column set to the Survey Report. Expand the column set(s) to see all available data.

Tip

Use Trip Counts data to:

  • Detect loops with too-small trip counts and trip counts that are not a multiple of vector length.

  • Analyze parallelism granularity more deeply.
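As a rough illustration of why trip counts matter, the sketch below splits a trip count into iterations covered by full vectors and leftover iterations for a given vector length. It is a simplification that assumes the compiler generates one main vector loop plus a remainder loop, and it ignores any peel loop added for alignment. The names split_trip_count and trip_split are hypothetical. For example, a trip count of 100 with a vector length of 8 gives 12 full vector iterations and 4 remainder iterations, while a trip count of 5 never fills a single vector.

```c
#include <stddef.h>

/* Hypothetical helper type: how a loop's trip count divides between
 * the main vector loop and the remainder (scalar or masked) loop. */
typedef struct {
    size_t full_vectors;     /* vector iterations of the main loop body   */
    size_t remainder;        /* leftover iterations after the last vector */
    double vector_fraction;  /* share of iterations done in full vectors  */
} trip_split;

/* vector_length is the number of elements one vector instruction
 * processes, for example 8 floats in a 256-bit register. */
trip_split split_trip_count(size_t trip_count, size_t vector_length)
{
    trip_split s;
    if (vector_length == 0 || trip_count == 0) {
        /* Degenerate input: nothing runs in full vectors. */
        s.full_vectors = 0;
        s.remainder = trip_count;
        s.vector_fraction = 0.0;
        return s;
    }
    s.full_vectors = trip_count / vector_length;
    s.remainder = trip_count % vector_length;
    s.vector_fraction =
        (double)(s.full_vectors * vector_length) / (double)trip_count;
    return s;
}
```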

Run Roofline Analysis

This step is optional.

Before running a Roofline analysis, make sure you set the appropriate Project Properties for the Survey Hotspots Analysis and Survey Trip Counts Analysis types.

Under Roofline in the Vectorization Workflow, click the Run analysis control to execute your target application twice to:

  • Measure the hardware limitations of your machine and collect loop/function timings using the Survey analysis.

  • Collect FLOPs data using the Trip Counts and FLOPS analysis - this collection can take three to four times longer than the Survey analysis.

Upon completion, the Intel Advisor displays a Roofline chart on the left side of the Survey Report.

The Roofline chart plots an application's achieved floating-point performance and arithmetic intensity against the machine's maximum achievable performance:

  • Arithmetic intensity (x axis) - measured in floating-point operations (FLOPs) per byte transferred between the CPU/VPU and memory, based on the loop/function algorithm

  • Floating-point performance (y axis) - measured in billions of floating-point operations per second (GFLOPS)
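Both axes can be estimated by hand for a simple loop. This sketch assumes a hypothetical double-precision loop a[i] = b[i] + s * c[i] and counts 2 FLOPs (one multiply and one add) and 24 bytes (two 8-byte loads and one 8-byte store) per iteration, giving an arithmetic intensity of 2 / 24, about 0.083 FLOP/byte. This simple byte count ignores cache effects and write-allocate traffic, so the values Intel Advisor measures may differ. The helper names are illustrative only.

```c
/* Arithmetic intensity (x axis): FLOPs performed per byte transferred.
 * Returns 0.0 when no bytes were transferred, to avoid dividing by zero. */
double arithmetic_intensity(double flops, double bytes)
{
    return bytes > 0.0 ? flops / bytes : 0.0;
}

/* Floating-point performance (y axis): billions of FLOPs per second,
 * from a total FLOP count and an elapsed time in seconds. */
double gflops(double flops, double seconds)
{
    return seconds > 0.0 ? flops / seconds / 1e9 : 0.0;
}
```

For n iterations of the example loop, you would pass 2.0 * n FLOPs and 24.0 * n bytes; the n cancels, so the intensity does not depend on problem size in this simple model.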

In general:

  • The size and color of each Roofline chart dot represent relative execution time for each loop/function. Large red dots take the most time, so are the best candidates for optimization. Small green dots take less time, so may not be worth optimizing.

  • Roofline chart diagonal lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The L1 Bandwidth roofline represents the maximum amount of work that can get done at a given arithmetic intensity if the loop always hits L1 cache. A loop does not benefit from L1 cache speed if a dataset causes it to miss L1 cache too often, and instead is subject to the limitations of the lower-speed L2 cache it is hitting. So the dot representing the loop is positioned somewhere below the L2 Bandwidth roofline.

  • Roofline chart horizontal lines indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The Scalar Add Peak represents the peak number of add instructions that can be performed by the scalar loop under these circumstances. The Vector Add Peak represents the peak number of add instructions that can be performed by the vectorized loop under these circumstances. If a loop is not vectorized, the dot representing the loop is positioned somewhere below the Scalar Add Peak roofline.

  • A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.

  • The greater the distance between a dot and the highest achievable roofline, the more opportunity exists for performance improvement.
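The rules above follow the standard roofline model: a loop with arithmetic intensity AI cannot exceed the lower of AI * (memory bandwidth), the diagonal roof, and the compute peak, the horizontal roof. The sketch below applies that formula to one memory roof and one compute roof. The function names and the example machine numbers (100 GB/s bandwidth, 1000 GFLOPS peak) are hypothetical; Intel Advisor measures the actual roofs on your machine. With these numbers, a loop at 1/12 FLOP/byte is capped near 8.33 GFLOPS, so if it achieves 2 GFLOPS it has roughly 4x headroom.

```c
/* Highest performance (GFLOPS) a loop with arithmetic intensity ai
 * (FLOP/byte) can reach under one memory roof (bandwidth_gbs, in GB/s)
 * and one compute roof (peak_gflops). GB/s * FLOP/byte = GFLOPS. */
double roofline_ceiling(double ai, double bandwidth_gbs, double peak_gflops)
{
    double memory_bound = ai * bandwidth_gbs;  /* diagonal roof   */
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Distance between a dot and its roof, expressed as the factor by which
 * the loop could speed up before hitting the ceiling. Larger values mean
 * more opportunity for improvement. */
double headroom(double achieved_gflops, double ceiling_gflops)
{
    return achieved_gflops > 0.0 ? ceiling_gflops / achieved_gflops : 0.0;
}
```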

In the following Roofline chart representation, loops A and G (large red dots), and to a lesser extent B (yellow dot far below the roofs), are the best candidates for optimization. Loops C, D, and E (small green dots) and H (yellow dot) are poor candidates because they do not have much room to improve.
Figure: A visual model (not an actual screenshot) of the Roofline chart

There are several controls to help you show/hide the Roofline chart:

Figure: Intel Advisor Roofline chart and Survey Report

1

Click to toggle between Roofline chart view and Survey Report view.

2

Click to toggle to and from side-by-side Roofline chart and Survey Report view.

3

Drag to adjust the dimensions of the Roofline chart and Survey Report.

There are several controls to help you focus on the data most important to you, including the following:
Figure: Intel Advisor Roofline controls

1

  • Select multiple loops by tracing a rectangle with your mouse. To select a single loop, simply click the dot representing the loop.

  • Zoom in and out by tracing a rectangle with your mouse. You can also zoom in and out using your mouse wheel.

  • Move the chart left, right, up, and down.

  • Undo or redo the previous zoom action.

  • Reset to the default zoom level.

  • Copy the chart to the clipboard or save it to a file - use the arrow to toggle between the two options.

2

You can use Intel Advisor filters to control the loops displayed in the Roofline chart; however, the Roofline chart does not support the Threads filter. Use this checkbox to build roofs for single-threaded applications (or for multi-threaded applications configured to run single threaded, such as one thread-per-rank for MPI applications).

3

By default, the Roofline chart displays data only for loops and functions in the Loop Information pane of the Survey Report. Use this checkbox to also display data for whole program stacks and different code paths leading to different representations of the same loops.

4

Load another result for performance comparison purposes.

5

  • Change the visibility and appearance of roofline representations (lines).

  • Change the appearance of loop weight representations (dots).

6

Zoom in and out using numerical values.

7

Display the number and percentage of loops in each loop weight representation category.

Investigate Loops

The Survey Report, with or without a Roofline chart, provides a wealth of information. See the Intel Advisor Help for investigation tips.

After you investigate the data, you have several choices:

If Your Investigation Shows This

Do This

All loops are vectorizing properly and performance is satisfactory.

You are done! Congratulations!

One or more loops are not vectorizing properly and performance is unsatisfactory.

  1. Improve application performance using various Intel Advisor features to guide your efforts.

  2. Rebuild your modified code.

  3. Run another Survey or Roofline analysis to verify all loops are vectorizing properly and performance is satisfactory.

You need more information (because, for example, there is an assumed dependency compiler diagnostic, or there are expensive memory instructions like gathers, inserts, or shuffles).

Continue your investigation by:

  1. Marking one or more loops for deeper analysis

  2. Defining the appropriate Project Properties for the Refinement analysis you plan to run

  3. Running one or more Refinement analyses

If this further investigation shows there is room for improvement:

  1. Make the improvements.

  2. Rebuild your modified code.

  3. Run another Survey or Roofline analysis to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Otherwise, you are done!

Run Dependencies Analysis

This step is optional.

Before running a Dependencies analysis, make sure you:

  • Set the appropriate Project Properties for the Dependencies Analysis type. (Use the same application, but a smaller input data set if possible. And select Track stack variables to detect all possible dependencies.)

  • Mark one or more un-vectorized loops for deeper analysis in the Survey Report.

Under Check Dependencies in the Vectorization Workflow, click the Run analysis control to collect Dependencies data while your application executes.

After the Intel Advisor collects the data, it displays a Dependencies-focused Refinement Report similar to the following:


Figure: Intel Advisor Dependencies Report

There are many controls available to help you focus on the data most important to you, including the following:

1

To display more information in the Dependencies Report about a loop you selected for deeper analysis: Click the associated data row.

2

To display instruction addresses and code snippets for associated code locations in the Code Locations pane: Click a data row.

To choose a problem of interest to display in the Dependencies Source window: Right-click a data row, then choose View Source.

To open your default editor in another tab/window: Right-click a data row, then choose Edit Source to open an editor tab.

3

To choose a code location of interest to display in the Dependencies Source window: Right-click a data row, then choose View Source.

To open your default editor in another tab/window: Right-click a data row, then choose Edit Source to open an editor tab.

4

Use the Filter pane to:

  • Temporarily limit the items displayed in the Problems and Messages pane by clicking filter criteria in one or more filter categories.

  • Deselect filter criteria in one filter category, or deselect filter criteria in all filter categories.

  • Sort all filter criteria by name in ascending alphabetical order or by count in descending numerical order. (You cannot change the order in which filter categories are presented.)

5

To populate these columns and the Memory Access Patterns Report with data, run a Memory Access Patterns analysis.

Depending on what the Dependencies Report shows, do one or more of the following:

  • If there is no real dependency in the loop for the given workload, use one of the following to tell the compiler it is safe to vectorize:

    • #pragma simd ICL/ICC/ICPC directive, or #pragma omp simd OpenMP* 4.0 standard, or !DIR$ SIMD or !$OMP SIMD IFORT directive to ignore all dependencies in the loop

    • #pragma ivdep ICL/ICC/ICPC directive or !DIR$ IVDEP IFORT directive to ignore only vector dependencies (which is safest, but less powerful in certain cases)

    • restrict keyword

  • If there is an anti-dependency (often called a write-after-read or WAR dependency), enable vectorization using the #pragma simd vectorlength(k) ICL/ICC/ICPC directive or !DIR$ SIMD VECTORLENGTH(k) IFORT directive, where k is smaller than the distance between the dependent items in the anti-dependency.

  • If there is a reduction in the loop, enable vectorization using the #pragma omp simd reduction(operator:list) ICL/ICC/ICPC directive or !$OMP SIMD REDUCTION(operator:list) IFORT directive.

  • Rewrite code to remove dependencies.
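The following C sketch shows two of the options above on hypothetical functions: restrict plus #pragma omp simd for a loop with no real dependency, and #pragma omp simd reduction(+:total) for a sum. The pragmas take effect only when OpenMP* support is enabled (for example, -qopenmp); otherwise the compiler ignores them and makes its own vectorization decisions. Both restrict and omp simd are promises to the compiler: if the arrays actually overlap, the behavior is undefined, which is why a Dependencies analysis should confirm there is no real dependency first.

```c
#include <stddef.h>

/* No real dependency: restrict promises dst and src never overlap,
 * and omp simd asks the compiler to vectorize even if it would
 * otherwise assume a dependency. */
void scale_array(float *restrict dst, const float *restrict src,
                 float factor, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        dst[i] = factor * src[i];
    }
}

/* Reduction: every iteration updates total, which looks like a
 * loop-carried dependency. The reduction clause lets each SIMD lane
 * keep a private partial sum that is combined after the loop. This can
 * reorder floating-point additions, so results may differ slightly
 * from the serial order for non-exact inputs. */
float sum_floats(const float *x, size_t n)
{
    float total = 0.0f;
    #pragma omp simd reduction(+:total)
    for (size_t i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}
```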

After you finish making improvements:

  1. Run a MAP analysis if desired.

  2. Rebuild your modified code.

  3. Run another Survey analysis, with or without a Roofline chart, to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Run Memory Access Patterns (MAP) Analysis

This step is optional.

Before running a MAP analysis, make sure you:

  • Set the appropriate Project Properties for the Memory Access Patterns Analysis type. (Use the same application, but a smaller input data set if possible.)

  • Mark one or more loops for deeper analysis in the Survey Report.

Under Check Memory Access Patterns in the Vectorization Workflow, click the Run analysis control to collect MAP data while your application executes.

After the Intel Advisor collects the data, it displays a MAP-focused Refinement Report similar to the following:

Figure: Intel Advisor Memory Access Patterns (MAP) Report

After you finish making improvements:

  1. Rebuild your modified code.

  2. Run another Survey analysis, with or without a Roofline chart, to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Tip

Double-click source lines at the bottom of the report to get a more detailed source and assembly access pattern report where stride information is provided at the instruction level.
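To make the unit-stride versus non-unit-stride distinction concrete, this sketch contrasts an array of structures with a structure of arrays. The types and functions are hypothetical. In scale_x_aos, consecutive x values are three floats (12 bytes) apart, which a MAP analysis would typically flag as a non-unit stride access; in scale_x_soa, the x values are contiguous, giving unit stride. Changing the data layout like this is one common fix for strided accesses, not the only one.

```c
#include <stddef.h>

/* Array of structures (AoS): the x fields of consecutive elements are
 * sizeof(struct point) bytes apart (three floats, assuming no padding),
 * so updating only x walks memory with a non-unit stride. */
struct point { float x, y, z; };

void scale_x_aos(struct point *p, size_t n, float s)
{
    for (size_t i = 0; i < n; i++) {
        p[i].x *= s;
    }
}

/* Structure of arrays (SoA): all x values are stored next to each
 * other, so the same update walks memory with unit stride. */
struct points { float *x; float *y; float *z; };

void scale_x_soa(struct points *p, size_t n, float s)
{
    for (size_t i = 0; i < n; i++) {
        p->x[i] *= s;
    }
}
```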

Troubleshooting/FAQ

Also, see https://software.intel.com/en-us/intel-advisor-xe-support/faq.

To Do This

Optimal C/C++ Settings

Retrieve better compiler diagnostics.

Disable Interprocedural Optimization (IPO):

  • Linux* OS command line: -no-ipo

  • Windows* OS command line: /Qipo-

  • Visual Studio* IDE: C/C++ > Optimization [Intel C++] > Interprocedural Optimization > No

Address any issues with source line matching.

Do one of the following:

  • Raise the debug level:

    • Linux* OS command line: -debug inline-debug-info

    • Windows* OS command line: /debug:inline-debug-info

    • Visual Studio* IDE: You can add this option to Command line > Additional Options.

  • Temporarily disable inlining:

    • Linux* OS command line: -ip-no-inlining

    • Windows* OS command line: /Qip-no-inlining

    • Visual Studio* IDE: C/C++ > Optimization > Inline Function Expansion > Only __inline (/Ob1) or higher

Experiment with generating code for different instruction sets if it appears your application does not use the latest vector instructions.

Linux* OS command line: -xHost, -xSSE4.2, -xAVX, -axAVX, -xCORE-AVX2, -axCORE-AVX2, -xCOMMON-AVX512, -xMIC-AVX512, -axMIC-AVX512, -xCORE-AVX512

Windows* OS command line: /QxHost, /QxSSE4.2, /QxAVX, /QaxAVX, /QxCORE-AVX2, /QaxCORE-AVX2, /QxCOMMON-AVX512, /QxMIC-AVX512, /QaxMIC-AVX512, /QxCORE-AVX512

Visual Studio* IDE: C/C++ > Code Generation [Intel C++] > Intel Processor-Specific Optimization

To Do This

Optimal Fortran Settings

Retrieve better compiler diagnostics.

Disable Interprocedural Optimization (IPO):

  • Linux* OS command line: -no-ipo

  • Windows* OS command line: /Qipo-

  • Visual Studio* IDE: Fortran > Optimization > Interprocedural Optimization > No

Address any issues with source line matching.

Do one of the following:

  • Raise the debug level:

    • Linux* OS command line: -debug inline-debug-info

    • Windows* OS command line: /debug:inline-debug-info

    • Visual Studio* IDE: You can add this option to Command line > Additional Options.

  • Temporarily disable inlining:

    • Linux* OS command line: -ip-no-inlining

    • Windows* OS command line: /Qip-no-inlining

    • Visual Studio* IDE: Fortran > Optimization > Inline Function Expansion > Only INLINE Directive (/Ob1) or higher

Experiment with generating code for different instruction sets if it appears your application does not use the latest vector instructions.

Linux* OS command line: -xHost, -xSSE4.2, -xAVX, -axAVX, -xCORE-AVX2, -axCORE-AVX2, -xCOMMON-AVX512, -xMIC-AVX512, -axMIC-AVX512, -xCORE-AVX512

Windows* OS command line: /QxHost, /QxSSE4.2, /QxAVX, /QaxAVX, /QxCORE-AVX2, /QaxCORE-AVX2, /QxCOMMON-AVX512, /QxMIC-AVX512, /QaxMIC-AVX512, /QxCORE-AVX512

Visual Studio* IDE: Fortran > Code Generation > Intel Processor-Specific Optimization

For more complete information about compiler optimizations, see our Optimization Notice.