Vectorization Toolkit

By Amanda K Sharp, Published: 05/14/2012, Last Updated: 03/25/2019


This toolkit describes six steps you can follow to increase your application's performance through vectorization, using Intel® Software Development Tools.


The 6-Step Process for Vectorizing Your Application

Step 1. Measure Baseline Release Build Performance

It is important to work with a release build (and not a debug build) because:

  1. The compiler optimizes the code in a release build, which may change which loops show up as hotspots.
  2. You need to know what baseline time your application has so that you can determine if vectorization has improved performance.

You should set a goal for performance so you know when you are done (when the release build is fast enough).

The Intel Compiler optimizes by default, so a default build is a release build. Optimizations are turned off only if you specifically ask for that: by doing a DEBUG build on Windows* (or using the /Zi switch), or by using the -O0 switch (or the -g switch) on Linux* or macOS*.
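As a rough illustration of recording a baseline, the sketch below times a stand-in loop with std::chrono and keeps the fastest of several runs. The names scale_and_sum and best_seconds are hypothetical helpers introduced here, not part of any Intel tool, and the loop is only a placeholder for your application's real work.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an application's hot loop; replace with real work.
double scale_and_sum(const std::vector<double>& in, std::vector<double>& out,
                     double factor) {
    assert(in.size() == out.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * factor;
        sum += out[i];
    }
    return sum;
}

// Runs `work` `repeats` times and returns the fastest wall-clock time in
// seconds. Taking the minimum reduces noise from other system activity.
template <typename Work>
double best_seconds(Work&& work, int repeats) {
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || s < best) best = s;
    }
    return best;
}
```

One possible use is to call best_seconds([&] { sum = scale_and_sum(in, out, 2.0); }, 5) in a release build, use or print sum so the work is not optimized away, and record the time as the baseline to compare against after each vectorization change.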

Step 2. Determine Hotspots Using Intel® VTune™ Amplifier

You can use Intel® VTune™ Amplifier, Intel's performance profiler, to find the most time-consuming functions in your application.  The "Hotspots" analysis type is recommended. Identifying which areas of your code are taking the most time will allow you to focus your optimization efforts in the areas where performance improvements will have the most effect. Generally you want to focus on only the top few hotspots, or functions taking at least 10% of your application's total time. Make note of the hotspots you want to focus on for the next step.  


Step 3. Determine Loop Candidates Using Intel Compiler Optimization Report

The vectorization section of the optimization report tells you whether each loop in your code was vectorized. Look at the report output for the hotspots you identified in Step 2. If loops in your hotspots did not vectorize, check whether they perform math, data processing, or string operations on many data elements that could be handled in parallel (for instance, the elements of an array). If they do, they might benefit from vectorization. If you find any candidates, move on to Step 4.

To generate information about compiler vectorization from the optimization report, compile with the options "-qopt-report -qopt-report-phase=loop,vec" on Linux or "/Qopt-report /Qopt-report-phase:loop,vec" on Windows.

Note that you can compile just a portion of your code with the Intel Compiler; the result will be compatible with code built by the native compilers (gcc on Linux and macOS and Microsoft* Visual C++ on Windows*).

Begin optimization report for: test_scalar_dep(double *, int)

    Report from: Vector optimizations [vec]

LOOP BEGIN at scalar_dep.cpp(79,1)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   remark #15346: vector dependence: assumed ANTI dependence between  line 81 and  line 80
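The source of scalar_dep.cpp is not shown here, so the following is only a hypothetical sketch of the distinction the report draws. The first loop has independent iterations and is the kind of candidate worth carrying into Step 4. The second carries a scalar from one iteration to the next, which is one common way a dependence can block vectorization as written. The function names are made up for illustration, and the sketch makes no claim about the exact remarks a compiler would emit for them.

```cpp
#include <cassert>

// Independent iterations: each a[i] depends only on its own old value,
// so the iterations could be executed in any order or several at a time.
void scale_in_place(double* a, int n, double factor) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        a[i] = a[i] * factor;
    }
}

// Loop-carried scalar dependence: `prev` is written in one iteration and
// read in the next, so iteration i cannot start before iteration i-1 ends.
void prefix_sum_in_place(double* a, int n) {
    assert(n >= 0);
    double prev = 0.0;
    for (int i = 0; i < n; ++i) {
        a[i] = a[i] + prev;  // reads the value produced by the previous iteration
        prev = a[i];         // carries this iteration's result forward
    }
}
```

Compiling a file like this with the report options above and comparing the remarks for the two loops is one way to get familiar with how the report describes dependences.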


Step 4. Get Advice Using Intel® Advisor

Use the vectorization analysis capability of Intel Advisor to analyze your application's run time behavior and identify the components of the application that will benefit most from vectorization.

After running an analysis and generating a result, you can view the result through the GUI or you can generate text-based reports as needed. The analysis type used in collection determines which report types can be generated from the result. 



Step 5. Implement Vectorization Recommendations

Suppose, for example, that Intel Advisor identified a "Scalar vector dependence" that is preventing the Intel compiler from generating vectorized code. Before applying a recommended change, make sure it does not affect the semantics or the safety of your loop calculations. One way to check that the loop has no dependencies that may be affected is to ask whether executing the loop in backwards order would change the results. Another is to imagine the calculations in your loop being performed in a scrambled order. If the results would change, your loop has dependencies and vectorization would not be "safe". You may still be able to vectorize by eliminating the dependencies in the loop. Modify your source code to give the compiler additional information, or restructure your loop so that it vectorizes better.
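The backwards-order test described above can be sketched as a small helper that runs the same loop body forward and in reverse on copies of the data and compares the results. Everything here (same_forward_and_backward, square_element, add_previous, check_examples) is a hypothetical name introduced for illustration. Passing the check on one input is only a quick screen, not a proof that the loop is free of dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Applies body(data, i) for i = 0..n-1 on one copy of the input and for
// i = n-1..0 on another copy, then reports whether both copies match.
template <typename Body>
bool same_forward_and_backward(const std::vector<double>& input, Body body) {
    std::vector<double> fwd = input;
    std::vector<double> bwd = input;
    const std::size_t n = input.size();
    for (std::size_t i = 0; i < n; ++i) body(fwd, i);   // original order
    for (std::size_t i = n; i-- > 0;) body(bwd, i);     // reversed order
    return fwd == bwd;
}

// Independent body: touches only element i.
inline void square_element(std::vector<double>& a, std::size_t i) {
    a[i] = a[i] * a[i];
}

// Dependent body: element i reads element i-1, which an earlier
// iteration may already have modified.
inline void add_previous(std::vector<double>& a, std::size_t i) {
    if (i > 0) a[i] += a[i - 1];
}

// Example usage: the independent loop passes, the dependent one does not.
inline void check_examples() {
    const std::vector<double> v = {1.0, 2.0, 3.0, 4.0};
    assert(same_forward_and_backward(v, square_element));
    assert(!same_forward_and_backward(v, add_previous));
}
```

A loop that fails this check, like add_previous here, needs its dependence removed or restructured before any change that forces vectorization can be considered safe.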


Step 6. Repeat!

Continue to modify the application code based on the performance analysis, then measure again and compare the new performance metrics with the previous results. Repeat this cycle until the results meet your performance goals or until no significant hotspots remain.

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804