To run the application you built:
The Intel® Compiler includes an auto-vectorizer that detects operations in the application that can be performed in parallel and converts them from sequential operations to parallel ones by using Single Instruction Multiple Data (SIMD) instructions.
In this tutorial, you will use the auto-vectorizer to improve the performance of the sample application. You will compare the performance of the serial version and the version that was compiled with the auto-vectorizer.
The compiler may be able to perform additional optimizations if it can optimize across source file boundaries. These optimizations include, but are not limited to, function inlining. Enable them with the /Qipo option.
Rebuild the program using the /Qipo option to enable interprocedural optimization.
Select Optimization [Intel C++] > Interprocedural Optimization > Multi-file (/Qipo).
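The following is a minimal sketch of the kind of code that interprocedural optimization can help with. It is not taken from the tutorial sample: the function names scale_pixel and sum_scaled and the file names in the comments are hypothetical. Both functions are shown in one listing so that it compiles on its own, but imagine each one in a separate source file.

```cpp
#include <vector>

// Imagine this definition lives in shade.cpp (hypothetical file name).
// It is small enough to be a good inlining candidate.
float scale_pixel(float value, float gain) {
    return value * gain;
}

// Imagine this function lives in render.cpp (hypothetical file name) and
// sees only a declaration of scale_pixel. Compiled one file at a time, the
// call in the loop body generally remains a real function call.
float sum_scaled(const std::vector<float>& pixels, float gain) {
    float total = 0.0f;
    for (float p : pixels) {
        total += scale_pixel(p, gain);  // candidate for cross-file inlining
    }
    return total;
}
```

When the two functions are in different files, a build with /Qipo gives the compiler the opportunity to inline scale_pixel into the loop, which may in turn make the loop easier to optimize further. Whether the compiler actually does so depends on its own heuristics.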
To get the most out of the Intel® Threading Building Blocks (Intel® TBB) library, explore the following additional resources.
This tutorial will use the tachyon_samples.sln solution file. Open the solution to find these projects:
For the Intel® Compiler, vectorization is the unrolling of a loop combined with the generation of packed SIMD instructions. Because the packed instructions operate on more than one data element at a time, the loop can execute more efficiently. This process is sometimes referred to as auto-vectorization to emphasize that the compiler identifies and optimizes suitable loops on its own.
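As an illustration, the following sketch shows a loop of the kind that auto-vectorizers typically handle well. It is not part of the tutorial sample, and the function name saxpy is a hypothetical choice. Each iteration is independent of the others and the data is accessed with unit stride, so the compiler may process several elements per packed instruction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: computes y[i] = a * x[i] + y[i] for each element.
// Assumes x and y are distinct vectors, so the arrays do not overlap.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    // Process only the elements that both vectors have.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();

    // Simple counted loop, unit-stride access, no dependence between
    // iterations: a typical candidate for auto-vectorization.
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Whether the compiler actually vectorizes this loop depends on the optimization options in effect and on what the compiler can prove about the data, for example that the two arrays do not alias.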