Vectorization is one of many optimizations enabled by default in the latest Intel compilers. To be vectorized, loops must satisfy certain conditions; the article lists them and describes additional ways to help the compiler vectorize loops.
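As a rough illustration, the C sketch below contrasts a loop written to meet the commonly cited auto-vectorization conditions with one that cannot be vectorized. The function names are hypothetical, the code assumes a C99 compiler (for `restrict`), and the conditions in the comments are the usual ones, not necessarily the article's exact list.

```c
#include <stddef.h>

/* Written to satisfy the usual auto-vectorization conditions:
 * - countable: the trip count n is known before the loop starts
 * - single entry, single exit: no break or early return
 * - straight-line body with no function calls
 * - restrict promises the compiler that x and y do not overlap */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}

/* Counter-example: y[i] needs y[i - 1] from the previous iteration
 * (a loop-carried dependence), so iterations cannot run as SIMD lanes. */
void prefix_sum(size_t n, float *y)
{
    for (size_t i = 1; i < n; ++i)
        y[i] += y[i - 1];
}
```

Whether a given compiler actually vectorizes `saxpy` depends on its version and options; the point is that nothing in the loop body prevents it, whereas the dependence in `prefix_sum` does.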
With nested loops, the granularity of the computations assigned to threads directly affects performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
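One way to merge nested loops is to coalesce them into a single loop over a flat index, so that the work can be divided among threads more finely when the outer trip count is small. The sketch below is a hypothetical example (the function names and the multiplication-table workload are illustrative); OpenMP's `collapse(2)` clause asks the compiler to perform a similar transformation automatically. Without OpenMP enabled, the pragmas are ignored and both functions run serially with identical results.

```c
#include <stddef.h>

/* Nested form: only the outer loop is parallelized, so there are at most
 * `rows` units of parallel work. With few rows and many threads, threads idle. */
void fill_table_nested(size_t rows, size_t cols, double *out)
{
    #pragma omp parallel for
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            out[i * cols + j] = (double)((i + 1) * (j + 1));
}

/* Merged (coalesced) form: one loop over rows * cols iterations.
 * The original row and column indices are recovered from the flat index k. */
void fill_table_merged(size_t rows, size_t cols, double *out)
{
    const size_t total = rows * cols;
    #pragma omp parallel for
    for (size_t k = 0; k < total; ++k) {
        size_t i = k / cols;   /* row index */
        size_t j = k % cols;   /* column index */
        out[k] = (double)((i + 1) * (j + 1));
    }
}
```

The price of merging is the division and modulo per iteration, so it tends to pay off only when the outer loop alone does not expose enough parallelism.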
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
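To make the granularity knob concrete, the following sketch splits a sum of squares into blocks of `chunk` elements, with each block acting as one parallel task. It is a hypothetical example, assuming OpenMP's `reduction` and `schedule(dynamic)` clauses; the function name and the `chunk` parameter are illustrative. Per-task overhead (scheduling and combining partial results) is paid once per block, so a very small `chunk` lets that overhead dominate.

```c
#include <stddef.h>

/* Sum of a[i] * a[i] over [0, n), processed in blocks of `chunk` elements.
 * Each outer iteration is one unit of parallel work; `chunk` sets its size. */
double sum_squares_chunked(const double *a, size_t n, size_t chunk)
{
    double total = 0.0;
    if (chunk == 0)
        chunk = 1;                               /* avoid zero-sized blocks */
    const size_t nblocks = (n + chunk - 1) / chunk;

    #pragma omp parallel for reduction(+:total) schedule(dynamic)
    for (size_t b = 0; b < nblocks; ++b) {
        size_t begin = b * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        double partial = 0.0;                    /* work done inside one task */
        for (size_t i = begin; i < end; ++i)
            partial += a[i] * a[i];
        total += partial;                        /* combined by the reduction */
    }
    return total;
}
```

Choosing `chunk` is a trade-off: blocks that are too small waste time on overhead, while blocks that are too large can leave some threads idle near the end of the loop.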
Describes how to configure OpenMP in the Intel IPP library to maximize the multi-threaded performance of Intel IPP primitives.
Optimize Data Structures and Memory Access Patterns to I…
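A common example of optimizing a data structure for its access pattern is converting an array of structures (AoS) into a structure of arrays (SoA). The sketch below is illustrative only; the type and function names are hypothetical. A loop that touches only the `x` field strides through memory in the AoS layout but reads consecutive addresses (unit stride) in the SoA layout, which generally suits caches and SIMD loads better.

```c
#include <stddef.h>

/* Array of structures: x, y, z of one point are adjacent in memory. */
struct point_aos { float x, y, z; };

/* Structure of arrays: all x values are contiguous, likewise y and z. */
struct points_soa { float *x, *y, *z; };

/* AoS access: consecutive x values are sizeof(struct point_aos) bytes apart. */
void scale_x_aos(struct point_aos *p, size_t n, float f)
{
    for (size_t i = 0; i < n; ++i)
        p[i].x *= f;
}

/* SoA access: consecutive x values are adjacent (unit stride). */
void scale_x_soa(const struct points_soa *p, size_t n, float f)
{
    float *x = p->x;   /* load the pointer once, outside the loop */
    for (size_t i = 0; i < n; ++i)
        x[i] *= f;
}
```

The SoA layout helps when loops process one field across many elements; if code usually reads all fields of one element together, AoS may remain the better fit.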
Intel Vectorization Tools
Intel® Cilk™ Plus is an extension to the C and C++ languages to support data and task parallelism. It provides three new keywords to i…
Fortran Standard Parallel Programming Features in Intel Compilers
Vectorizing improves performance, and achieving high performance can save power. An introduction to tools for vectorizing compute-intensive processing.