User-Mandated or SIMD Vectorization
User-mandated or SIMD vectorization supplements automatic vectorization just like OpenMP* parallelization supplements automatic parallelization. The following figure illustrates this relationship. User-mandated vectorization is implemented as a single-instruction-multiple-data (SIMD) feature and is referred to as SIMD vectorization.
The SIMD vectorization feature is available for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in a greater performance gain on Intel® microprocessors than on non-Intel microprocessors.
The vectorization can also be affected by certain options, such as
The following figure illustrates how SIMD vectorization is positioned among the various approaches that you can take to generate vector code that exploits vector hardware capabilities. Programs written with SIMD vectorization are very similar to those written using auto-vectorization hints. You can use SIMD vectorization to minimize the code changes needed to obtain vectorized code.
SIMD vectorization uses the #pragma omp simd pragma to effect loop vectorization.
You must add this pragma to a loop and recompile to vectorize the loop using the option
Consider an example in C++ where the function add_floats() uses too many unknown pointers for the compiler’s automatic runtime independence check optimization to kick in. You can give a data dependence assertion using the auto-vectorization hint via #pragma ivdep and let the compiler decide whether the auto-vectorization optimization should be applied to the loop. Or you can now enforce vectorization of this loop by using #pragma omp simd.
Example: without #pragma omp simd
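The original listing is not present here, so the following is only a minimal sketch of what such a function might look like. The text gives only the name add_floats(); the parameter list and loop body are assumptions for illustration. With five pointers that may alias one another, the compiler cannot cheaply prove that the stores to a[] are independent of the loads.

```cpp
// Hypothetical signature and body: only the name add_floats() comes from
// the text. The five pointers may overlap as far as the compiler can tell,
// so it may decline to auto-vectorize this loop.
void add_floats(float *a, float *b, float *c, float *d, float *e, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
    }
}
```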
Example: with #pragma omp simd
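The original listing is not present here, so this is a sketch under the same assumptions as the description of add_floats() in the text: the parameter list and loop body are hypothetical. The pragma tells the compiler to vectorize the loop, so the caller becomes responsible for making sure the arrays do not overlap in a way that creates a loop-carried dependence.

```cpp
// Hypothetical signature and body: only the name add_floats() comes from
// the text. The pragma requests vectorization of the loop that follows it.
// If OpenMP SIMD support is not enabled at compile time, the pragma is
// ignored and the loop is compiled as ordinary scalar code.
void add_floats(float *a, float *b, float *c, float *d, float *e, int n) {
#pragma omp simd
    for (int i = 0; i < n; i++) {
        a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
    }
}
```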
The one big difference between using #pragma omp simd and auto-vectorization hints is that with #pragma omp simd, the compiler generates a warning when it is unable to vectorize the loop. With auto-vectorization hints, actual vectorization is still at the discretion of the compiler, even when you use the #pragma vector always hint.
#pragma omp simd has optional clauses to guide the compiler on how vectorization must proceed. Use these clauses appropriately so that the compiler obtains enough information to generate correct vector code. For more information on the clauses, see the #pragma omp simd description.
Note the following points when using the
- A variable may belong to zero or one of the following: private, linear, or reduction.
- Within the vector loop, an expression is evaluated as a vector value if it is private, linear, reduction, or it has a sub-expression that is evaluated to a vector value. Otherwise, it is evaluated as a scalar value (that is, broadcast the same value to all iterations). Scalar value does not necessarily mean loop invariant, although that is the most frequently seen usage pattern of scalar value.
- A vector value may not be assigned to a scalar L-value. It is an error.
- A scalar L-value may not be assigned under a vector condition. It is an error.
- The switch statement is not supported.
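The roles of private, linear, and reduction variables listed above can be sketched as follows. The function names dot() and gather_even() and their parameters are hypothetical and chosen only for illustration. Each variable appears in at most one clause, as the first point requires.

```cpp
// Hypothetical helper: sum of x[i] * y[i]. `sum` is a reduction variable,
// so the vector loop may keep partial sums and combine them after the loop.
float dot(const float *x, const float *y, int n) {
    float sum = 0.0f;
#pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Hypothetical helper: doubles every other element of src into dst.
// `j` grows by 2 on each iteration (linear with step 2), and `t` is a
// temporary that is written before it is read in every iteration (private).
void gather_even(float *dst, const float *src, int n) {
    int j = 0;
    float t;
#pragma omp simd linear(j:2) private(t)
    for (int i = 0; i < n; i++) {
        t = src[j];
        dst[i] = t * 2.0f;
        j += 2;
    }
}
```

In this sketch, x, y, src, dst, and n are scalar (uniform) values in the sense of the second point, while sum, j, and t are evaluated as vector values.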
You may find it difficult to describe vector semantics using the SIMD pragma for some auto-vectorizable loops. One example is MAX reduction in C since the language does not have
Consider the following C++ example code with a loop containing the math function,
All code examples in this section are applicable for C/C++ on Windows* only.
Example: Loop with math function is auto-vectorized
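The original listing is not present here. A minimal sketch of such a loop, assuming a hypothetical function named vsin() with the parameters shown, might look like this:

```cpp
#include <math.h>

// Hypothetical function name and parameters. The loop body is a single
// call to the standard math function sinf(), which a vectorizing compiler
// can replace with a vector math library routine.
void vsin(float *a, const float *b, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = sinf(b[i]);
    }
}
```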
When you compile the above code, the loop with the sinf() function is auto-vectorized using the appropriate Short Vector Math Library (SVML) function provided by the Intel® C++ Compiler. The auto-vectorizer identifies the entry points, matches up the scalar math library function to the SVML function, and invokes it.
However, if this loop instead calls your own function, foo(), which has the same prototype as sinf(), the auto-vectorizer fails to vectorize the loop because it does not know what foo() does unless foo() is inlined at this call site.
Example: Loop with user-defined function is NOT auto-vectorized
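The original listing is not present here. A minimal sketch, assuming a hypothetical caller named vfoo() and an arbitrary body for foo(), might look like this. The definition of foo() is included only so the sketch links and runs; the scenario described assumes the compiler cannot see it while compiling the loop.

```cpp
// foo() has the same prototype as sinf(). From the loop's point of view it
// is an opaque call whose side effects are unknown.
float foo(float x);

// Hypothetical caller: the call to foo() prevents auto-vectorization of
// this loop unless the compiler inlines foo() here.
void vfoo(float *a, const float *b, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = foo(b[i]);
    }
}

// Stand-in definition (an assumption for illustration). In the situation
// described, this would typically live in a separate source file.
float foo(float x) {
    return x * x + 1.0f;
}
```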
In such cases, you can use the __attribute__((vector)) (Linux) declaration to vectorize the loop. All you need to do is