Developer Guide and Reference

Using Automatic Vectorization

Automatic vectorization is supported on IA-32 and Intel® 64 architectures. The information below will guide you in setting up the auto-vectorizer.

Vectorization Speed-up

Where does the vectorization speedup come from? Consider the following sample code fragment, where a, b, and c are integer arrays:
Sample Code Fragment
for (i=0;i<=MAX;i++) c[i]=a[i]+b[i];
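A complete program built around this fragment might look like the following sketch. The value of MAX, the array initialization, and the icc driver name in the build comments are assumptions made for illustration; they are not part of the original fragment.

Sample Complete Program (illustrative)

/*
 * A minimal sketch. Possible builds on Linux*, assuming the icc driver:
 *   icc -O2 add.c            vectorization enabled at default optimization (O2)
 *   icc -O2 -no-vec add.c    vectorization disabled, for comparison
 */
#include <stdio.h>

#define MAX 1024                     /* assumed size, for illustration only */

int a[MAX + 1], b[MAX + 1], c[MAX + 1];

int main(void)
{
    int i;

    for (i = 0; i <= MAX; i++) {     /* initialize the integer arrays */
        a[i] = i;
        b[i] = 2 * i;
    }

    for (i = 0; i <= MAX; i++)       /* the loop the compiler may vectorize */
        c[i] = a[i] + b[i];

    printf("c[MAX] = %d\n", c[MAX]);
    return 0;
}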
If vectorization is not enabled, that is, you compile using the O1, -no-vec (Linux* and macOS*), or /Qvec- (Windows*) option, the compiler processes each iteration separately, leaving much of the space in the SIMD registers unused, even though each register could hold three additional integers. If vectorization is enabled (compiled using O2 or higher), the compiler may use that additional register space to perform four additions in a single instruction. The compiler looks for vectorization opportunities whenever you compile at default optimization (O2) or higher.
The O2 (or higher) option enables vectorization at default optimization levels for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in a greater performance gain on Intel® microprocessors than on non-Intel microprocessors.
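To illustrate what performing four additions in a single instruction means, the hand-written sketch below does the same element-wise addition with SSE2 intrinsics, where each __m128i register holds four 32-bit integers. This is only an illustration of the idea; the auto-vectorizer generates its own code, and the function name and the assumption that n is a multiple of four are hypothetical.

#include <emmintrin.h>               /* SSE2 intrinsics: __m128i, _mm_add_epi32 */

/* Hypothetical helper showing four integer additions per instruction.
 * Assumes n is a multiple of 4 to keep the sketch short. */
void add_four_at_a_time(const int *a, const int *b, int *c, int n)
{
    int i;
    for (i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)&a[i]);  /* load 4 ints from a */
        __m128i vb = _mm_loadu_si128((const __m128i *)&b[i]);  /* load 4 ints from b */
        __m128i vc = _mm_add_epi32(va, vb);                    /* 4 additions in 1 instruction */
        _mm_storeu_si128((__m128i *)&c[i], vc);                /* store 4 results */
    }
}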
The vectorization can also be affected by certain options, such as /arch (Windows*), -m (Linux* and macOS*), or [Q]x.
To allow comparisons between vectorized and non-vectorized code, disable vectorization using the /Qvec- (Windows*) or -no-vec (Linux* or macOS*) option, and enable vectorization using the O2 option.
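One way to make that comparison is to time the loop in a single source file and build it twice, once with vectorization disabled and once with the O2 option alone. The sketch below is an assumed harness: the icc driver name in the comments, the REPEAT count, and the use of clock() are illustrative choices, not requirements of this guide.

/*
 * A minimal timing sketch. Build it twice and compare the reported times,
 * assuming the icc driver on Linux*:
 *   icc -O2 compare.c -o vec             vectorization enabled
 *   icc -O2 -no-vec compare.c -o novec   vectorization disabled
 * On Windows*, use /Qvec- to disable vectorization instead of -no-vec.
 */
#include <stdio.h>
#include <time.h>

#define MAX 100000
#define REPEAT 10000                 /* repeat the loop so the time is measurable */

int a[MAX + 1], b[MAX + 1], c[MAX + 1];

int main(void)
{
    int i, r;
    clock_t start, end;

    for (i = 0; i <= MAX; i++) {     /* initialize the integer arrays */
        a[i] = i;
        b[i] = 2 * i;
    }

    start = clock();
    for (r = 0; r < REPEAT; r++) {
        a[0] = r;                    /* vary one input so repetitions are not optimized away */
        for (i = 0; i <= MAX; i++)
            c[i] = a[i] + b[i];
    }
    end = clock();

    printf("elapsed: %f seconds, c[0] = %d\n",
           (double)(end - start) / CLOCKS_PER_SEC, c[0]);
    return 0;
}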
To get information on whether a loop was vectorized or not, enable generation of the optimization report using the options