• 2019 Update 4
  • 03/20/2019

Benefitting From Implicit Vectorization

OpenCL™ Code Builder includes an implicit vectorization module as part of the program build process. When it is beneficial in terms of performance, this module packs several work-items together and executes them with SIMD instructions. This enables you to benefit from the vector units in the Intel® Architecture Processors without writing explicit vector code.
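The packing idea can be pictured with a small host-side C sketch. It is a conceptual illustration only, not the module's actual implementation: PACK_WIDTH, the function names, and the arithmetic are hypothetical, and the inner loop merely stands in for one SIMD instruction that covers several adjacent work-items.

```c
#include <assert.h>
#include <stddef.h>

#define PACK_WIDTH 4  /* hypothetical: 4 lanes of 4-byte floats */

/* What the programmer writes: one work-item handles one element. */
static void work_item_scalar(const float *in, float *out, size_t gid)
{
    out[gid] = in[gid] * in[gid] + 1.0f;
}

/* Conceptual result of packing: one call covers PACK_WIDTH adjacent
   work-items. The lane loop stands in for a single SIMD operation. */
static void work_items_packed(const float *in, float *out, size_t first_gid)
{
    for (size_t lane = 0; lane < PACK_WIDTH; ++lane) {
        size_t gid = first_gid + lane;
        out[gid] = in[gid] * in[gid] + 1.0f;
    }
}

/* Execute n work-items one at a time. */
static void run_scalar(const float *in, float *out, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        work_item_scalar(in, out, gid);
}

/* Execute n work-items in packs; this sketch assumes n is a multiple
   of PACK_WIDTH so no remainder handling is needed. */
static void run_packed(const float *in, float *out, size_t n)
{
    assert(n % PACK_WIDTH == 0);
    for (size_t gid = 0; gid < n; gid += PACK_WIDTH)
        work_items_packed(in, out, gid);
}
```

Both drivers produce the same output for the same input; only the number of "instructions" issued per element differs.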
The vectorization module transforms scalar operations performed by adjacent work-items into equivalent vector operations. When vector operations already exist in the kernel source code, the module scalarizes them (breaks them down into component operations) and then revectorizes them. This improves performance by transforming the memory access pattern of the kernel into a structure of arrays (SOA), which is often more cache-friendly than an array of structures (AOS).
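To make the AOS-to-SOA idea concrete, the following C sketch transposes the four-component vectors owned by adjacent work-items into per-component arrays. It is an illustration under stated assumptions rather than a description of the module's internals: LANES, Vec4, Vec4Pack, and both functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* hypothetical number of work-items packed together */

/* AOS: each work-item owns one 4-component vector (like an OpenCL float4). */
typedef struct { float x, y, z, w; } Vec4;

/* SOA view of LANES adjacent work-items: each array holds one component
   for every packed work-item, so it maps onto one SIMD register. */
typedef struct { float x[LANES], y[LANES], z[LANES], w[LANES]; } Vec4Pack;

/* Gather LANES adjacent AOS elements into one SOA pack (a transpose). */
static Vec4Pack load_pack(const Vec4 *aos, size_t first)
{
    Vec4Pack p;
    assert(aos != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        p.x[i] = aos[first + i].x;
        p.y[i] = aos[first + i].y;
        p.z[i] = aos[first + i].z;
        p.w[i] = aos[first + i].w;
    }
    return p;
}

/* Squared length per work-item, computed lane-wise: every multiply and
   add applies to the same component of all packed work-items at once. */
static void length_squared_pack(const Vec4Pack *p, float out[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = p->x[i] * p->x[i] + p->y[i] * p->y[i]
               + p->z[i] * p->z[i] + p->w[i] * p->w[i];
}
```

After the transpose, each component array is contiguous, which is the property that tends to make the SOA layout friendlier to vector loads.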
You can find more details in the "Intel OpenCL™ Implicit Vectorization Module overview" article.
The implicit vectorization module works best for kernels that operate on four-byte-wide elements, such as float or int data types. You can define the computational width of a kernel using the OpenCL vec_type_hint attribute.
Since the default computation width is four bytes, kernels are vectorized by default. If your kernel uses vectors explicitly, you can specify __attribute__((vec_type_hint(<typen>))) with typen of any vector type (for example, float3 or char4). This attribute indicates to the vectorization module that it should apply only transformations that are useful for this type.
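As a rough sketch of where such a type hint goes, the C fragment below embeds an OpenCL C kernel as a string, the way a host program might pass it to clCreateProgramWithSource. The kernel name scale4, its arguments, and the helper has_vec_type_hint are hypothetical and exist only for illustration; float4 is used here simply as one possible vector type.

```c
#include <assert.h>
#include <string.h>

/* OpenCL C kernel source held as a host-side string. The attribute
   follows the __kernel qualifier and names the vector type that the
   kernel's computation is written in terms of. */
static const char *kernel_source =
    "__kernel __attribute__((vec_type_hint(float4)))\n"
    "void scale4(__global const float4 *in, __global float4 *out, float k)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    out[gid] = in[gid] * k;\n"
    "}\n";

/* Returns nonzero if the given source text declares a vec_type_hint. */
static int has_vec_type_hint(const char *src)
{
    assert(src != NULL);
    return strstr(src, "vec_type_hint(") != NULL;
}
```

A kernel written with plain float or int operations would normally omit the attribute and rely on the default computation width.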
The performance benefit from the vectorization module might be lower for kernels that include complex control flow.
You do not need to write for loops within kernels to benefit from vectorization. For best results, let the kernel deal with a single data element, and let the vectorization module take care of the rest. The more straightforward your OpenCL code is, the more optimization you get from vectorization. Writing the kernel in plain scalar code is what works best for efficient vectorization. This method of coding avoids the potential disadvantages associated with explicit (manual) vectorization described in the "Using Vector Data Types" section.
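The recommended kernel shape can be sketched as follows. The kernel body is valid OpenCL C that uses the kernel and global qualifier spellings; the empty macros, the get_global_id stand-in, and run_ndrange are host-side shims assumed for this sketch only, so that the same text also compiles as plain C. The kernel name saxpy is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side shims (assumptions of this sketch). An OpenCL compiler
   provides these qualifiers and get_global_id itself. */
#define kernel
#define global
static size_t shim_gid;
static size_t get_global_id(unsigned dim)
{
    assert(dim == 0);  /* this sketch models a 1-D index space only */
    return shim_gid;
}

/* Preferred shape: plain scalar code, one data element per work-item,
   and no loop over the data inside the kernel. */
kernel void saxpy(global const float *x, global float *y, float a)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}

/* Stand-in for enqueueing an NDRange of n work-items: runs them in order. */
static void run_ndrange(size_t n, const float *x, float *y, float a)
{
    for (shim_gid = 0; shim_gid < n; ++shim_gid)
        saxpy(x, y, a);
}
```

The loop lives in the stand-in for the runtime, not in the kernel, which leaves the vectorization module free to decide how many adjacent work-items to pack together.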
See Also
Vectorizer Knobs
Using Vector Data Types
Tips for Auto-Vectorization Module
Intel OpenCL™ Implicit Vectorization Module overview at http://llvm.org/devmtg/2011-11/Rotem_IntelOpenCLSDKVectorizer.pdf
