• 10/30/2018

Using Vector Data Types

To maximize use of vector CPUs, consider using vector data types in your kernel code. This is a more involved performance alternative to the automatic (compiler-aided) vectorization described in the Benefitting from Implicit Vectorization section. The technique lets you map vector data types directly to the hardware vector registers, so the data types you use should match the width of the underlying SIMD instructions.
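As a rough illustration of matching a vector type to the register width, the following host-side C sketch computes how many elements fit in one SIMD register. The helper name lanes_per_register is hypothetical and is not part of the OpenCL API; the register widths are assumptions you supply for your target hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API call): returns how many elements
 * of elem_bytes bytes each fit in one SIMD register of register_bits bits. */
static size_t lanes_per_register(size_t register_bits, size_t elem_bytes)
{
    assert(elem_bytes > 0);
    /* The register must hold a whole number of elements. */
    assert(register_bits % (8 * elem_bytes) == 0);
    return register_bits / (8 * elem_bytes);
}
```

For example, a 256-bit register holds eight 4-byte floats, which corresponds to float8, or four 8-byte doubles, which corresponds to double4. A 128-bit register holds four floats, which corresponds to float4.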
Consider the following recommendations:
  • Starting with the 2nd Generation Intel® Core™ processors with Intel® Advanced Vector Extensions (Intel® AVX) support, use data types such as float8 or double4, so you bind code to the specific register width of the underlying hardware. This method provides maximum performance on a specific platform. However, performance on other platforms and generations of Intel® Core™ processors might be less than optimal.
  • Use wider data types, such as float16, to transparently cover many SIMD hardware register widths. However, using types wider than the underlying hardware is similar to loop unrolling. This method might improve performance in some cases, but also increases register pressure. Consider using the uchar16 data type to process four pixels simultaneously when operating on pixels with eight bits per component.
  • With vector data types, each work-item processes N elements. Make sure the size of the grid, which is the number of work-items required to process the same dataset, is reduced by a factor of N accordingly.
Note: The int8 data type improves performance only starting with the 4th Generation Intel® Core™ processors.
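The relationship between vector width and grid size can be sketched in host-side C as follows. This is a minimal sketch, assuming a one-dimensional grid; the helper names vector_work_items and scalar_tail are hypothetical and are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-items needed when each work-item
 * processes vec_width elements (for example, 8 for float8). */
static size_t vector_work_items(size_t n_elems, size_t vec_width)
{
    assert(vec_width > 0);
    return n_elems / vec_width;
}

/* Hypothetical helper: number of trailing elements that do not fill a
 * whole vector. These need padding or a separate scalar pass. */
static size_t scalar_tail(size_t n_elems, size_t vec_width)
{
    assert(vec_width > 0);
    return n_elems % vec_width;
}
```

For a buffer of 1024 floats processed as float8, the grid shrinks from 1024 work-items to 128. The result is the kind of value you would pass as the global work size when enqueueing the kernel, for example through clEnqueueNDRangeKernel. If the element count is not a multiple of the vector width, the sketch reports the leftover elements rather than deciding how to handle them.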
Using vector data types, you plan the vector-level parallelism yourself instead of relying on the implicit vectorization module. See the Benefitting from Implicit Vectorization section for more information.
This approach is useful in the following scenarios:
  • You are porting code that originally used the following instruction sets:
    • Intel® Streaming SIMD Extensions (Intel® SSE)
    • Intel® Advanced Vector Extensions (Intel® AVX)
    • Intel® Advanced Vector Extensions 2 (Intel® AVX2)
    • Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • You want to benefit from hand-tuned vectorization of your code.
The following example demonstrates the multiplication kernel that targets the 256-bit vector units of the 2nd Generation Intel Core processors and higher:
__kernel __attribute__((vec_type_hint(float8)))
void edp_mul(__global const float8 *a,
             __global const float8 *b,
             __global float8 *result)
{
  int id = get_global_id(0);
  result[id] = a[id] * b[id];
}
In this example, the data passed to the kernel represents buffers of float8. The calculations are performed on eight elements together.
The __attribute__((vec_type_hint(float8))) attribute added before the kernel signals to the compiler, or the implementation, that this kernel has an optimized vectorized form, so the implicit vectorization module does not operate on it. Use vec_type_hint to hint to the compiler that your kernel already processes data using mostly vector types. For more details on this attribute, see section 6.7.2 of the OpenCL™ 1.2 specification at https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf.
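To check the output of the edp_mul kernel on the host, a plain C reference can model the same work split: one iteration of the outer loop stands in for one work-item, and the inner loop stands in for the eight lanes of a float8. This is a sketch for validation only, under the assumption that the buffers hold a multiple of eight floats. The function name edp_mul_reference and the VEC_WIDTH macro are hypothetical, and the code does not reflect how the OpenCL runtime executes the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_WIDTH 8 /* lanes in a float8; assumption matching the kernel */

/* Hypothetical scalar reference for the edp_mul kernel: multiplies a and b
 * element-wise, grouped the same way the float8 kernel groups them. */
static void edp_mul_reference(const float *a, const float *b,
                              float *result, size_t n_elems)
{
    /* The kernel reads whole float8 values, so require full vectors. */
    assert(n_elems % VEC_WIDTH == 0);
    size_t work_items = n_elems / VEC_WIDTH;

    for (size_t id = 0; id < work_items; ++id) {          /* one work-item */
        for (size_t lane = 0; lane < VEC_WIDTH; ++lane) { /* one float8 lane */
            size_t i = id * VEC_WIDTH + lane;
            result[i] = a[i] * b[i];
        }
    }
}
```

Comparing the buffer read back from the device against this reference is one simple way to confirm that a hand-vectorized kernel indexes its data as intended.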
