• 2019 Update 4
  • 03/20/2019

Avoid Extracting Vector Components

Consider the following kernel:
    __constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

    __kernel __attribute__((vec_type_hint(float4)))
    void inverter2(__global float4* input, __global float4* output)
    {
        int tid = get_global_id(0);
        output[tid] = oneVec - input[tid];   // full-vector subtraction
        output[tid].w = input[tid].w;        // single-component write (extraction of w)
        output[tid] = sqrt(output[tid]);     // full-vector square root
    }
In this explicit vector code, extracting the w component is very costly: after the single-component write, the next vector operation forces the same vector to be reloaded from memory. Instead, load a vector once and make all changes, even those that affect a single component, with full-vector operations.
In this specific case, two changes are required:
  1. Modify oneVec so that its w component is 0.0f, causing only a sign change in the w component of the input vector.
  2. Use the integer representation (as_int4) to manually flip the sign bit of the w component back.
As a result, the kernel appears as follows:
    __constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
    __constant int4 signChanger = (int4)(0, 0, 0, 0x80000000);

    __kernel __attribute__((vec_type_hint(float4)))
    void inverter3(__global float4* input, __global float4* output)
    {
        int tid = get_global_id(0);
        output[tid] = oneVec - input[tid];                            // w lane becomes -input.w
        output[tid] = as_float4(as_int4(output[tid]) ^ signChanger);  // flip the sign bit of w back
        output[tid] = sqrt(output[tid]);                              // full-vector square root
    }
At the cost of another constant vector, this implementation performs all the required operations addressing only full vectors. All the computations can be performed in
