• 10/30/2018

Avoid Extracting Vector Components

Consider the following kernel:
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

__kernel __attribute__((vec_type_hint(float4)))
void inverter2(__global float4* input, __global float4* output)
{
  int tid = get_global_id(0);
  output[tid] = oneVec - input[tid];
  output[tid].w = input[tid].w;
  output[tid] = sqrt(output[tid]);
}
In this explicitly vectorized code, extracting the w component is very costly, because the next vector operation forces the same vector to be reloaded from memory. Instead, load a vector once and apply all changes with vector operations, even when only a single component changes.
In this specific case, two changes are required:
  1. Modify oneVec so that its w component is 0.0f, causing only a sign flip in the w component of the input vector.
  2. Use the integer representation of the result (as_int4) to manually flip the sign bit of the w component back.
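The effect of the two changes can be checked on the host with a scalar model. The following is a minimal sketch in plain C, not OpenCL C. The helper names xor_bits, inverter2_model and inverter3_model are hypothetical, and four-element float arrays stand in for a float4. The final sqrt is the same in both variants and is omitted.

```c
#include <stdint.h>
#include <string.h>

/* XOR the raw bits of a float with a mask: the scalar analogue of
   as_float4(as_int4(v) ^ mask). memcpy avoids aliasing problems. */
static float xor_bits(float x, uint32_t mask)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Component-extraction variant: subtract, then overwrite the w lane. */
static void inverter2_model(const float in[4], float out[4])
{
    static const float one_vec[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    for (int i = 0; i < 4; ++i)
        out[i] = one_vec[i] - in[i];
    out[3] = in[3];                      /* single-component write */
}

/* Full-vector variant: w lane of the constant is 0.0f, so the
   subtraction yields -in[3]; the XOR then flips the sign bit back. */
static void inverter3_model(const float in[4], float out[4])
{
    static const float one_vec[4] = {1.0f, 1.0f, 1.0f, 0.0f};
    static const uint32_t sign_changer[4] = {0u, 0u, 0u, 0x80000000u};
    for (int i = 0; i < 4; ++i)
        out[i] = xor_bits(one_vec[i] - in[i], sign_changer[i]);
}
```

Calling both models on the same input, for example {0.25f, 0.0f, 1.0f, 4.0f}, should give outputs that compare equal lane by lane. One subtlety: when the w lane of the input is +0.0f, the full-vector variant produces -0.0f rather than +0.0f. The two compare equal but differ in the sign bit.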
As a result, the kernel appears as follows:
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
__constant int4 signChanger = (int4)(0, 0, 0, 0x80000000);

__kernel __attribute__((vec_type_hint(float4)))
void inverter3(__global float4* input, __global float4* output)
{
  int tid = get_global_id(0);
  output[tid] = oneVec - input[tid];
  output[tid] = as_float4(as_int4(output[tid]) ^ signChanger);
  output[tid] = sqrt(output[tid]);
}
At the cost of another constant vector, this implementation performs all the required operations addressing only full vectors. All the computations might also be performed in
