• 10/30/2018

Avoid Extracting Vector Components

Consider the following kernel:
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

__kernel __attribute__((vec_type_hint(float4)))
void inverter2(__global float4* input, __global float4* output)
{
  int tid = get_global_id(0);
  output[tid] = oneVec - input[tid];  /* full-vector operation */
  output[tid].w = input[tid].w;       /* single-component extraction: the costly step */
  output[tid] = sqrt(output[tid]);    /* full-vector operation again */
}
In this explicit vector code, extracting the w component is very costly, because the next vector operation forces the same vector to be reloaded from memory. Consider loading a vector once and performing all changes with vector operations, even when only a single component changes.
In this specific case, two changes are required:
  1. Modify oneVec so that its w component is 0.0f, causing only a sign flip in the w component of the input vector.
  2. Use the bit representation of the float vector (reinterpreted through as_int4) to manually flip the sign bit of the w component back.
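The arithmetic behind these two changes can be checked on the host with a minimal sketch in plain C. This is a scalar stand-in for one lane of the vector code, not the kernel itself. The function names flip_sign_bit, w_lane_without_extraction, and xyz_lane are hypothetical, and the sketch assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: scalar stand-in for as_float(as_int(x) ^ 0x80000000).
   It flips only the sign bit and leaves exponent and mantissa untouched. */
float flip_sign_bit(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);   /* assumes a 32-bit float */
    memcpy(&bits, &x, sizeof bits);    /* reinterpret the float's bits */
    bits ^= UINT32_C(0x80000000);      /* toggle the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* What happens to the w lane before the square root:
   change 1 negates w, change 2 restores its sign. */
float w_lane_without_extraction(float w)
{
    float negated = 0.0f - w;          /* oneVec.w is 0.0f, so this only flips the sign */
    return flip_sign_bit(negated);     /* XOR with the sign mask flips it back */
}

/* The x, y, and z lanes are XORed with 0, so they keep the value 1.0f - v. */
float xyz_lane(float v)
{
    return 1.0f - v;
}
```

For example, w_lane_without_extraction(0.25f) returns 0.25f, which matches what the direct component copy produced. One edge case: for an input of +0.0f the result is -0.0f, which still compares equal to zero.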
As a result, the kernel appears as follows:
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
__constant int4 signChanger = (int4)(0, 0, 0, 0x80000000);

__kernel __attribute__((vec_type_hint(float4)))
void inverter3(__global float4* input, __global float4* output)
{
  int tid = get_global_id(0);
  output[tid] = oneVec - input[tid];                            /* w lane becomes -input.w */
  output[tid] = as_float4(as_int4(output[tid]) ^ signChanger);  /* flip the sign bit of w back */
  output[tid] = sqrt(output[tid]);
}
At the cost of another constant vector, this implementation performs all the required operations addressing only full vectors. All the computations might also be performed in
