• 10/30/2018

Avoid Extracting Vector Components

Consider the following kernel:
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

__kernel __attribute__((vec_type_hint(float4)))
void inverter2(__global float4* input, __global float4* output)
{
    int tid = get_global_id(0);
    output[tid] = oneVec - input[tid];
    // Single-component access: this is the costly step.
    output[tid].w = input[tid].w;
    output[tid] = sqrt(output[tid]);
}
In this example of explicit vector code, accessing the w component on its own is very costly, because the next vector operation forces the same vector to be reloaded from memory. Instead, load a vector once and make all changes with full vector operations, even when only a single component changes.
In this specific case, two changes are required:
  1. Modify oneVec so that its w component is zero, so that the subtraction causes only a sign flip in the w component of the input vector.
  2. Use the bit representation of the float values to manually flip the sign bit of the w component back.
As a result, the kernel appears as follows:
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
// Only the w lane has the sign bit set; x, y, z pass through the XOR unchanged.
__constant int4 signChanger = (int4)(0, 0, 0, 0x80000000);

__kernel __attribute__((vec_type_hint(float4)))
void inverter3(__global float4* input, __global float4* output)
{
    int tid = get_global_id(0);
    output[tid] = oneVec - input[tid];
    // Flip the sign bit of w back using a full-vector operation.
    output[tid] = as_float4(as_int4(output[tid]) ^ signChanger);
    output[tid] = sqrt(output[tid]);
}
At the cost of another constant vector, this implementation performs all the required operations addressing only full vectors. All the computations might also be performed in float8.
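The sign-bit trick in inverter3 can be checked on the host with a small scalar model. The sketch below is plain C rather than OpenCL C, and the helper names (float_bits, bits_float, lane_model) are hypothetical, introduced only for illustration. It models one lane of the kernel before the sqrt call: subtract the input from the matching oneVec component, then XOR the result's bits with the matching signChanger component.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer
   (a host-side stand-in for OpenCL's as_int). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret 32 bits as a float (a host-side stand-in for as_float). */
static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* One lane of inverter3 before sqrt: (one - x), then XOR with the mask.
   For the w lane: one = 0.0f, mask = 0x80000000u.
   For the x, y, z lanes: one = 1.0f, mask = 0u. */
static float lane_model(float one, uint32_t mask, float x)
{
    return bits_float(float_bits(one - x) ^ mask);
}
```

For example, lane_model(0.0f, 0x80000000u, 2.5f) computes 0.0f - 2.5f = -2.5f and then flips the sign bit, returning the original 2.5f, while lane_model(1.0f, 0u, 0.25f) returns 0.75f unchanged by the XOR. One edge case to be aware of: for an input of +0.0f the subtraction yields +0.0f, so the flip produces -0.0f rather than the original +0.0f.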
