• 2019 Update 4
• 03/20/2019

# Avoid Extracting Vector Components

Consider the following kernel:
```
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

__kernel __attribute__((vec_type_hint(float4)))
void inverter2(__global float4* input, __global float4* output)
{
    int tid = get_global_id(0);
    output[tid] = oneVec - input[tid];
    output[tid].w = input[tid].w;      // single-component write: costly
    output[tid] = sqrt(output[tid]);
}
```
In this example of explicit vector code, extracting the `w` component is very costly, because the next vector operation forces the same vector to be reloaded from memory. Instead, load the vector once and apply every change, even one that affects a single component, with full-vector operations.
In this specific case, two changes are required:
1. Modify `oneVec` so that its `w` component is zero, causing only a sign change in the `w` component of the input vector.
2. Use the `float` representation to manually change the sign bit of the `w` component back.
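The arithmetic behind these two changes can be checked on the host. The following is a minimal sketch in plain C, not OpenCL C: it models only the `w` lane as a scalar, and the helper names `flip_sign_bit` and `restore_w` are hypothetical, introduced here for illustration. It assumes 32-bit IEEE-754 floats, where bit 31 is the sign bit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR the sign bit of a 32-bit IEEE-754 float.
   This is a scalar stand-in for as_float(as_int(x) ^ 0x80000000). */
static float flip_sign_bit(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the bits, no conversion */
    bits ^= 0x80000000u;             /* toggle only the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical model of the w lane: subtracting from zero negates w,
   and flipping the sign bit restores the original value. */
static float restore_w(float w)
{
    float negated = 0.0f - w;
    return flip_sign_bit(negated);
}
```

Because negation and the sign-bit flip both change only the sign, `restore_w(2.5f)` returns exactly `2.5f`; no rounding is involved for nonzero inputs.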
As a result, the kernel appears as follows:
```
__constant float4 oneVec = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
__constant int4 signChanger = (int4)(0, 0, 0, 0x80000000);

__kernel __attribute__((vec_type_hint(float4)))
void inverter3(__global float4* input, __global float4* output)
{
    int tid = get_global_id(0);
    output[tid] = oneVec - input[tid];  // w becomes -input[tid].w
    output[tid] = as_float4(as_int4(output[tid]) ^ signChanger);  // flip the sign of w back
    output[tid] = sqrt(output[tid]);
}
```
At the cost of another constant vector, this implementation performs all the required operations addressing only full vectors. All the computations can be performed in `float8`.
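One way to read the `float8` remark is that each work-item handles two adjacent `float4` elements, with both constants repeated twice. That reading is an assumption, not something the text spells out. Under it, the sketch below models the eight lanes in host-side C and compares the whole-vector logic against the per-component logic of the first kernel. The names `reference8`, `whole_vector8`, `one_vec8`, and `sign_changer8` are hypothetical, and the final `sqrt` is left out because it is applied identically in both versions.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8  /* assumed layout: two float4 elements side by side */

/* Assumed constants: the float4 patterns from the kernel, repeated twice. */
static const float    one_vec8[LANES]      = { 1.0f, 1.0f, 1.0f, 0.0f,
                                               1.0f, 1.0f, 1.0f, 0.0f };
static const uint32_t sign_changer8[LANES] = { 0, 0, 0, 0x80000000u,
                                               0, 0, 0, 0x80000000u };

/* Per-component logic of the first kernel (before the sqrt):
   x, y, z become 1 - value, w is copied through unchanged. */
static void reference8(const float *in, float *out)
{
    for (int i = 0; i < LANES; ++i)
        out[i] = (i % 4 == 3) ? in[i] : 1.0f - in[i];
}

/* Whole-vector logic of the second kernel (before the sqrt):
   every lane gets the same subtract and XOR, only the constants differ. */
static void whole_vector8(const float *in, float *out)
{
    for (int i = 0; i < LANES; ++i) {
        float v = one_vec8[i] - in[i];
        uint32_t bits;
        memcpy(&bits, &v, sizeof bits);
        bits ^= sign_changer8[i];   /* flips the sign only in the w lanes */
        memcpy(&v, &bits, sizeof v);
        out[i] = v;
    }
}
```

Both functions produce the same values for inputs with nonzero `w`; the second one never branches on the lane index, which is the property that lets the real kernel stay in full-vector operations.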

#### Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.