Using the linear and uniform clauses in Elemental functions (SIMD-enabled functions)

Introduction

Vectorization plays a major role in speeding up applications that have inherent data parallelism. Loops targeted for vectorization often call functions from the loop body. The Intel C++ Compiler ships vector versions of the standard math functions, so calling those functions inside a loop body does not hinder vectorization of the loop. That is not the case for user-defined functions. Elemental functions (SIMD-enabled functions) in C++ are a tool for explicitly generating a vector version of a function. This tool ships as part of Intel(R) Cilk(TM) Plus. Elemental functions come with a set of clauses that tell the compiler what the function body does. It is highly recommended to use the relevant clauses when defining the Elemental version of a user-defined function, because they help the compiler generate better code. This article focuses on using the linear clause, together with the uniform clause, in Elemental functions.

Serial case

To demonstrate the linear clause, consider a simple example that adds two arrays element by element and stores the result in a third array. Below is the serial version of the program.

```
#include <iostream>
#define N 100
using namespace std;
__declspec(noinline) void add(int *a, int *b, int *c){
    *c = *a + *b;                           // add one pair of elements
    return;
}
int main(){
    int a[N], b[N], c[N], i;
    a[:] = __sec_implicit_index(0);         // a[i] = i
    b[:] = (N-1) - __sec_implicit_index(0); // b[i] = (N-1) - i
    c[:] = 0;
    for(i = 0; i < N; i++)
        add(&a[i], &b[i], &c[i]);           // scalar call in the loop body
    for(i = 0; i < N; i++)
        cout<<c[i]<<"\n";
    return 0;
}
```

\$ icpc testelemental.cc -vec-report2
testelemental.cc(10): (col. 9) remark: LOOP WAS VECTORIZED
testelemental.cc(11): (col. 17) remark: LOOP WAS VECTORIZED
testelemental.cc(15): (col. 2) remark: loop was not vectorized: existence of vector dependence
testelemental.cc(13): (col. 2) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate

The loop at line 13 is not vectorized, and the compiler reports "nonstandard loop is not a vectorization candidate", because the loop body calls add(), which has no corresponding vector version. Elemental functions solve this problem by making the compiler generate both a scalar and a vector version of the function, as demonstrated below.

Vector function case

Below is a program that introduces a vector version of the add() function. It is as simple as annotating the function declaration with __declspec(vector(linear(a:1,b:1,c:1))). The vector attribute tells the compiler to generate both a scalar and a vector version of add(). The linear clause on a, b and c tells the compiler that each of these pointers advances by 1 (one element) from one iteration to the next.

```
#include <iostream>
#define N 100
using namespace std;
__declspec(noinline,vector(linear(a:1,b:1,c:1))) void add(int *a, int *b, int *c){
    *c = *a + *b;                           // add one pair of elements
    return;
}
int main(){
    int a[N], b[N], c[N], i;
    a[:] = __sec_implicit_index(0);         // a[i] = i
    b[:] = (N-1) - __sec_implicit_index(0); // b[i] = (N-1) - i
    c[:] = 0;
    for(i = 0; i < N; i++)
        add(&a[i], &b[i], &c[i]);           // pointers advance by 1 per iteration
    for(i = 0; i < N; i++)
        cout<<c[i]<<"\n";
    return 0;
}
```

\$ icpc testelemental.cc -vec-report2
testelemental.cc(10): (col. 9) remark: LOOP WAS VECTORIZED
testelemental.cc(11): (col. 17) remark: LOOP WAS VECTORIZED
testelemental.cc(13): (col. 2) remark: LOOP WAS VECTORIZED
testelemental.cc(15): (col. 2) remark: loop was not vectorized: existence of vector dependence
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED

The vectorization report states that the function was vectorized. The loop at line 13, whose body calls add(), is now vectorized as well.
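The meaning of linear(a:1,b:1,c:1) can be pictured lane by lane. The following plain C++ sketch is only an emulation, not what the compiler actually emits: it assumes a hypothetical vector length VL of 4, and the names add_vector_emulated and run_linear_demo are invented for illustration. One "vector call" receives the pointers for the first lane, and lane k works on each pointer advanced by k elements.

```cpp
#include <cassert>

// Hypothetical vector length, chosen for illustration only; a real
// compiler derives it from the target instruction set and element type.
const int VL = 4;

// Scalar function from the article, without the vector annotation.
void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

// Emulates one call of a vector variant declared with
// linear(a:1,b:1,c:1): lane k sees every pointer advanced by k * 1 elements.
void add_vector_emulated(int *a, int *b, int *c) {
    for (int lane = 0; lane < VL; ++lane) {
        add(a + lane, b + lane, c + lane);
    }
}

// Fills the inputs as in the article, processes the arrays in chunks of
// VL lanes plus a scalar remainder loop, and reports whether every
// result equals N - 1.
bool run_linear_demo() {
    const int N = 100;
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = (N - 1) - i;
        c[i] = 0;
    }
    int i = 0;
    for (; i + VL <= N; i += VL) {
        add_vector_emulated(&a[i], &b[i], &c[i]);  // "vector" iterations
    }
    for (; i < N; ++i) {
        add(&a[i], &b[i], &c[i]);                  // scalar remainder
    }
    for (int k = 0; k < N; ++k) {
        if (c[k] != N - 1) return false;
    }
    return true;
}
```

Calling run_linear_demo() from main() should return true, since a[i] + b[i] equals N-1 for every element. The point of the sketch is that the caller passes only the first lane's addresses; the step of 1 in the linear clause tells the compiler how to derive the addresses for the remaining lanes, which lets it use contiguous vector loads and stores.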

Another Scalar version

Sometimes the same logic of adding two arrays and storing the result in a third array is written as shown below (scalar version):
```
#include <iostream>
#define N 100
using namespace std;
__declspec(noinline) void add(int *a, int *b, int *c, int i){
    c[i] = a[i] + b[i];                     // index into the arrays by i
    return;
}
int main(){
    int a[N], b[N], c[N], i;
    a[:] = __sec_implicit_index(0);         // a[i] = i
    b[:] = (N-1) - __sec_implicit_index(0); // b[i] = (N-1) - i
    c[:] = 0;
    for(i = 0; i < N; i++)
        add(a, b, c, i);                    // same base pointers, varying index
    for(i = 0; i < N; i++)
        cout<<c[i]<<"\n";
    return 0;
}
```
\$ icpc testelemental.cc -vec-report2
testelemental.cc(10): (col. 9) remark: LOOP WAS VECTORIZED
testelemental.cc(11): (col. 17) remark: LOOP WAS VECTORIZED
testelemental.cc(15): (col. 2) remark: loop was not vectorized: existence of vector dependence
testelemental.cc(13): (col. 2) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
To convert the above version into an Elemental function without changing the signature or the body of the function, the parameters a, b and c are declared uniform (the same value is broadcast to all iterations), and the array index i is declared linear(i:1) because it increases by 1 on every iteration. Below is the code for the same:
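Before the listing, the two clauses can be read lane by lane. The sketch below is an illustration under stated assumptions rather than the article's method: the vector length VL of 4 and the names add_lanes_emulated and run_uniform_demo are hypothetical, and the annotation uses the OpenMP 4.0 `declare simd` pragma as a stand-in for `__declspec(vector(...))` so that the code also builds on compilers without Cilk Plus. A compiler without OpenMP SIMD support enabled simply ignores the pragma.

```cpp
#include <cassert>

// Stand-in annotation (assumption: OpenMP 4.0 "declare simd" instead of
// the Cilk Plus __declspec). uniform: a, b, c are identical in every lane.
// linear(i : 1): i grows by 1 from one lane to the next.
#pragma omp declare simd uniform(a, b, c) linear(i : 1)
void add(int *a, int *b, int *c, int i) {
    c[i] = a[i] + b[i];
}

// Hypothetical vector length, for illustration only.
const int VL = 4;

// Emulates one vector call: the base pointers are broadcast unchanged,
// while lane k receives the index i + k.
void add_lanes_emulated(int *a, int *b, int *c, int i) {
    for (int lane = 0; lane < VL; ++lane) {
        add(a, b, c, i + lane);
    }
}

// Same data as the article's program; reports whether every c[i] is N - 1.
bool run_uniform_demo() {
    const int N = 100;
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = (N - 1) - i;
        c[i] = 0;
    }
    int i = 0;
    for (; i + VL <= N; i += VL) {
        add_lanes_emulated(a, b, c, i);  // "vector" iterations
    }
    for (; i < N; ++i) {
        add(a, b, c, i);                 // scalar remainder
    }
    for (int k = 0; k < N; ++k) {
        if (c[k] != N - 1) return false;
    }
    return true;
}
```

Compared with the pointer-based version, nothing about the arrays changes between lanes here; only the index does. With that picture in mind, the Cilk Plus version of the program is: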
```
#include <iostream>
#define N 100
using namespace std;
__declspec(noinline, vector(uniform(a,b,c),linear(i:1))) void add(int *a, int *b, int *c, int i){
    c[i] = a[i] + b[i];
    return;
}
int main(){
    int a[N], b[N], c[N], i;
    a[:] = __sec_implicit_index(0);
    b[:] = (N-1) - __sec_implicit_index(0);
    c[:] = 0;
    for(i = 0; i < N; i++)