Vectorizing Loops with Calls to User-Defined External Functions

Introduction

Vectorization plays a key role in speeding up applications that have inherent data parallelism. Loops targeted for vectorization often contain function calls in the loop body. The Intel C++ Compiler ships vector versions of the standard math functions, so calling these functions inside a loop body does not prevent the loop from being vectorized. That is not the case for user-defined functions. Elemental functions (SIMD-enabled functions) in C++ are a tool for explicitly generating the vector version of a function, shipped as part of the Intel(R) Cilk(TM) Plus package. Elemental functions come with a set of clauses that describe to the compiler what the function body does. It is highly recommended to use the relevant clauses when defining the elemental version of a user-defined function, because this helps the compiler generate better code. This article focuses on the use of the linear clause (together with the uniform clause in the final example) with elemental functions.

This article is part of the Intel® Modern Code Developer Community documentation which supports developers in leveraging application performance in code through a systematic step-by-step optimization framework methodology.

 

Serial case

To demonstrate the linear clause, consider a simple example that adds two arrays and stores the result in a third array. Below is the serial version of the program.

#include<iostream>
#define N 100
using namespace std;
__declspec(noinline) void add(int *a, int *b, int *c){
        *c = *a + *b;
        return;
}
int main(){
        int a[N], b[N], c[N], i;
        a[:] = __sec_implicit_index(0);          // a[i] = i
        b[:] = (N-1) - __sec_implicit_index(0);  // b[i] = N-1-i
        c[:] = 0;
        for(i = 0; i < N; i++)
                add(&a[i], &b[i], &c[i]);
        for(i = 0; i < N; i++)
                cout<<c[i]<<"\n";
        return 0;
}
 

$ icpc testelemental.cc -vec-report2
testelemental.cc(10): (col. 9) remark: LOOP WAS VECTORIZED
testelemental.cc(11): (col. 17) remark: LOOP WAS VECTORIZED
testelemental.cc(15): (col. 2) remark: loop was not vectorized: existence of vector dependence
testelemental.cc(13): (col. 2) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate

The loop at line 13 is not vectorized ("nonstandard loop is not a vectorization candidate") because its body calls add(), which has no corresponding vector version. (The loop at line 15 only prints the results with cout, so its report does not matter here.) Elemental functions solve this problem by having the compiler generate both a scalar and a vector version of the function, as demonstrated below.
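Array notation (a[:], __sec_implicit_index) is an Intel Cilk Plus extension that current GCC and Clang releases do not accept. As a minimal sketch for readers without Cilk Plus, here is the serial example in standard C++; the helper names init_inputs and run_serial are hypothetical, and the noinline attribute is left out. It also spells out what the array-notation statements compute.

```cpp
#include <cassert>
#include <numeric>

// Scalar add() from the serial example (noinline attribute omitted).
void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

// Standard C++ spelling of the array-notation initialization:
//   a[:] = __sec_implicit_index(0);          -> a[i] = i
//   b[:] = (N-1) - __sec_implicit_index(0);  -> b[i] = (N-1) - i
//   c[:] = 0;                                -> c[i] = 0
void init_inputs(int *a, int *b, int *c, int n) {
    std::iota(a, a + n, 0);  // a[i] = i
    for (int i = 0; i < n; i++) {
        b[i] = (n - 1) - i;
        c[i] = 0;
    }
}

// Same calling loop as in the article's main(): one out-of-line
// call to add() per element.
void run_serial(int *a, int *b, int *c, int n) {
    for (int i = 0; i < n; i++)
        add(&a[i], &b[i], &c[i]);
}
```

With n = 100, every c[i] ends up as i + (99 - i) = 99, which is the value the article's program prints 100 times.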
 
Vector function case
 

Below is a program that introduces a vector version of add(). It is as simple as annotating the function definition with __declspec(vector(linear(a:1,b:1,c:1))). The vector clause tells the compiler to generate both a scalar and a vector version of add(). The linear clause on a, b, and c tells the compiler that the pointers a, b, and c are incremented by 1 (one int element) from one loop iteration to the next, exactly as &a[i], &b[i], and &c[i] are in the calling loop.
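To make the meaning of linear(a:1,b:1,c:1) concrete, the sketch below hand-writes what the generated vector version of add() conceptually computes. It is an emulation, not compiler output: the vector length VL = 4 and the names add_vl and run_chunked are hypothetical, and a plain loop over the lanes stands in for the SIMD instructions a compiler would emit. The key point is that the vector version receives only the pointers for lane 0; because each pointer is declared linear with step 1, lane k works on a + k, b + k, and c + k.

```cpp
#include <cassert>

// Hypothetical vector length; a real compiler derives it from the target ISA.
constexpr int VL = 4;

// Scalar add() from the article.
void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

// Emulated vector version for linear(a:1, b:1, c:1) (hypothetical name).
// Only the lane-0 pointers are passed; lane k implicitly works on
// a + k, b + k and c + k because each pointer advances by one element.
void add_vl(int *a, int *b, int *c) {
    for (int k = 0; k < VL; k++)  // stands in for one SIMD add across VL lanes
        c[k] = a[k] + b[k];
}

// How a vectorized calling loop could use it (hypothetical name):
// full chunks of VL iterations call add_vl, leftover iterations call add().
void run_chunked(int *a, int *b, int *c, int n) {
    int i = 0;
    for (; i + VL <= n; i += VL)
        add_vl(&a[i], &b[i], &c[i]);
    for (; i < n; i++)
        add(&a[i], &b[i], &c[i]);
}
```

How the leftover iterations are handled (scalar calls, a masked vector version, or something else) is up to the compiler; the scalar tail here is only one option.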

#include<iostream>
#define N 100
using namespace std;
__declspec(noinline,vector(linear(a:1,b:1,c:1))) void add(int *a, int *b, int *c){
        *c = *a + *b;
        return;
}
int main(){
        int a[N], b[N], c[N], i;
        a[:] = __sec_implicit_index(0);          // a[i] = i
        b[:] = (N-1) - __sec_implicit_index(0);  // b[i] = N-1-i
        c[:] = 0;
        for(i = 0; i < N; i++)
                add(&a[i], &b[i], &c[i]);  // pointers advance by one element per iteration
        for(i = 0; i < N; i++)
                cout<<c[i]<<"\n";
        return 0;
}
 

$ icpc testelemental.cc -vec-report2
testelemental.cc(10): (col. 9) remark: LOOP WAS VECTORIZED
testelemental.cc(11): (col. 17) remark: LOOP WAS VECTORIZED
testelemental.cc(13): (col. 2) remark: LOOP WAS VECTORIZED
testelemental.cc(15): (col. 2) remark: loop was not vectorized: existence of vector dependence
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED
testelemental.cc(4): (col. 82) remark: FUNCTION WAS VECTORIZED

The vectorization report confirms that add() was vectorized, and that the loop at line 13, which calls add(), is now vectorized as well.
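Intel Cilk Plus has since been deprecated, and OpenMP 4.0 standardized the same idea as `#pragma omp declare simd`, which also offers `uniform` and `linear` clauses. The following is a minimal sketch of the same vector function in that spelling, assuming GCC or Clang (for `__attribute__((noinline))`) and a flag such as `-fopenmp-simd`. The helper name run is hypothetical, and the `#pragma omp simd` on the calling loop is an addition of this sketch, whereas the article relies on the auto-vectorizer. Without OpenMP support the pragmas are ignored and the code still runs as plain scalar C++.

```cpp
#include <cassert>

// OpenMP counterpart of __declspec(noinline, vector(linear(a:1,b:1,c:1))):
// a, b and c each advance by one int element between consecutive calls.
#pragma omp declare simd linear(a, b, c : 1)
__attribute__((noinline)) void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

// Calling loop (hypothetical helper name). The simd pragma explicitly
// requests vectorization; the compiler may then call a SIMD clone of add().
void run(int *a, int *b, int *c, int n) {
#pragma omp simd
    for (int i = 0; i < n; i++)
        add(&a[i], &b[i], &c[i]);
}
```

With GCC, for example, `g++ -O2 -fopenmp-simd -fopt-info-vec` reports which loops were vectorized, playing roughly the role of `-vec-report2`.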

 
Another scalar version

 
Sometimes the same logic of adding two arrays and storing the result in a third array is written as shown below (scalar version):
 
#include<iostream>
#define N 100
using namespace std;
__declspec(noinline) void add(int *a, int *b, int *c, int i){
        c[i] = a[i] + b[i];
        return;
}
int main(){
        int a[N], b[N], c[N], i;
        a[:] = __sec_implicit_index(0);          // a[i] = i
        b[:] = (N-1) - __sec_implicit_index(0);  // b[i] = N-1-i
        c[:] = 0;
        for(i = 0; i < N; i++)
                add(a, b, c, i);  // same a, b, c on every call; only i changes
        for(i = 0; i < N; i++)
                cout<<c[i]<<"\n";
        return 0;
}

 
$ icpc testelemental.cc -vec-report2
testelemental.cc(10): (col. 9) remark: LOOP WAS VECTORIZED
testelemental.cc(11): (col. 17) remark: LOOP WAS VECTORIZED
testelemental.cc(15): (col. 2) remark: loop was not vectorized: existence of vector dependence
testelemental.cc(13): (col. 2) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
 
To convert this version of the function to an elemental function without any change to its signature or body, the parameters a, b, and c are declared uniform (the same value is passed on every iteration, so a single value is broadcast to all SIMD lanes), and the array index i is declared linear(i:1), since it increases by 1 on every iteration. Below is the code:
 
#include<iostream>
#define N 100
using namespace std;
__declspec(noinline, vector(uniform(a,b,c),linear(i:1))) void add(int *a, int *b, int *c, int i){
        c[i] = a[i] + b[i];
        return;
}
int main(){
        int a[N], b[N], c[N], i;
        a[:] = __sec_implicit_index(0);          // a[i] = i
        b[:] = (N-1) - __sec_implicit_index(0);  // b[i] = N-1-i
        c[:] = 0;
        for(i = 0; i < N; i++)
                add(a, b, c, i);  // a, b, c uniform; i linear with step 1
        for(i = 0; i < N; i++)
                cout<<c[i]<<"\n";
        return 0;
}
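A hand-written emulation shows what uniform(a,b,c) and linear(i:1) mean for this index form. As before, this is a sketch rather than compiler output: the vector length VL = 4 and the names add_vl and run_chunked are hypothetical, and the inner loop stands in for SIMD instructions. The vector version receives one value each of a, b, and c, shared by every lane, plus the lane-0 index i; lane k then uses index i + k.

```cpp
#include <cassert>

// Hypothetical vector length; a real compiler derives it from the target ISA.
constexpr int VL = 4;

// Scalar add() from the article (index form).
void add(int *a, int *b, int *c, int i) {
    c[i] = a[i] + b[i];
}

// Emulated vector version for uniform(a, b, c), linear(i:1) (hypothetical
// name). a, b and c are the same for all lanes; lane k uses index i + k.
void add_vl(int *a, int *b, int *c, int i) {
    for (int k = 0; k < VL; k++)  // stands in for one SIMD add across VL lanes
        c[i + k] = a[i + k] + b[i + k];
}

// Possible lowering of the calling loop (hypothetical name): the base
// pointers never change, only the starting index advances by VL.
void run_chunked(int *a, int *b, int *c, int n) {
    int i = 0;
    for (; i + VL <= n; i += VL)
        add_vl(a, b, c, i);
    for (; i < n; i++)
        add(a, b, c, i);
}
```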

 
The idea, then, is to convert a scalar function into the corresponding elemental function without changing its signature or body. The elemental-function clauses make it easy to tell the compiler how to interpret the function's arguments, so neither the signature nor the body has to change. As this article demonstrates, the same functionality written in different scalar forms can be given different vector versions by choosing the right set of clauses.
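For readers moving to OpenMP 4.0, here is a minimal sketch of the index form with `#pragma omp declare simd`, mirroring __declspec(noinline, vector(uniform(a,b,c),linear(i:1))). It assumes GCC or Clang (for `__attribute__((noinline))`) and a flag such as `-fopenmp-simd`; the helper name run and the `#pragma omp simd` on the calling loop are additions of this sketch. Without OpenMP support the pragmas are ignored and the code runs as plain scalar C++.

```cpp
#include <cassert>

// OpenMP counterpart of
// __declspec(noinline, vector(uniform(a,b,c), linear(i:1))):
// a, b and c are the same on every call; i grows by 1 per call.
#pragma omp declare simd uniform(a, b, c) linear(i : 1)
__attribute__((noinline)) void add(int *a, int *b, int *c, int i) {
    c[i] = a[i] + b[i];
}

// Calling loop (hypothetical helper name); the loop variable i is
// passed straight through, matching linear(i : 1).
void run(int *a, int *b, int *c, int n) {
#pragma omp simd
    for (int i = 0; i < n; i++)
        add(a, b, c, i);
}
```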
For more complete information about compiler optimizations, see our Optimization Notice.