Diagnostic 15037: xxxx was not vectorized: vectorization possible but seems inefficient

This diagnostic has two variants:
1. remainder loop was not vectorized: vectorization possible but seems inefficient
2. loop was not vectorized: vectorization possible but seems inefficient

Variant 1: remainder loop was not vectorized: vectorization possible but seems inefficient

Cause:

While vectorizing a loop, the compiler sometimes splits it into a prolog loop, a main kernel, and an epilog loop (the remainder loop). The split lets the main kernel use aligned memory access instructions. In this example, the compiler creates a remainder loop and determines that vectorizing it would not be efficient.
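The prolog / main kernel / remainder split described above can be pictured with a little arithmetic. The following is only a sketch, not the compiler's actual algorithm: the vector length kVecLen, the struct LoopSplit, and the function split_loop are hypothetical names introduced for illustration, and the real vector length and unroll factor are chosen by the compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vector length: 8 elements per vector step,
// e.g. eight 16-bit shorts in one 128-bit register.
constexpr std::size_t kVecLen = 8;

struct LoopSplit {
    std::size_t prolog;     // scalar iterations executed until the data is aligned
    std::size_t kernel;     // iterations handled by full vector steps
    std::size_t remainder;  // iterations left over after the last full vector step
};

// Splits n iterations over a short array that starts at byte address addr.
// Assumes addr and align_bytes are both multiples of sizeof(short).
LoopSplit split_loop(std::uintptr_t addr, std::size_t n, std::size_t align_bytes) {
    std::size_t misalign = addr % align_bytes;
    std::size_t prolog =
        (misalign == 0) ? 0 : (align_bytes - misalign) / sizeof(short);
    if (prolog > n) prolog = n;  // very short loops never reach the kernel
    std::size_t kernel = ((n - prolog) / kVecLen) * kVecLen;
    return {prolog, kernel, n - prolog - kernel};
}
```

Under these assumptions, split_loop(0, 70, 16) yields a prolog of 0, a kernel of 64, and a remainder of 6 iterations. A remainder that short is the kind of loop for which the compiler may judge vectorization not worthwhile.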

Example:

#include<iostream>
#define N 70
int main(){
static short tab1[N], 
tab2[N];
int i, j;
static short const data[] = {-32768, -256, -255, -128, -127, -1, 0, 1, 127, 128, 255, 256, 32767};
for (j = i = 0; i < N; i++) 
{
tab1[i] = i;
tab2[i] = data[j++];
if (j > 12) j = 0;
}
int sum = __sec_reduce_add(tab1[:]) + __sec_reduce_add(tab2[:]);
return 0;
}


$ icpc example9.cc -vec-report6
example9.cc(8): (col. 5) remark: vectorization support: reference tab1 has aligned access
example9.cc(7): (col. 1) remark: vectorization support: unroll factor set to 4
example9.cc(7): (col. 1) remark: PARTIAL LOOP WAS VECTORIZED
example9.cc(8): (col. 5) remark: vectorization support: reference tab1 has aligned access
example9.cc(7): (col. 1) remark: remainder loop was not vectorized: vectorization possible but seems inefficient
example9.cc(7): (col. 1) remark: loop was not vectorized: existence of vector dependence

Variant 2: loop was not vectorized: vectorization possible but seems inefficient

Cause:

The compiler emits this diagnostic when its heuristics conclude that a loop would not benefit from vectorization. The example below shows one such case: the compiler estimates that the overhead of building the vector operands (which requires non-unit-stride memory accesses) is significant compared with the number and type of computations that use those operands.

Example:

#include<iostream>
#define N 100
struct s1 {
int a, b, c;
};
int main(){
s1 arr[N], sum;
for(int i = 0; i < N; i++)
{
        sum.a += arr[i].a;
        sum.b += arr[i].b;
        sum.c += arr[i].c;
}
std::cout<<sum.a<<"\t"<<sum.b<<"\t"<<sum.c<<"\n";
return 0;
}

$ icpc example11.cc -c -vec-report2
example11.cc(8): (col. 1) remark: loop was not vectorized: vectorization possible but seems inefficient
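
The non-unit stride named in the cause can be made concrete. In the array-of-structures layout, consecutive values of the same field are sizeof(s1) bytes apart, so gathering several arr[i].a values into one vector operand cannot be done with a single contiguous load. The sketch below computes that stride and shows a hypothetical structure-of-arrays layout in which each field is contiguous. The names kAosStride, s1_soa, and sum_soa are assumptions introduced here, and the source does not confirm that any particular compiler version vectorizes the rewritten loop.

```cpp
#include <cstddef>

struct s1 {
    int a, b, c;  // same fields as in the example
};

// Distance, in int elements, between arr[i].a and arr[i + 1].a
// in an array of s1 (3 when the struct has no padding).
constexpr std::size_t kAosStride = sizeof(s1) / sizeof(int);

// Hypothetical structure-of-arrays layout: each field is stored
// contiguously, so consecutive values are one int apart (unit stride).
template <std::size_t N>
struct s1_soa {
    int a[N];
    int b[N];
    int c[N];
};

// Same three reductions as the example, but over unit-stride arrays.
template <std::size_t N>
s1 sum_soa(const s1_soa<N>& data) {
    s1 sum = {0, 0, 0};  // start from zero so the result is well defined
    for (std::size_t i = 0; i < N; i++) {
        sum.a += data.a[i];
        sum.b += data.b[i];
        sum.c += data.c[i];
    }
    return sum;
}
```

Rebuilding with the vectorization report enabled is the way to check whether the diagnostic still appears for the restructured loop.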
