vectorization failure


Problem version:
Intel® C++ 11.0.066

Environment :
Windows* IA-32

Problem:
The following routine never executes its SSE2 (vectorized) code path if the incoming buffers (a, b, c) are exactly n bytes apart:

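int func(const char *const a, const char *const b, char *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
    return 0;
}

/* Three buffers carved out of one array, 1000 bytes apart.
   Declared before main() so that a1, b1 and c1 are visible there. */
char arr[3000] = {1};
char *a1 = arr;
char *b1 = arr + 1000;
char *c1 = arr + 2000;

int main()
{
    func(a1, b1, c1, 996);
    return 1;
}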

The compiler emits several "jae" instructions where "ja" would be correct. Note that the properly vectorized code does run if the buffers are n+1 bytes apart. The compiler switches /Qalias-const and /Qalias-args- have no effect.

Root Cause :
Try changing the array size to 1001 while keeping the trip count at 1000; you may see improved performance.
This problem is due to the multiversioning runtime check using “>1000” rather than “>=1000” when it compares the address difference.
It is a bug in the boundary conditions of the multiversioning optimization.

Multiversioning checks whether the address ranges overlap, and that check is just one byte too conservative (“>” versus “>=”).
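The boundary condition can be illustrated with a small sketch. This is not the compiler's generated code; the helper names byte_distance, disjoint_strict and disjoint_inclusive are hypothetical and exist only to show why a strict comparison rejects buffers that sit exactly n bytes apart, even though the ranges [p, p+n) and [q, q+n) do not overlap in that case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: distance in bytes between two addresses,
   regardless of which one comes first. */
static size_t byte_distance(const char *p, const char *q)
{
    uintptr_t x = (uintptr_t)p;
    uintptr_t y = (uintptr_t)q;
    return (size_t)(x > y ? x - y : y - x);
}

/* Models the overly conservative check described above: buffers
   exactly n bytes apart are treated as overlapping, so the
   non-vectorized version of the loop would be selected. */
static int disjoint_strict(const char *p, const char *q, size_t n)
{
    return byte_distance(p, q) > n;
}

/* Models the corrected boundary: [p, p+n) and [q, q+n) share no
   byte as soon as the distance is at least n. */
static int disjoint_inclusive(const char *p, const char *q, size_t n)
{
    return byte_distance(p, q) >= n;
}
```

With two pointers 1000 bytes apart and n = 1000, disjoint_strict reports an overlap while disjoint_inclusive does not; at 1001 bytes apart both report the ranges as disjoint, which matches the observation that the vectorized code runs when the buffers are n+1 bytes apart.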

Resolution :
The latest 11.0 compiler and the 11.1 compilers include this fix.

