vectorization failure


Last Modified On: November 26, 2009 3:05 AM PST



Problem version: 
Intel® C++ 11.0.066

Environment:
Windows* IA-32

Problem:
The following routine never executes its SSE2 (vectorized) code path when the incoming buffers a, b, and c are exactly n bytes apart:

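/* Globals must be declared before main() uses them. */
char arr[3000] = {1};
char *a1 = arr;          /* a: bytes    0..999  */
char *b1 = arr + 1000;   /* b: bytes 1000..1999 */
char *c1 = arr + 2000;   /* c: bytes 2000..2999 */

/* Element-wise byte addition; a candidate for SSE2 vectorization. */
int func(const char *const a, const char *const b, char *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
    return 0;
}

int main()
{
    /* The trip count equals the 1000-byte spacing between the buffers,
       so the ranges are adjacent but do not overlap. */
    func(a1, b1, c1, 1000);
    return 0;
}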

The compiler emits several "jae" instructions where "ja" is correct. The properly vectorized code does run if the buffers are n+1 bytes apart. The compiler switches /Qalias-const and /Qalias-args- have no effect.

Root Cause:
As a check, space the buffers 1001 bytes apart (array size 1001) while keeping the trip count at 1000. The vectorized code then runs and performance is good.
The problem is that the multiversioning runtime check on the address difference uses “>1000” instead of “>=1000”.
It is a bug in the boundary condition of the multiversioning optimization.

The multiversioned code checks at run time whether the address ranges overlap, and that check is one byte too conservative because it uses “>” instead of “>=”.

Resolution:
The fix is included in the latest 11.0 update and in the 11.1 compilers.




This article applies to: Intel Software Network communities, Intel® C++ Compiler for Windows* Knowledge Base, Intel® Compilers