vectorization failure


November 23, 2009 12:00 AM PST



Problem version: 
Intel® C++ 11.0.066

Environment : 
Windows* IA-32

Problem:
The following routine never executes its SSE2 (vectorized) code path when the incoming buffers (a, b, c) are exactly n bytes apart:

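int func(const char *const a, const char *const b, char *c, int n)
{
    int i;
    /* Byte-wise add; the compiler multiversions this loop. */
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
    return 0;
}

/* The three buffers are slices of a single array, 1000 bytes apart. */
char arr[3000] = {1};
char *a1 = arr;
char *b1 = arr + 1000;
char *c1 = arr + 2000;

int main()
{
    func(a1, b1, c1, 996);
    return 1;
}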

The compiler emits several "jae" instructions where "ja" is correct. Note that the properly vectorized code does run if the buffers are n+1 bytes apart. The compiler switches /Qalias-const and /Qalias-args- have no effect on this behavior.

Root Cause : 
This is a bug in the boundary condition of the multiversioning optimization. Multiversioning inserts a runtime check on the address difference between the buffers to decide whether their address ranges overlap. That check tests ">1000" where it should test ">=1000", so it is just one byte too conservative.

To confirm, try changing the array size to 1001 while keeping the trip count at 1000; you may see improved performance.
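The off-by-one in the runtime check can be illustrated with a minimal sketch. The helper names disjoint_strict and disjoint_exact are hypothetical and are not the compiler's actual generated code; they only model the comparison described above, assuming the check compares the distance between two buffers against the trip count in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the buggy check: treats two n-byte ranges as
   disjoint only when they are MORE than n bytes apart. Ranges that sit
   exactly n bytes apart (adjacent, not overlapping) are rejected. */
static int disjoint_strict(const char *dst, const char *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t diff = (d > s) ? (d - s) : (s - d);
    return diff > n;
}

/* Hypothetical model of the corrected check: two n-byte ranges are
   disjoint as soon as their start addresses are at least n bytes apart. */
static int disjoint_exact(const char *dst, const char *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t diff = (d > s) ? (d - s) : (s - d);
    return diff >= n;
}
```

With buffers 1000 bytes apart and n equal to 1000, disjoint_strict returns 0 (which would send execution to the scalar loop) while disjoint_exact returns 1 (which would allow the vectorized loop). With buffers 1001 bytes apart, both return 1, which matches the observation that the vectorized code runs when the buffers are n+1 bytes apart.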

Resolution : 
The latest 11.0 update and the 11.1 compilers include this fix.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804





This article applies to: Intel Software Network communities,   Intel® C++ Compiler for Windows* Knowledge Base