Problem version:
Intel® C++ 11.0.066
Environment :
Windows* IA-32
Problem:
The following routine does not run SSE2 code at all if the incoming buffers (a,b,c) differ by exactly n bytes:
/* Three buffers carved out of one array, each starting 1000 bytes after the previous one. */
char arr[3000]={1};
char *a1 = arr;
char *b1 = arr+1000;
char *c1 = arr+2000;

int func(const char *const a, const char *const b, char *c, int n)
{
    int i;
    for (i=0;i<n;i++) {
        c[i] = a[i]+b[i];
    }
    return 0;
}

int main()
{
    func(a1,b1,c1,996);
    return 1;
}
The compiler emits several "jae" instructions where "ja" would be correct. Note that the properly vectorized code does run if the buffers are n+1 bytes apart. Also, the compiler switches /Qalias-const and /Qalias-args- have no effect.
Root Cause :
Try changing the array size to 1001 while keeping the trip count at 1000; you may see improved performance.
This problem is due to the multiversioning runtime check using “>1000” rather than “>=1000” when it compares the address difference.
It’s a bug related to the boundary conditions of the multiversioning optimization.
Multiversioning checks whether the address ranges overlap, and that check is just one byte too conservative (“>” versus “>=”).
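The boundary condition described above can be illustrated with a small sketch in plain C. This is not the compiler's actual generated code; the names byte_distance, ranges_disjoint_strict, ranges_disjoint_fixed and add_multiversioned are hypothetical and exist only for illustration. Both loop versions here are ordinary scalar loops; in the compiled program the "fast" version would be the SSE2 loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: absolute distance in bytes between two addresses. */
static size_t byte_distance(const char *p, const char *q)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return (size_t)(a > b ? a - b : b - a);
}

/* Too conservative: buffers exactly n bytes apart are treated as overlapping. */
static int ranges_disjoint_strict(const char *p, const char *q, size_t n)
{
    return byte_distance(p, q) > n;
}

/* Corrected boundary: [p, p+n) and [q, q+n) are disjoint when distance >= n. */
static int ranges_disjoint_fixed(const char *p, const char *q, size_t n)
{
    return byte_distance(p, q) >= n;
}

/* Sketch of a two-version loop. Returns 1 if the "fast" version was chosen,
   0 if the fallback was chosen. use_fixed_check selects which boundary to use. */
static int add_multiversioned(const char *a, const char *b, char *c,
                              int n, int use_fixed_check)
{
    int i, fast;
    size_t len;

    if (n <= 0)
        return 0;
    len = (size_t)n;

    /* a and b are only read, so only the output buffer c needs checking. */
    if (use_fixed_check)
        fast = ranges_disjoint_fixed(c, a, len) && ranges_disjoint_fixed(c, b, len);
    else
        fast = ranges_disjoint_strict(c, a, len) && ranges_disjoint_strict(c, b, len);

    /* Stand-in for both versions: the result is the same, only the path differs. */
    for (i = 0; i < n; i++)
        c[i] = (char)(a[i] + b[i]);

    return fast;
}
```

With buffers exactly 1000 bytes apart and a trip count of 1000, the strict check selects the fallback while the fixed check selects the fast version. With buffers 1001 bytes apart, both checks select the fast version.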
Resolution :
The latest 11.0 compiler and the 11.1 compilers include this fix.

