# Problem: Intel C++ Compiler vectorisation with SSE2.

Hi,

I am trying to compile some code with the Intel C++ Compiler 8.0, using the -xN option to vectorise loops. One simple loop contains only a single line of code:

```c
for (i=0; i<N; j++) {
    for (j=0; j<N; i++) {
        // a is a 2D array of structures, b is a 1D array.
        c += a[i][j].w * b[a[i][j].d]; // (1)
    }
}
```

The compiler reported "remark: loop was not vectorized: dereference too complex." for line (1). Does anyone know why? How can I fix this?

Regards,
Will

Will,

Can you repost with the declarations of the various variables (floats? ints?)? Can you also verify how the loop variables are used? You seem to have switched i and j in the for statements.

Thanks,

Max

Thanks, here it is:

```c
double c;
// a[i][j].w is a double
// a[i][j].d is an int
for (i=0; i<N; j++) {
    for (j=0; j<N; i++) {
        // a is a 2D array of structures, b is a 1D array.
        c += a[i][j].w * b[a[i][j].d]; // (1)
    }
}
```

Is this because of the b[a[i][j].d] dereference, i.e. because the value of a[i][j].d cannot be determined at compile time?

That would be my guess, but I'd like to try it out.

Can you repost with the explicit declarations of a and b? Also, can you confirm the loop indices? Are you really looping with for (i=0; i<N; j++) and for (j=0; j<N; i++)?

Thanks, Max

Sorry Max, I made a mistake; it should be like this:

```c
struct MatrixElement {
    double w;
    int d;
};

struct MatrixElement** a;
double* b;
double c = 0.0;

for (i=0; i<N; i++) {
    for (j=0; j<N; j++) {
        // a is a 2D array of structures; b is a 1D array indexed by a[i][j].d.
        c += a[i][j].w * b[a[i][j].d]; // (1)
    }
}
```

Thanks again,

Will

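Neither of your array operands is suitable for vectorization: the a[i][j].w values are interleaved with the int members d, and b is accessed through an index. You need a contiguous array of a single data type. That's a basic hardware limitation: SSE2 packed loads fetch adjacent elements from memory, and SSE2 has no gather instruction for indexed loads.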
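
To make the hardware limitation concrete, here is a hand-written sketch using SSE2 intrinsics from `<emmintrin.h>`; it illustrates the instructions involved and is not what the Intel compiler itself emits. It assumes Will's loop has been flattened into hypothetical contiguous arrays `w` (weights) and `d` (indices) of length `n`; `indexed_dot` is likewise a made-up name. A packed load such as `_mm_loadu_pd` fetches two adjacent doubles at once, so `w` loads directly, but the two `b[d[k]]` values have to be assembled from separate scalar loads with `_mm_set_pd`. A scalar fallback is used when `__SSE2__` is not defined.

```c
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical flattened form of Will's loop: c = sum_k w[k] * b[d[k]]. */
double indexed_dot(const double *w, const int *d, const double *b, int n)
{
    double c = 0.0;
    int k = 0;
#ifdef __SSE2__
    __m128d acc = _mm_setzero_pd();
    for (; k + 1 < n; k += 2) {
        /* One packed load: w[k] and w[k+1] are adjacent in memory. */
        __m128d wv = _mm_loadu_pd(&w[k]);
        /* No SSE2 gather: b[d[k]] and b[d[k+1]] need two scalar loads.
           _mm_set_pd takes the high element first. */
        __m128d bv = _mm_set_pd(b[d[k + 1]], b[d[k]]);
        acc = _mm_add_pd(acc, _mm_mul_pd(wv, bv));
    }
    /* Add the two lanes of the accumulator together. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    c = lanes[0] + lanes[1];
#endif
    /* Scalar remainder (or the whole loop when SSE2 is unavailable). */
    for (; k < n; k++)
        c += w[k] * b[d[k]];
    return c;
}
```

For example, with w = {1, 2, 3, 4, 5}, d = {4, 0, 2, 1, 3} and b = {10, 20, 30, 40, 50}, indexed_dot(w, d, b, 5) returns 440, the same as the plain scalar loop. Half of the memory traffic still goes through scalar loads, which is why indexed operands gain little from SSE2.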

Thanks, so you mean that vectorisation can only be done on basic types? If the matrix were stored in a 2D array, would there have been no problem with vectorisation?

Will

Yes, the compiler should easily vectorize arrays stored contiguously within a struct. Unfortunately, SSE vectorization doesn't work well with indexed arrays. Some applications copy sparse matrix data into cached working blocks, where it can be vectorized easily.
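
A minimal sketch of the layout change described above, assuming Will's `MatrixElement` rows form a square n x n matrix. The names `MatrixSoA`, `to_soa` and `soa_dot` are hypothetical. The conversion copies each field into its own contiguous, single-type array (a "structure of arrays"), so the weights become a unit-stride stream of doubles. The index into `b` is still data-dependent, so that part of the loop remains an indexed access.

```c
#include <stdlib.h>

/* Will's element type (array-of-structures layout). */
struct MatrixElement {
    double w;
    int d;
};

/* Hypothetical structure-of-arrays layout: one contiguous array per field. */
struct MatrixSoA {
    int n;      /* matrix is n x n */
    double *w;  /* n*n weights, row-major */
    int *d;     /* n*n indices into b, row-major */
};

/* Copy an n x n array of MatrixElement rows into SoA form.
   Returns 0 on success, -1 if allocation fails. */
int to_soa(struct MatrixElement **a, int n, struct MatrixSoA *out)
{
    size_t count = (size_t)n * (size_t)n;
    out->n = n;
    out->w = malloc(count * sizeof *out->w);
    out->d = malloc(count * sizeof *out->d);
    if (out->w == NULL || out->d == NULL) {
        free(out->w);
        free(out->d);
        return -1;
    }
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            out->w[(size_t)i * n + j] = a[i][j].w;
            out->d[(size_t)i * n + j] = a[i][j].d;
        }
    return 0;
}

/* Same sum as Will's loop. Loads of w and d are now unit-stride,
   but b[d[k]] is still an indexed (gathered) access. */
double soa_dot(const struct MatrixSoA *m, const double *b)
{
    size_t count = (size_t)m->n * (size_t)m->n;
    double c = 0.0;
    for (size_t k = 0; k < count; k++)
        c += m->w[k] * b[m->d[k]];
    return c;
}
```

The caller owns the two arrays and releases them with free() when done. The conversion itself costs a pass over the matrix, so it only pays off if the SoA form is reused.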

Thanks, that answers my question. Could you tell me a bit more about cached working blocks, if you don't mind?

Regards,
Will

If you will be performing a number of operations on these vectors, it may be worthwhile to "gather" sections of them into contiguous working arrays, sized to fit in cache and 16-byte aligned. The gather operation is not SSE-vectorizable, but the bulk of the operations will have the advantage of vectorization and good cache behavior. I have found that the optimum array lengths for this purpose on Xeon are multiples of 24.
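
The gather-then-compute split can be sketched as follows, under stated assumptions: the weights and indices are already in contiguous arrays `w` and `d` (hypothetical names, as is `blocked_dot`), `BLOCK = 96` is simply one multiple of 24 picked for illustration rather than a tuned value, and `_Alignas` requires a C11 compiler. Whether the compute loop is actually vectorized also depends on the compiler being allowed to reorder the floating-point sum.

```c
#include <stddef.h>

/* Illustrative block length: a multiple of 24, not a tuned value. */
#define BLOCK 96

/* Gather-then-compute sketch for c = sum_k w[k] * b[d[k]]. */
double blocked_dot(const double *w, const int *d, const double *b, size_t count)
{
    /* Working buffer, 16-byte aligned for SSE2 packed loads. */
    _Alignas(16) double tmp[BLOCK];
    double c = 0.0;

    for (size_t start = 0; start < count; start += BLOCK) {
        size_t len = count - start < BLOCK ? count - start : BLOCK;

        /* Gather step: indexed loads, not SSE-vectorizable. */
        for (size_t k = 0; k < len; k++)
            tmp[k] = b[d[start + k]];

        /* Compute step: both operands are contiguous doubles, so this
           loop is a candidate for SSE2 vectorization. */
        const double *wb = w + start;
        for (size_t k = 0; k < len; k++)
            c += wb[k] * tmp[k];
    }
    return c;
}
```

Here each gathered block is used only once; the approach pays off most when, as described above, several operations reuse the same gathered working array.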

Thanks, I'll look into that further.

Will