How to Vectorize Code Automatically on 32-Bit Intel® Architecture

Last Modified On :   August 13, 2009 5:49 PM PDT

Challenge

Vectorize code automatically. Consider the following simple loop:

void add(float *a, float *b, float *c)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}

 

The Intel® C++ Compiler provides an optimization mechanism by which simple loops can be automatically vectorized, that is, converted into Streaming SIMD Extensions (SSE) code. To decide whether a loop is suitable for conversion to SIMD, the compiler uses techniques similar to those a programmer would use. It checks whether either of the following might prevent vectorization:

  • the layout of the loop and the data structures used
  • dependencies amongst the data accesses in each iteration and across iterations
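
To make the second point concrete, here is a minimal sketch contrasting a loop whose iterations are independent with one that carries a dependency from one iteration to the next. The function names `scale` and `running_sum` are hypothetical and chosen only for illustration; they do not come from the compiler documentation.

```c
#include <stddef.h>

/* Independent iterations: out[i] depends only on in[i], so several
   iterations can be computed at the same time. (Whether a compiler
   actually vectorizes this also depends on what it knows about
   pointer aliasing.) */
void scale(float *out, const float *in, float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[i] = in[i] * k;
    }
}

/* Cross-iteration dependency: x[i] needs the value of x[i - 1] that
   the previous iteration just wrote, so iterations cannot simply be
   executed four at a time. */
void running_sum(float *x, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        x[i] = x[i] + x[i - 1];
    }
}
```

A loop shaped like `scale` is a typical candidate for automatic vectorization, while a loop shaped like `running_sum` is the kind of dependency that usually blocks a straightforward conversion to SIMD.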

 

Once the compiler has made that determination, it can generate vectorized code for the loop, allowing the application to use the SIMD instructions.
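
As a rough illustration of what "vectorized code" means for the four-element loop in this item, the sketch below writes the same addition by hand with SSE intrinsics from `<xmmintrin.h>`. The function name `add_sse` is hypothetical, and the code the compiler actually generates may differ (for example, it may use aligned loads when it can prove alignment). A scalar fallback is included so that the sketch still builds on targets without SSE.

```c
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE_SKETCH 1
#else
#define HAVE_SSE_SKETCH 0
#endif

/* Adds four floats from a and b and stores the results in c. */
void add_sse(const float *a, const float *b, float *c)
{
#if HAVE_SSE_SKETCH
    __m128 va = _mm_loadu_ps(a);      /* load a[0..3], no alignment required */
    __m128 vb = _mm_loadu_ps(b);      /* load b[0..3] */
    __m128 vc = _mm_add_ps(va, vb);   /* four additions in one instruction */
    _mm_storeu_ps(c, vc);             /* store c[0..3] */
#else
    /* Fallback for targets without SSE: the plain scalar loop. */
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

Note that the SIMD version reads all four elements of `a` and `b` before writing any element of `c`, which is only equivalent to the scalar loop when the arrays do not overlap.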

The caveat is that only certain types of loops can be automatically vectorized, and in most cases the programmer must help the compiler, through source changes and compiler switches, to fully enable vectorization.


Solution

Modify the code to take advantage of this functionality, and then compile using the -Qax and -Qrestrict switches of the Intel® C++ Compiler, version 4.0 or later.

The following code sample shows the appropriate modification of the code in the Challenge section:

void add(float *restrict a,
         float *restrict b,
         float *restrict c)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}

 

The restrict qualifier in the argument list tells the compiler that the memory each pointer refers to has no other aliases. In other words, within the scope in which the pointer is declared, that pointer is the only means of accessing the memory in question. Without this qualifier, the compiler will not vectorize the loop: it cannot determine whether the array references in the loop overlap, and without that information generating vectorized code is unsafe.
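
The following sketch shows why overlap matters. It assumes a caller that passes overlapping pointers to the unqualified version of the loop, and it models a SIMD add in plain C by performing all loads before any stores. The names `add_scalar` and `add_vector_model` are hypothetical and used only for this illustration.

```c
/* The original loop, without restrict: each store to c[i] is visible
   to the loads of later iterations. */
void add_scalar(float *a, float *b, float *c)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Plain-C model of a 4-wide SIMD add: all four elements of a and b
   are read first, and only then are the four results written. */
void add_vector_model(float *a, float *b, float *c)
{
    float ta[4], tb[4];
    int i;
    for (i = 0; i < 4; i++) {
        ta[i] = a[i];
        tb[i] = b[i];
    }
    for (i = 0; i < 4; i++) {
        c[i] = ta[i] + tb[i];
    }
}
```

With a five-element buffer `buf` initialized to all ones, the call `add_scalar(buf, buf, buf + 1)` leaves `buf` as 1, 2, 4, 8, 16, because each iteration reads the value the previous one wrote. The call `add_vector_model(buf, buf, buf + 1)` on the same input leaves it as 1, 2, 2, 2, 2. Because the two results differ, the compiler cannot substitute the SIMD form unless it knows, for example through restrict, that such overlap cannot occur.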

Refer to the Intel® C++ Compiler User’s Guide for more details on the use of automatic vectorization.

This item is part of a series of items about coding techniques for vectorization.


Source

IA-32 Intel® Architecture Optimization Reference Manual