How to Vectorize Code Using C/C++ Classes on 32-Bit Intel® Architecture


Challenge

Vectorize code by means of C++ vector classes. Consider the following simple loop:

void add(float *a, float *b, float *c)
{
    int i;
    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}

 


Solution

Use the set of C++ vector classes provided with the Intel® C++ Compiler, which offer both a higher-level abstraction and more flexibility for programming with MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2. These classes provide an easy-to-use, flexible interface to the intrinsic functions, allowing developers to write natural C++ code without worrying about which intrinsic or assembly-language instruction to use for a given operation. Because the intrinsic functions underlie the implementation of these classes, the performance of applications using this methodology can approach that of applications using the intrinsics directly. Further details on the use of these classes can be found in the Intel C++ Class Libraries for SIMD Operations User’s Guide, order number 693500.

The following sample code shows the C++ code using a vector class library that corresponds to the sample given in the Challenge section. The example assumes the arrays passed to the routine are already aligned to 16-byte boundaries.

#include "fvec.h"

void add(float *a, float *b, float *c)
{
    F32vec4 *av = (F32vec4 *) a;
    F32vec4 *bv = (F32vec4 *) b;
    F32vec4 *cv = (F32vec4 *) c;
    *cv = *av + *bv;
}

 

Here, fvec.h is the class definition file, and F32vec4 is the class representing an array of four floats. The “+” and “=” operators are overloaded, so the actual Streaming SIMD Extensions implementation is abstracted away, or hidden, from the developer. Note how closely this resembles the original code, allowing for simpler and faster programming.
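For reference, the overloaded “+” and “=” in the class version map onto the same Streaming SIMD Extensions intrinsics a developer would otherwise write by hand. A minimal sketch of that hand-written intrinsics form (the function name `add_intrin` is illustrative, not from the original; the 16-byte alignment assumption is the same as above):

```cpp
#include <xmmintrin.h>  // Streaming SIMD Extensions intrinsics

// Hand-written intrinsics version of the same four-float addition.
// Assumes a, b, and c are each 16-byte aligned, as in the class example.
void add_intrin(float *a, float *b, float *c)
{
    __m128 av = _mm_load_ps(a);           // aligned load of four floats
    __m128 bv = _mm_load_ps(b);
    _mm_store_ps(c, _mm_add_ps(av, bv));  // packed add, aligned store
}
```

The class version hides these load/add/store steps behind ordinary C++ operators while generating essentially the same instructions.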

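One way to satisfy the 16-byte alignment requirement is to declare the arrays with C++11 `alignas`; this is a sketch of one option (the helper `is_16_byte_aligned` is illustrative), not the only approach — Intel's `_mm_malloc` or C11 `aligned_alloc` also work for heap allocations:

```cpp
#include <cstdint>

// Statically allocated arrays aligned to a 16-byte boundary, suitable
// for aligned SIMD loads/stores or F32vec4* casts.
alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
alignas(16) float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
alignas(16) float c[4];

// Illustrative helper: check that a pointer is 16-byte aligned.
bool is_16_byte_aligned(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```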

This item is part of a series of items about coding techniques for vectorization.


Source

IA-32 Intel® Architecture Optimization Reference Manual

 

