How to Vectorize Code Using Intrinsics on 32-Bit Intel® Architecture


Challenge

Vectorize code by means of intrinsics. Intrinsics provide access to ISA functionality using C/C++-style coding instead of assembly language. Consider the following simple loop:

void add(float *a, float *b, float *c)
{
    int i;

    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}

 


Solution

Include the xmmintrin.h header file and use the intrinsic data types to approach the performance gains associated with hand-coded assembly. Intel has defined three sets of intrinsic functions, implemented in the Intel® C++ Compiler, to support MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2. Four new C data types, representing 64-bit and 128-bit objects, are used as the operands of these intrinsic functions: __m64 for MMX integer SIMD, __m128 for single-precision floating-point SIMD, __m128i for Streaming SIMD Extensions 2 integer SIMD, and __m128d for double-precision floating-point SIMD.

These types enable the programmer to choose the implementation of an algorithm directly, while allowing the compiler to perform register allocation and instruction scheduling where possible. These intrinsics are portable among all Intel® architecture-based processors supported by a compiler. The use of intrinsics allows you to obtain performance close to the levels achievable with assembly, while the cost of writing and maintaining programs with intrinsics is considerably less. For a detailed description of the intrinsics and their use, refer to the Intel C++ Compiler User’s Guide.
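As a minimal sketch of how the 128-bit Streaming SIMD Extensions 2 types are used, the functions below add two doubles with __m128d and four 32-bit integers with __m128i. The function names add_pd2 and add_epi32x4 are hypothetical, and the sketch assumes a compiler that provides the SSE2 intrinsics through emmintrin.h. Unaligned loads and stores are used so that no alignment is assumed for the arrays.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d and __m128i */

/* Hypothetical example: c[0..1] = a[0..1] + b[0..1] using __m128d. */
void add_pd2(const double *a, const double *b, double *c)
{
    __m128d t0 = _mm_loadu_pd(a);           /* two doubles from a */
    __m128d t1 = _mm_loadu_pd(b);           /* two doubles from b */
    _mm_storeu_pd(c, _mm_add_pd(t0, t1));   /* two sums to c */
}

/* Hypothetical example: c[0..3] = a[0..3] + b[0..3] using __m128i. */
void add_epi32x4(const int *a, const int *b, int *c)
{
    __m128i t0 = _mm_loadu_si128((const __m128i *)a);  /* four 32-bit ints */
    __m128i t1 = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)c, _mm_add_epi32(t0, t1));
}
```

Each function follows the same load, operate, store pattern; only the data type and the intrinsic suffix (_pd, _epi32) change.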

The following sample code shows the loop from the Challenge section vectorized using intrinsics:

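#include <xmmintrin.h>

/* a, b and c must each be 16-byte aligned for _mm_load_ps/_mm_store_ps. */
void add(float *a, float *b, float *c)
{
    __m128 t0, t1;

    t0 = _mm_load_ps(a);      /* load a[0..3] */
    t1 = _mm_load_ps(b);      /* load b[0..3] */
    t0 = _mm_add_ps(t0, t1);  /* add all four pairs at once */
    _mm_store_ps(c, t0);      /* store the four sums to c[0..3] */
}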
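The add() sample handles exactly four elements, and _mm_load_ps and _mm_store_ps require 16-byte-aligned addresses. The following is a minimal sketch of a more general variant, assuming an arbitrary element count and data that may not be aligned. The function name add_n and the parameter n are hypothetical and do not come from the reference manual.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical generalization of add(): n elements, no alignment assumed. */
void add_n(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

    /* Process four floats per iteration with unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 t0 = _mm_loadu_ps(a + i);
        __m128 t1 = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(t0, t1));
    }

    /* Scalar loop for the remaining zero to three elements. */
    for (; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

When the arrays are known to be 16-byte aligned, the aligned forms _mm_load_ps and _mm_store_ps can be substituted in the vector loop.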

 

The intrinsics map one-to-one with actual Streaming SIMD Extensions assembly code. The xmmintrin.h header file, in which the prototypes for the intrinsics are defined, is part of the Intel C++ Compiler included with the VTune™ Performance Enhancement Environment CD. Intrinsics are also defined for the MMX technology ISA. These are based on the __m64 data type, which represents the contents of an mm register. You can specify values in bytes, short integers, 32-bit values, or as 64-bit objects.

The intrinsic data types, however, are not basic ANSI C data types, and therefore you must observe the following usage restrictions:

  • Use intrinsic data types only on the left-hand side of an assignment, as a return value, or as a parameter. You cannot use them in other arithmetic expressions (for example, with “+” or “>>”).
  • Use intrinsic data type objects in aggregates, such as unions (to access the byte elements) and structures; the address of an __m64 object may also be used.
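The restrictions above can be illustrated with a short sketch. The union type vec4 and the function sum_of_add are hypothetical names chosen for this example. The intrinsic type appears only as an assignment target and as a function argument, the addition is done with _mm_add_ps rather than “+”, and the individual elements are read through a union member.

```c
#include <xmmintrin.h>

/* Hypothetical helper type: overlays a __m128 with its four float elements. */
typedef union {
    __m128 v;
    float  f[4];
} vec4;

/* Adds a[0..3] and b[0..3] element-wise, then returns the sum of the
   four results. */
float sum_of_add(const float *a, const float *b)
{
    vec4 r;

    /* Arithmetic goes through the intrinsic, not the "+" operator. */
    r.v = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    /* Element access goes through the float member of the union. */
    return r.f[0] + r.f[1] + r.f[2] + r.f[3];
}
```

Reading the elements through the union avoids a separate store to a temporary float array.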

 

This item is part of a series of items about coding techniques for vectorization.


Source

IA-32 Intel® Architecture Optimization Reference Manual

 


