December 9, 2008 11:00 PM PST
Vectorize code by hand-coding in assembly.
Programming directly in the assembly language of a target platform can deliver the required performance gain, but assembly code is not portable between processor architectures and is expensive to write and maintain.
Consider the following simple loop:
void add(float *a, float *b, float *c)
{ int i; for (i = 0; i < 4; i++) c[i] = a[i] + b[i]; }
Code key loops directly in assembly language, either with a stand-alone assembler or as inlined assembly (C-asm) in C/C++ source. The Intel® Compiler and assembler recognize the SIMD instructions and registers and generate the corresponding machine code directly. This model offers the greatest potential performance, but that performance is not portable across processor architectures.
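As a rough illustration of the inlined-assembly model with a different toolchain, the sketch below uses GCC/Clang extended `asm` syntax (AT&T operand order) rather than the Intel® Compiler's `__asm` blocks. It generalizes the loop to `n` elements: each iteration adds four floats with `addps`, using unaligned `movups` loads so the arrays need no particular alignment, and a plain C loop handles any remainder. The function name `add_n` and the `n` parameter are illustrative assumptions. The `#if` guard makes the portability cost concrete: on a non-x86 target the assembly is simply compiled out and only the C loop runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n).
   On x86/x86-64 the bulk of the work uses hand-written SSE via
   GCC/Clang extended asm; elsewhere only the portable C loop runs. */
void add_n(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (a != NULL && b != NULL && c != NULL));

#if defined(__x86_64__) || defined(__i386__)
    /* Four single-precision additions per iteration. movups tolerates
       unaligned addresses, so callers need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __asm__ volatile(
            "movups (%0), %%xmm0\n\t"   /* xmm0 = a[i..i+3] */
            "movups (%1), %%xmm1\n\t"   /* xmm1 = b[i..i+3] */
            "addps  %%xmm1, %%xmm0\n\t" /* xmm0 += xmm1     */
            "movups %%xmm0, (%2)\n\t"   /* c[i..i+3] = xmm0 */
            : /* no register outputs: results go through memory */
            : "r"(a + i), "r"(b + i), "r"(c + i)
            : "xmm0", "xmm1", "memory");
    }
#endif

    /* Scalar remainder on x86; the entire loop on other architectures. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With `n = 6`, for example, elements 0 to 3 pass through the `addps` block and elements 4 and 5 through the scalar tail. The `"memory"` clobber tells the compiler that the `asm` statement reads and writes memory it cannot see through the operands, so it will not cache `c[]` in registers across the call.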
The following code example shows the Streaming SIMD Extensions (SSE) inlined-assembly encoding that corresponds to the loop above:
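void add(float *a, float *b, float *c)
{ __asm mov eax, a __asm mov edx, b __asm mov ecx, c __asm movaps xmm0, XMMWORD PTR [eax] __asm addps xmm0, XMMWORD PTR [edx] __asm movaps XMMWORD PTR [ecx], xmm0 }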
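The Intel® Compiler accepts Microsoft-style `__asm` statements, but GCC and Clang use a different extended `asm` syntax. A minimal sketch of the same load/add/store sequence in GCC/Clang form (AT&T syntax) follows; the name `add4_aligned` is hypothetical. Like the SSE encoding it mirrors, it uses `movaps` and a memory operand to `addps`, both of which fault unless the address is 16-byte aligned, so the sketch asserts that precondition. The non-x86 fallback is an assumption added so the file still compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: c[0..3] = a[0..3] + b[0..3].
   movaps and addps-with-memory-operand require 16-byte alignment. */
void add4_aligned(const float *a, const float *b, float *c)
{
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);
    assert((uintptr_t)c % 16 == 0);

#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "movaps (%0), %%xmm0\n\t"   /* xmm0 = a[0..3] (aligned load)  */
        "addps  (%1), %%xmm0\n\t"   /* xmm0 += b[0..3] (aligned read) */
        "movaps %%xmm0, (%2)\n\t"   /* c[0..3] = xmm0 (aligned store) */
        :
        : "r"(a), "r"(b), "r"(c)
        : "xmm0", "memory");
#else
    /* Fallback for non-x86 targets: the SSE code above cannot run here. */
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];
#endif
}
```

In C11, declaring the arrays as `_Alignas(16) float a[4];` is one way to meet the alignment requirement. In debug builds, a misaligned pointer then trips an assertion instead of raising a general-protection fault inside the `asm` statement.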
This item is part of a series of items about coding techniques for vectorization.
Source: IA-32 Intel® Architecture Optimization Reference Manual

