Developer Guide and Reference


Capabilities of C++ SIMD Classes

The fundamental capabilities of each C++ SIMD class include:
  • computation
  • horizontal data support
  • branch compression/elimination
  • caching hints
Understanding each of these capabilities and how they interact is crucial to achieving desired results.

Computation

The SIMD C++ classes contain vertical operator support for most arithmetic operations, including shifting and saturation.
Computation operations include +, -, *, /, reciprocal (rcp and rcp_nr), square root (sqrt), and reciprocal square root (rsqrt and rsqrt_nr).
The rcp and rsqrt operations are approximating instructions with very short latencies that produce results with at least 12 bits of accuracy. Because the exact approximation is implementation-specific, results may differ on non-Intel processors. The rcp_nr and rsqrt_nr operations use software refinement to improve the accuracy of these approximations, with minimal impact on performance. (The "nr" stands for Newton-Raphson, a numerical technique for improving the accuracy of an approximate result.)

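To make the Newton-Raphson refinement concrete, here is a minimal sketch using raw SSE intrinsics from <xmmintrin.h> rather than the class library, so it compiles without the class headers. It assumes a single refinement step, in the spirit of rcp_nr and rsqrt_nr, but it is not the library's actual implementation; the names rcp_refined, rsqrt_refined, and rel_error are hypothetical. For a reciprocal, one step computes x1 = x0(2 - a*x0); for a reciprocal square root, x1 = 0.5*x0(3 - a*x0*x0). Each step roughly doubles the number of correct bits.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_rcp_ps, _mm_rsqrt_ps, arithmetic
#include <cassert>
#include <cmath>

// One Newton-Raphson step for 1/a, starting from the ~12-bit estimate:
//   x1 = x0 * (2 - a * x0), computed here as (x0 + x0) - a * x0 * x0.
// Hypothetical helper, not the class library's rcp_nr.
static __m128 rcp_refined(__m128 a) {
    __m128 x0 = _mm_rcp_ps(a);
    return _mm_sub_ps(_mm_add_ps(x0, x0),
                      _mm_mul_ps(a, _mm_mul_ps(x0, x0)));
}

// One Newton-Raphson step for 1/sqrt(a), starting from the ~12-bit estimate:
//   x1 = 0.5 * x0 * (3 - a * x0 * x0).
// Hypothetical helper, not the class library's rsqrt_nr.
static __m128 rsqrt_refined(__m128 a) {
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);
    __m128 x0 = _mm_rsqrt_ps(a);
    __m128 ax0x0 = _mm_mul_ps(a, _mm_mul_ps(x0, x0));
    return _mm_mul_ps(_mm_mul_ps(half, x0), _mm_sub_ps(three, ax0x0));
}

// Relative error of an approximation, for comparing raw and refined results.
static float rel_error(float approx, float exact) {
    assert(exact != 0.0f);
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

Comparing rel_error for the raw _mm_rcp_ps result and the rcp_refined result on the same inputs should show the raw error near the 12-bit bound and the refined error close to float rounding level. One design caveat: this simple step does not handle a = 0 or infinity, because the estimate is then infinite or zero and the product a*x0*x0 becomes NaN.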
Horizontal Data Support

The C++ SIMD classes provide horizontal support for some arithmetic operations. The term "horizontal" indicates computation across the elements of one vector, as opposed to the vertical, element-by-element operations on two different vectors.
The add_horizontal, unpack_low, and pack_sat functions are examples of horizontal data support. This support helps certain algorithms that could not otherwise exploit the full potential of SIMD instructions.
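A horizontal sum, the idea behind add_horizontal, can be sketched with raw SSE intrinsics as follows. This assumes a four-lane float vector, and hsum_ps and sum_array are hypothetical names rather than the class library's implementation. hsum_ps moves the upper half of the vector onto the lower half and adds, which is data flow across the lanes of a single register; sum_array shows the common pattern of doing the bulk of the work with vertical adds and reducing horizontally only once at the end, since each horizontal step costs extra shuffles.

```cpp
#include <xmmintrin.h>  // SSE: _mm_movehl_ps, _mm_shuffle_ps, _mm_add_ss, ...
#include <cassert>
#include <cstddef>

// Horizontal sum: adds the four lanes of one vector together.
static float hsum_ps(__m128 v) {
    __m128 hi = _mm_movehl_ps(v, v);   // lanes: v2, v3, v2, v3
    __m128 pair = _mm_add_ps(v, hi);   // lane0 = v0+v2, lane1 = v1+v3
    // Broadcast lane 1 so it can be added to lane 0.
    __m128 odd = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(pair, odd));  // (v0+v2) + (v1+v3)
}

// Sum of n floats (n must be a multiple of 4): vertical adds in the loop,
// a single horizontal reduction at the end.
float sum_array(const float* p, std::size_t n) {
    assert(n % 4 == 0);  // this sketch does not handle a scalar tail
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    return hsum_ps(acc);
}
```

Note that the horizontal reduction adds the lanes in a different order than a sequential scalar loop would, so floating-point results can differ slightly for inputs that are not exactly representable sums.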
Shuffle intrinsics are another example of horizontal data flow. They are not expressed in the C++ classes because they take immediate (compile-time constant) arguments. However, the C++ class implementation lets you mix shuffle intrinsics with the other C++ functions. For example:
    F32vec4 fveca, fvecb, fvecd;
    fveca += fvecb;
    fvecd = _mm_shuffle_ps(fveca, fvecb, 0);
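The mask 0 in the shuffle example selects lane 0 of the first operand for the two low result lanes and lane 0 of the second operand for the two high lanes. The sketch below, written with raw SSE intrinsics on plain float arrays (the helper name shuffle4 is hypothetical), shows how the immediate is encoded with the _MM_SHUFFLE macro and why it is passed as a template parameter: _mm_shuffle_ps expects a compile-time constant mask, not a run-time variable.

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _MM_SHUFFLE, unaligned load/store
#include <cassert>

// _MM_SHUFFLE(z, y, x, w) packs four 2-bit lane indices into one immediate;
// _mm_shuffle_ps(first, second, mask) returns
//   { first[w], first[x], second[y], second[z] }.
static_assert(_MM_SHUFFLE(0, 1, 2, 3) == 0x1B, "indices are packed high to low");

// The mask is a template parameter so that it is always a compile-time
// constant, as the shuffle instruction's immediate operand requires.
template <int Mask>
void shuffle4(const float* a, const float* b, float* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    __m128 r = _mm_shuffle_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), Mask);
    _mm_storeu_ps(out, r);
}
```

For example, shuffle4<0> on a = {1, 2, 3, 4} and b = {5, 6, 7, 8} yields {1, 1, 5, 5}, and shuffle4<_MM_SHUFFLE(0, 1, 2, 3)> with the same array passed as both operands reverses it.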

Branch Compression/Elimination

Branching in SIMD architectures can be complicated and expensive. The SIMD C++ classes provide functions to eliminate branches, using logical operations, max and min functions, conditional selects, and compares. Consider the following example:
    short a[4], b[4], c[4];
    for (int i = 0; i < 4; i++)
        c[i] = a[i] > b[i] ? a[i] : b[i];
The operation applied does not depend on the value of i: for each i, the result is either a[i] or b[i], depending on the actual values. A simple way of removing the branch altogether is to use the select_gt function, as follows:
    Is16vec4 a, b, c;
    c = select_gt(a, b, a, b);  // per element: c = (a > b) ? a : b
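A common way to implement a branch-free select such as select_gt is to turn the comparison into a per-element bit mask and use that mask to blend the two candidate values. The following sketch shows that technique with SSE2 intrinsics from <emmintrin.h>. It assumes eight 16-bit lanes in a 128-bit register rather than the four lanes of Is16vec4, and the names select_gt_epi16 and max_select_epi16 are hypothetical, not part of the class library.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_cmpgt_epi16, logical ops
#include <cassert>
#include <cstddef>

// Per 16-bit lane: (a > b) ? c : d, using a compare mask instead of a branch.
static __m128i select_gt_epi16(__m128i a, __m128i b, __m128i c, __m128i d) {
    __m128i mask = _mm_cmpgt_epi16(a, b);           // 0xFFFF where a > b, else 0
    return _mm_or_si128(_mm_and_si128(mask, c),     // keep c where a > b
                        _mm_andnot_si128(mask, d)); // keep d elsewhere
}

// Branch-free version of c[i] = a[i] > b[i] ? a[i] : b[i] for n shorts.
void max_select_epi16(const short* a, const short* b, short* c, std::size_t n) {
    assert(n % 8 == 0);  // this sketch does not handle a scalar tail
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i r = select_gt_epi16(va, vb, va, vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(c + i), r);
    }
}
```

For this particular max pattern, SSE2 also provides _mm_max_epi16 directly; the mask-and-blend form is shown because it generalizes to any pair of result vectors c and d.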

Caching Hints

Intel® Streaming SIMD Extensions provide prefetching and streaming hints. Prefetching data can minimize the effects of memory latency. Streaming hints allow you to indicate that certain data should not be cached.
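As a hedged sketch of both kinds of hint, the loop below scales an array while prefetching source data ahead of use with _mm_prefetch and writing results with the non-temporal store _mm_stream_ps. The prefetch distance kPrefetchAhead is an assumed tuning value, not a recommendation, and scale_stream is a hypothetical name. Streaming stores require a 16-byte-aligned destination, and the loop ends with _mm_sfence so that the streamed writes are ordered before any later stores.

```cpp
#include <xmmintrin.h>  // SSE: _mm_prefetch, _mm_stream_ps, _mm_sfence
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed tuning value: how many floats ahead of the current position to prefetch.
constexpr std::size_t kPrefetchAhead = 64;

// Writes dst[i] = src[i] * s. dst must be 16-byte aligned because
// _mm_stream_ps requires an aligned address; src may be unaligned.
void scale_stream(const float* src, float* dst, std::size_t n, float s) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    const __m128 vs = _mm_set1_ps(s);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        if (i + kPrefetchAhead < n) {
            // Prefetch hint: request future source data into the cache hierarchy.
            _mm_prefetch(reinterpret_cast<const char*>(src + i + kPrefetchAhead),
                         _MM_HINT_T0);
        }
        // Streaming hint: non-temporal store that avoids polluting the cache.
        _mm_stream_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vs));
    }
    for (; i < n; ++i) dst[i] = src[i] * s;  // scalar tail, ordinary stores
    _mm_sfence();  // order the streaming stores before later stores
}
```

Streaming stores generally pay off only when the output is large and will not be reread soon; for small buffers that stay in cache, ordinary stores are usually faster.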

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804