Developer Guide and Reference

Capabilities of C++ SIMD Classes

The fundamental capabilities of each C++ SIMD class include:
  • computation
  • horizontal data support
  • branch compression/elimination
  • caching hints
Understanding each of these capabilities and how they interact is crucial to achieving desired results.

Computation

The SIMD C++ classes contain vertical operator support for most arithmetic operations, including shifting and saturation.
Computation operations include +, -, *, /, reciprocal (rcp and rcp_nr), square root (sqrt), and reciprocal square root (rsqrt and rsqrt_nr).
The rcp and rsqrt operations are approximating instructions with very short latencies that produce results with at least 12 bits of accuracy. Results may differ on non-Intel processors. The rcp_nr and rsqrt_nr operations use software refinement to improve the accuracy of the approximations, with minimal impact on performance. (The "nr" stands for Newton-Raphson, a mathematical technique for improving the accuracy of an approximate result.)

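The refinement idea behind the _nr variants can be sketched in portable scalar C++. This is a minimal illustration under stated assumptions, not the library's implementation: approx_rcp is a hypothetical stand-in that mimics a roughly 12-bit estimate by truncating the mantissa of an exact quotient, and rcp_nr_sketch applies one Newton-Raphson step, x1 = x0 * (2 - d * x0), which roughly doubles the number of correct bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a fast hardware estimate of 1/d. It computes the
// quotient and then clears the low 12 mantissa bits, leaving about 12
// significant bits (an assumption made purely for illustration).
inline float approx_rcp(float d) {
    assert(d != 0.0f && std::isfinite(d));
    float r = 1.0f / d;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// One Newton-Raphson step on f(x) = 1/x - d.
// If x0 = (1/d)(1 - e), then x1 = (1/d)(1 - e*e): the relative error is squared.
inline float rcp_nr_sketch(float d) {
    const float x0 = approx_rcp(d);
    return x0 * (2.0f - d * x0);
}

// Relative error of an estimate of 1/d, measured against a double-precision quotient.
inline double rel_error(float estimate, float d) {
    const double exact = 1.0 / static_cast<double>(d);
    return std::fabs(static_cast<double>(estimate) - exact) / std::fabs(exact);
}
```

Comparing rel_error(approx_rcp(d), d) with rel_error(rcp_nr_sketch(d), d) for a few inputs shows the refined value landing close to full single-precision accuracy at the cost of two multiplies and one subtract.
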
Horizontal Data Support

The C++ SIMD classes provide horizontal support for some arithmetic operations. The term "horizontal" indicates computation across the elements of one vector, as opposed to the vertical, element-by-element operations on two different vectors.
The add_horizontal, unpack_low, and pack_sat functions are examples of horizontal data support. This support benefits certain algorithms that otherwise could not exploit the full potential of SIMD instructions.
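The difference between vertical and horizontal computation can be illustrated with a scalar model. In this sketch, Vec4f, add_vertical, and add_horizontal_sketch are hypothetical names chosen for illustration; they model the data flow only and are not the SIMD class functions themselves.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a vector of four single-precision elements.
using Vec4f = std::array<float, 4>;

// Vertical: element i of the result depends only on element i of each input.
inline Vec4f add_vertical(const Vec4f& a, const Vec4f& b) {
    Vec4f r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = a[i] + b[i];
    return r;
}

// Horizontal: the result combines all elements of a single vector.
inline float add_horizontal_sketch(const Vec4f& a) {
    return (a[0] + a[1]) + (a[2] + a[3]);
}
```

A dot product is a typical case that needs both: a vertical multiply of two vectors followed by a horizontal sum of the products.
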
Shuffle intrinsics are another example of horizontal data flow. Shuffle intrinsics are not expressed in the C++ classes due to their immediate arguments. However, the C++ class implementation enables you to mix shuffle intrinsics with the other C++ functions. For example:
F32vec4 fveca, fvecb, fvecd;
fveca += fvecb;
fvecd = _mm_shuffle_ps(fveca, fvecb, 0);
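
To make the role of the immediate argument concrete, the following scalar model reproduces the element selection rule of _mm_shuffle_ps. The names Vec4f and shuffle_ps_sketch are hypothetical; the immediate is passed as a template parameter to reflect that it must be a compile-time constant, which is the reason given above for shuffles not being expressed as class operators.

```cpp
#include <array>

// Hypothetical stand-in for a vector of four single-precision elements.
using Vec4f = std::array<float, 4>;

// Scalar model of the selection rule of _mm_shuffle_ps(a, b, imm):
// the two low result elements are taken from a, the two high ones from b,
// and each is chosen by one 2-bit field of the 8-bit immediate.
template <unsigned Imm>
Vec4f shuffle_ps_sketch(const Vec4f& a, const Vec4f& b) {
    static_assert(Imm < 256, "the immediate is an 8-bit compile-time constant");
    return Vec4f{ a[Imm & 3u],
                  a[(Imm >> 2) & 3u],
                  b[(Imm >> 4) & 3u],
                  b[(Imm >> 6) & 3u] };
}
```

Under this rule, the immediate 0 used in the example above fills the two low elements of fvecd with element 0 of fveca and the two high elements with element 0 of fvecb.
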

Branch Compression/Elimination

Branching in SIMD architectures can be complicated and expensive. The SIMD C++ classes provide functions to eliminate branches, using logical operations, max and min functions, conditional selects, and compares. Consider the following example:
short a[4], b[4], c[4];
for (int i = 0; i < 4; i++)
    c[i] = a[i] > b[i] ? a[i] : b[i];
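
One way such a per-element choice can be made without branching is a compare that produces a mask, followed by a bitwise select. The sketch below models that pattern in scalar C++; Vec4s, cmpgt_mask, select_sketch, and max_branchless are hypothetical names for illustration and are not the SIMD class functions themselves.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a vector of four 16-bit signed elements.
using Vec4s = std::array<std::int16_t, 4>;

// Compare: each result element is all ones (-1) where a > b, otherwise 0.
inline Vec4s cmpgt_mask(const Vec4s& a, const Vec4s& b) {
    Vec4s m{};
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = static_cast<std::int16_t>(-static_cast<int>(a[i] > b[i]));
    return m;
}

// Select: take x where the mask is all ones and y where it is zero,
// using only bitwise operations.
inline Vec4s select_sketch(const Vec4s& mask, const Vec4s& x, const Vec4s& y) {
    Vec4s r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = static_cast<std::int16_t>((mask[i] & x[i]) | (~mask[i] & y[i]));
    return r;
}

// Element-wise maximum, equivalent to the loop above but with no
// data-dependent branch per element.
inline Vec4s max_branchless(const Vec4s& a, const Vec4s& b) {
    return select_sketch(cmpgt_mask(a, b), a, b);
}
```

In real SIMD code the compare and the bitwise operations each act on all four elements at once, so the remaining loops in this model disappear.
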