The Streaming SIMD Extensions 2 (SSE2) technology introduces new Single Instruction Multiple Data (SIMD) double-precision floating-point instructions and new SIMD integer instructions into the IA-32 Intel® architecture. The double-precision SIMD instructions extend functionality in a manner analogous to the single-precision instructions introduced with the Streaming SIMD Extensions (SSE). The 128-bit SIMD integer extensions are a full superset of the 64-bit integer SIMD instructions, with additional instructions to support more integer data types, conversion between integer and floating-point data types, and efficient operations between the caches and system memory. These instructions provide a means to accelerate operations typical of 3D graphics, real-time physics, spatial (3D) audio, video encoding/decoding, encryption, and scientific applications.
The 128-bit integer SIMD extensions in SSE2 technology process data 128 bits at a time using the XMM registers. This allows important algorithms, such as the evaluation of a Hidden Markov Model, to be implemented more efficiently than in previous implementations using MMX™ technology and SSE. This application note (AP-946) contains both the code and a description of how the SSE2 instructions can be used to implement a Viterbi algorithm to evaluate a Hidden Markov Model.
Application note AP-569, entitled Using MMX™ Instructions to Implement Viterbi Decoding, describes how to use the MMX instructions to gain a 2x improvement over scalar code. Another application note, AP-811, entitled Using the Streaming SIMD Extensions to Evaluate a Hidden Markov Model with Viterbi Decoding, shows how using the SSE instructions to operate on four data elements at a time (doubling the SIMD width) can further increase the performance gain. This application note describes how the SSE2 instructions provide a significant performance gain over the implementation that uses the SSE instructions.
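To make the idea of processing 128 bits at a time concrete, the following is a minimal sketch of a single Viterbi time step written with SSE2 integer intrinsics. It is not the code from the download package. It assumes, purely for illustration, a model with eight states whose log-domain scores fit in signed 16-bit integers, so that one XMM register holds the scores of all eight destination states. The names viterbi_step and NSTATES, the row-major layout of trans, and the scalar fallback for compilers without SSE2 are all hypothetical.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

#define NSTATES 8   /* assumed model size: eight 16-bit scores fill one XMM register */

#ifndef HAVE_SSE2
/* Scalar saturating add, mirroring the behavior of _mm_adds_epi16. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}
#endif

/*
 * One Viterbi time step in the log domain:
 *   next[j] = max over i of (prev[i] + trans[i*NSTATES + j]) + emit[j]
 * trans is row-major: row i holds the scores for leaving state i.
 * All additions saturate instead of wrapping around.
 */
static void viterbi_step(const int16_t *prev, const int16_t *trans,
                         const int16_t *emit, int16_t *next)
{
#ifdef HAVE_SSE2
    /* Running maximum for all eight destination states at once. */
    __m128i best = _mm_set1_epi16(INT16_MIN);
    for (int i = 0; i < NSTATES; i++) {
        /* Broadcast the score of source state i to all eight lanes. */
        __m128i from = _mm_set1_epi16(prev[i]);
        /* Load the eight transition scores out of state i. */
        __m128i row = _mm_loadu_si128((const __m128i *)(trans + i * NSTATES));
        best = _mm_max_epi16(best, _mm_adds_epi16(from, row));
    }
    /* Add the emission score of each destination state and store. */
    best = _mm_adds_epi16(best, _mm_loadu_si128((const __m128i *)emit));
    _mm_storeu_si128((__m128i *)next, best);
#else
    /* Portable fallback computing the same result one state at a time. */
    for (int j = 0; j < NSTATES; j++) {
        int16_t best = INT16_MIN;
        for (int i = 0; i < NSTATES; i++) {
            int16_t cand = sat_add16(prev[i], trans[i * NSTATES + j]);
            if (cand > best) best = cand;
        }
        next[j] = sat_add16(best, emit[j]);
    }
#endif
}
```

In this sketch the inner loop over destination states disappears: each iteration of the remaining loop updates all eight candidates with one add and one max instruction. A full decoder would also record which source state produced each maximum so that the best path can be traced back, which is omitted here.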
Download Code Samples
Download KernelTemplate.zip. This library is required to compile the application.