MMX™ Technology Manuals and Application Notes

Introduction

Intel's MMX™ technology is designed to accelerate multimedia and communications applications. The technology includes new instructions and data types that allow applications to achieve a new level of performance. It exploits the parallelism inherent in many multimedia and communications algorithms, yet maintains full compatibility with existing operating systems and applications.

The next two pages contain our full suite of MMX reference material:

  • Technical Overview
  • Developer Guide
  • Application Notes

All documents are in PDF format.

Manuals

Technical Overview of MMX™ Technology
Intel's MMX technology is designed to accelerate multimedia and communications applications. The technology includes new instructions and data types that allow applications to achieve a new level of performance. It exploits the parallelism inherent in many multimedia and communications algorithms, yet maintains full compatibility with existing operating systems and applications.

MMX Technology Developers Guide
Intel's MMX technology is an extension to the Intel Architecture (IA) instruction set. The technology uses a single instruction, multiple data (SIMD) technique to speed up multimedia and communications software by processing multiple data elements in parallel. The MMX instruction set adds 57 new opcodes and a new 64-bit quadword data type.
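
As a rough illustration of the SIMD idea described above, the following Python sketch models a 64-bit quadword holding four signed 16-bit lanes and adds two such quadwords lane by lane, once with wraparound (the behavior of PADDW) and once with signed saturation (the behavior of PADDSW). The helper names `pack4`, `unpack4`, `packed_add_wrap`, and `packed_add_saturate` are hypothetical and chosen for this sketch only; real MMX code performs each packed add in a single instruction.

```python
MASK16 = 0xFFFF

def pack4(words):
    """Pack four signed 16-bit integers into one 64-bit quadword (lane 0 lowest)."""
    q = 0
    for i, w in enumerate(words):
        q |= (w & MASK16) << (16 * i)   # masking wraps each lane to 16 bits
    return q

def unpack4(q):
    """Split a 64-bit quadword into four signed 16-bit integers."""
    out = []
    for i in range(4):
        w = (q >> (16 * i)) & MASK16
        out.append(w - 0x10000 if w & 0x8000 else w)  # sign-extend the lane
    return out

def packed_add_wrap(a, b):
    """Lane-wise 16-bit add with wraparound, modeling PADDW."""
    return pack4([x + y for x, y in zip(unpack4(a), unpack4(b))])

def packed_add_saturate(a, b):
    """Lane-wise 16-bit add with signed saturation, modeling PADDSW."""
    return pack4([max(-32768, min(32767, x + y))
                  for x, y in zip(unpack4(a), unpack4(b))])
```

Saturation is often the better choice for audio and pixel data, because an overflowing lane clamps to the largest representable value instead of wrapping to a large negative one.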


Application Notes

  • Audio
  • Communications
  • DSP Kernels
  • Graphics (2D)
  • Graphics (3D)
  • Image Effects
  • Miscellaneous
  • Speech Recognition
  • Video


Audio

Audio Echo Effects
Presents code examples that exploit the MMX SIMD instructions to add echo to existing audio data.
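
A minimal sketch of the kind of operation involved, assuming a single-tap echo on signed 16-bit samples with a gain in Q15 fixed point. The function name `add_echo` and its parameters are hypothetical and are not taken from the application note, which may use a different echo structure.

```python
def add_echo(samples, delay, gain_q15):
    """Return samples with a delayed, attenuated copy of the input mixed in.

    samples  : list of signed 16-bit integers
    delay    : echo delay in samples (assumed > 0)
    gain_q15 : echo gain in Q15 fixed point (32768 would be 1.0)
    """
    out = []
    for n, x in enumerate(samples):
        # Scale the delayed input sample; samples before the start count as silence.
        echo = (samples[n - delay] * gain_q15) >> 15 if n >= delay else 0
        # Clamp to the 16-bit range, as a saturating packed add would.
        out.append(max(-32768, min(32767, x + echo)))
    return out
```

For example, `add_echo([1000, 0, 0, 0], 2, 16384)` mixes a half-amplitude copy of the first sample in two samples later.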

MPEG1 Audio Kernels
This document describes the Synthesis Sub-band Filter algorithm used in the MPEG audio decoder and its implementation with the MMX™ technology.

G.728 Code Book Search
G.728 is an algorithm for coding/decoding speech signals at 16 kbit/s using Low-Delay Code Excited Linear Prediction. This application note presents examples of using MMX instructions to implement the Codebook Search module in the G.728 algorithm. This module receives an input vector and searches through a VQ (Vector Quantization) codebook to identify the closest match.

Levinson-Durbin Filter
The amount of data that represents a human voice or sound is most often too large to store on a typical PC. It is therefore more practical to encode the sound and store only a partial set of the data. Voice encoding is one of the applications in which the Levinson-Durbin algorithm is used. This application note illustrates how to use the new MMX technology to perform matrix multiplication more efficiently.

Schur Wiener Filter
The amount of data that represents a human voice or sound is most often too large to store on a typical PC. Voice encoding is one of the applications in which the Schur algorithm is used. This application note presents examples of code that exploit the MMX technology single instruction, multiple data (SIMD) instructions to implement the Schur algorithm.

Communications

Passband Echo Canceller
This application note presents an implementation of a common modem algorithm that takes advantage of the new Intel Architecture media extensions. The passband echo canceller is an adaptive filter that effectively cancels out the near-end and far-end echoes, allowing the transmitted signal from the remote modem to arrive more cleanly at the receiver.

Baseband Echo Cancellation
There are two sources of echo in a modem: the near-end (NE) echo signal and the far-end echo signal. This application note presents an implementation of a common modem algorithm that takes advantage of the new MMX instructions, specifically the baseband echo canceller. The baseband echo canceller is an adaptive filter that effectively cancels out the near-end and far-end echoes, allowing the transmitted signal from the remote modem to arrive more cleanly at the receiver.

1/3 T Equalizer
One goal of a data communication system is to maximize the likelihood that a transmitted data sequence is received exactly as it was transmitted. An equalizer is used at the receiver end of a system to counteract the non-ideal aspects of the communication system.

2/3 T Spaced Equalizer
This document presents a code example using MMX instructions to implement an adaptive filter, specifically the 2/3 T Spaced Equalizer algorithm, on complex arithmetic data.

DSP Kernels

Efficient Vector/Matrix Multiply Routine
This application note demonstrates a significant speedup of a vector dot product and a matrix multiply routine. It also shows how loop unrolling can be used to optimize the performance of MMX technology based code.

Matrix Transpose
This application note demonstrates two approaches to transposing a matrix using MMX technology based code.

Real 16-bit FFT
This document describes an implementation of a 16-bit, complex Fast Fourier Transform (FFT) procedure using MMX instructions. An FFT provides a fast algorithm for transforming discrete data from the time domain to the frequency domain. This algorithm has a wide range of applications in the signal processing world, including the V.34 modem data pump.

Dot Product - 16x16 -> 32
Calculating the dot-product of two vectors requires executing a large number of multiply-accumulate operations. This application note shows how to use the MMX instructions to significantly speed up 16-bit vector dot-product calculation.
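
The multiply-accumulate pattern can be sketched as follows. The sketch models PMADDWD, the MMX instruction that multiplies four pairs of signed 16-bit values and adds adjacent products into two 32-bit sums. The function names `pmaddwd` and `dot16` are hypothetical, the input length is assumed to be a multiple of four, and 32-bit accumulator overflow is not modeled. Whether the application note structures its loop this way is an assumption.

```python
def pmaddwd(a, b):
    """Model PMADDWD on four signed 16-bit lanes.

    Returns two sums: a0*b0 + a1*b1 and a2*b2 + a3*b3.
    """
    return [a[0] * b[0] + a[1] * b[1], a[2] * b[2] + a[3] * b[3]]

def dot16(x, y):
    """Dot product of two equal-length 16-bit vectors, four elements per step."""
    if len(x) != len(y) or len(x) % 4 != 0:
        raise ValueError("vectors must have equal length, a multiple of 4")
    acc = [0, 0]                       # two running partial sums, as in one register
    for i in range(0, len(x), 4):
        lo, hi = pmaddwd(x[i:i + 4], y[i:i + 4])
        acc[0] += lo
        acc[1] += hi
    return acc[0] + acc[1]             # combine the partial sums once at the end
```

Keeping two partial sums and combining them only after the loop mirrors how packed code defers the final horizontal add.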

Real FIR - 16 bit
FIR (Finite Impulse Response) filters are filtering functions that operate on complex numbers. These functions are frequently found in digital signal processing applications. Modem applications typically make heavy use of such functions. This application note shows how to use the MMX instructions to significantly speed up computation of 16-bit FIR digital filters.

Vector Arithmetic and Logic Operations
This application note presents two examples which demonstrate the use of MMX technology (SIMD) instructions to perform basic arithmetic and logic operations on vectors of numbers.

High Precision Multiply
MMX technology instructions include support for very fast 16-bit by 16-bit multiplication, returning 32-bit results. This document describes an implementation of a 16-bit by 31-bit multiplication operation using the MMX instruction set.
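
One plausible way to build a 16-bit by 31-bit product from 16-bit by 16-bit multiplies is sketched below; it is an assumption made for illustration and not necessarily the decomposition used in the application note. The 31-bit operand is split into a signed upper part and a non-negative 15-bit lower part, so that both parts fit in a signed 16-bit lane. The function name `mul_16x31` is hypothetical.

```python
def mul_16x31(a, b):
    """Multiply a signed 16-bit value by a signed 31-bit value.

    Uses only products of two values that each fit in a signed 16-bit lane.
    """
    if not -2**15 <= a < 2**15:
        raise ValueError("a must fit in 16 signed bits")
    if not -2**30 <= b < 2**30:
        raise ValueError("b must fit in 31 signed bits")
    b_hi = b >> 15          # signed upper 16 bits (floor division by 2**15)
    b_lo = b & 0x7FFF       # lower 15 bits, always non-negative
    # b == (b_hi << 15) + b_lo, so the product splits into two partial products,
    # each of magnitude at most 2**30 and therefore representable in 32 bits.
    return ((a * b_hi) << 15) + a * b_lo
```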

Data Alignment
This paper describes several simple techniques to guarantee data alignment for Assembly, C, C++, or Microsoft Windows with currently available software technology and tools.

Graphics (2D)

Fractals with MMX Technology
Illustrates how to use the MMX technology to achieve better performance in generating Mandelbrot Set fractals.

Sprite Overlay
Sprites are computer characters which generally appear in the foreground. They are implemented by overlaying small sprite images on a large background image. This application note describes a Sprite Overlay function that uses the MMX instruction set and is called by a sprite engine to control sprites.

Graphics (3D)

Advanced Procedural Texturing using MMX Technology
This application note and code samples show how MMX Technology-based software procedural texturing can be used for real-time 3D graphics, in the Microsoft DirectDraw framework. We generate a variety of natural-looking patterns, such as water, stars, grass, wood, and marble, using a mathematical technique called fractional Brownian motion. Procedural texturing requires much less bandwidth than the traditional image-mapping implemented in hardware accelerators. Two methods for Z-buffering in the procedural textures are implemented and compared. The Z-Integration technique gives a standard MMX Technology template to be inserted into a scanline algorithm. The second algorithm, while slower, works with all possible scanline rasterizers.

Using MMX Instructions for Procedural Texture Mapping
For 3D graphics, texture mapping adds realism to a scene. Texture mapping involves "wrapping" a 2D image around a computer-generated 3D object, to give the appearance that the object is composed of a particular material, or is far more complex than the underlying geometric description. Currently, two types of texture mapping exist. One is image texture mapping and the other is procedural texture mapping.

AGP and 3D Graphics Software
This paper shows how to use AGP and explains the infrastructures in hardware, OS (Operating System), and the API (Application Programming Interface) that support it.

MMX Technology for 3D Rendering
Intel's MMX Technology processes multiple integer data items in parallel, 64 bits at a time. It can speed processing of pixels in 3D graphics, compared to straight Intel Architecture code which handles at most 32 bits at a time. Thus MMX technology may enable higher frame rates and/or higher quality images.

3D Bilinear Texture Mapping
An optimized MMX technology texture mapping algorithm, using bilinear interpolation as a filter and quadratic approximation for perspective correction.

Gouraud
Gouraud shading is a scan line algorithm used to render objects smoothly in 3D graphics. If a scan line algorithm is used to render an object, a value for the intensity of each pixel along the scan line must be determined from the illumination model. This application note illustrates how to use MMX technology to achieve better performance in color rendering.

3D Transform
This paper describes the use of the MMX technology to accelerate one aspect of computer graphics: 3D geometry.

Image Effects

YUV12 to RGB Color Conversion
Describes the usage of the new Intel MMX instruction set to implement Color Conversion Kernels (CCK) from YUV12 to RGB color space. The MMX instructions are Intel's implementation of Single Instruction Multiple Data (SIMD) instructions.

2X 8-bit Image Scaling
This application note exploits the SIMD instructions to implement the 2X image scaling algorithm.

Bilinear Interpolation
A common technique used for 3D rendering is to decompose the surface of objects into a large number of nearly planar triangles or rectangles. Their position in a 3-dimensional space is then mapped to the 2-dimensional display surface, and the individual triangles or rectangles are drawn one pixel at a time. One technique for increasing the realism of drawn images is commonly known as texture mapping. This application note deals with one aspect of this process: accurately determining the color to display at a single pixel on the display surface.

Median Filter
For images that have random noise, a Median Filter will effectively get rid of much of the noise without causing the blurring typical of a linear low pass filter. The Median Filter described in this paper takes the value of a non-edge pixel, and the eight pixels around it - left, right, upper left, upper right, upper center, lower left, lower right, lower center. Of these nine values, the median (not an average) value is used for the resulting image. For the edges, the existing pixel value is used without change.
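
The filter described above can be sketched in scalar form as follows. This is a plain reference version, not the MMX implementation from the paper; the function name `median_filter_3x3` and the list-of-rows image representation are assumptions made for illustration.

```python
def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D image given as a list of rows.

    Edge pixels are copied unchanged; every other pixel becomes the median
    of itself and its eight neighbors.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]          # start from a copy so edges pass through
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = sorted(image[y + dy][x + dx]
                            for dy in (-1, 0, 1)
                            for dx in (-1, 0, 1))
            out[y][x] = window[4]            # fifth of nine sorted values is the median
    return out
```

A single bright outlier surrounded by uniform pixels is replaced by the surrounding value, which is the behavior that removes random noise without the blurring of an averaging filter.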

Row Filter - 8 bit
This document presents examples of code that illustrate how to use the new MMX technology unpack, multiply, and add instructions to apply a filter across the rows of a graphical or video bitmap image.

Column Filter
This application note provides an example of an MMX technology optimized column filter implementation. The column filter described here performs multiplication of each 32-bit pixel value in a bitmap by the appropriate filter coefficient, and accumulates seven such multiplications to obtain the final pixel value.

Alpha Blending
Alpha blending is an imaging effect that merges two images, weighting one image more than the other. Alpha blending can be used for fading from one image to another, or creating a translucent effect. This application note presents a code example that implements an alpha blending filter used for imaging effects.
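
A minimal sketch of the weighting, assuming 8-bit channel values and an 8-bit alpha where 255 selects the first image entirely. The names `blend_channel` and `blend_pixels` are hypothetical, and the rounding scheme is one common choice rather than the one used in the application note.

```python
def blend_channel(a, b, alpha):
    """Blend two 8-bit channel values; alpha=255 gives a, alpha=0 gives b."""
    return (alpha * a + (255 - alpha) * b + 127) // 255   # +127 rounds to nearest

def blend_pixels(src, dst, alpha):
    """Blend two equal-length sequences of 8-bit channel values."""
    return [blend_channel(s, d, alpha) for s, d in zip(src, dst)]
```

Stepping `alpha` from 0 to 255 over successive frames produces a fade from the second image to the first.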

24 to 16 Bit Conversion
The MMX technology instruction set includes single instruction, multiple data (SIMD) instructions. This application note presents example code that exploits these instructions. It examines two RGB24to16 functions that use the MMX instruction set to complete the conversion.
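
For orientation, a scalar sketch of one such conversion is shown here. It assumes the 16-bit target layout is 5-6-5 (five bits of red, six of green, five of blue), which the summary above does not state; the function name `rgb24_to_rgb565` is hypothetical.

```python
def rgb24_to_rgb565(r, g, b):
    """Convert one 24-bit RGB pixel (8 bits per channel) to a 16-bit 5-6-5 value.

    The low-order bits of each channel are discarded by truncation.
    """
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
```

An MMX version would apply the same shifts and masks to several pixels at once inside a 64-bit register.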

RGB > YUV
Color spaces are 3-D coordinate systems in which each color is represented by a single point. Colors appear as their primary components, red, green, and blue, in the RGB color space. RGB is the format generally used by monitors. Each color appears as a luminance component, Y, and 2 chrominance components, U and V, in the YUV space. This document shows how MMX technology instructions can significantly speed up RGB to YUV color conversion.

Miscellaneous

New EMMS Usage Guidelines
Guidelines for EMMS usage have changed because of a significant side-effect delay.

Survey of Pentium® Processor Performance Monitoring Capabilities & Tools
Performance monitoring methods & tools for the Pentium® processor are discussed.

How to Use Floating-Point or MMX Technology in Ring 0 or a VxD under Windows* 95
The current release of Windows 95 4.00 does not allow floating-point or MMX instructions within VxDs, which run in ring 0. Floating-point and MMX instructions in applications and DLLs are not restricted. The restriction exists because Windows 95 does not allow floating-point exceptions that originate in ring 0.

Speech Recognition

Viterbi Decoding
Viterbi decoding is a Dynamic Programming (DP) algorithm that, among other applications, is used in evaluating Hidden Markov Models. This application note presents a code example that implements the Viterbi decoding algorithm.

L1 Distance Measure
This document presents an MMX instruction implementation of an L1 distance measure. The L1 distance measure is also known as the "sum of absolute differences."
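
In scalar form, the measure is simply the following; the function name `l1_distance` is hypothetical, and the MMX version in the document would process several elements per instruction.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(abs(x - y) for x, y in zip(a, b))
```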

L2 Norm Distance Measure
This document presents a code example that uses the MMX instruction set to implement a 16-bit L2 norm function. The L2 norm function, also known as the Euclidean distance, is often used as a distance measure to extract degrees of similarity between two vectors in speech compression.
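
A scalar reference sketch of the Euclidean distance follows. The names `l2_distance_squared` and `l2_distance` are hypothetical. Exposing the squared distance separately is a design choice of this sketch: when distances are only compared with each other, the square root can be skipped.

```python
import math

def l2_distance_squared(a, b):
    """Sum of squared differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum((x - y) * (x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(l2_distance_squared(a, b))
```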

Video

IDCT 2D 8x8
This document describes an implementation of a two-dimensional Inverse Discrete Cosine Transform (IDCT) using MMX instructions. This transformation is widely used in image compression algorithms, most notably, the JPEG and MPEG standards.

Motion Compensation
This application note presents examples of using the MMX instruction set to perform Motion Compensation (MC) for MPEG1 video playback and, specifically, techniques for avoiding misalignment problems. Motion compensation is an operation in which the motion vector is used to reconstruct the predicted block.

Absolute Difference
This document describes an MMX technology implementation of a procedure to perform an absolute difference on a 16x16 block of pixels. This procedure can be an integral part of a motion estimation kernel. Motion estimation is a technique used in video compression to predict movement between consecutive frames.

Haar Transform - 2x2
The 2x2 Haar transform is used to decompose an image into 4 bands whose spatial frequencies and information contents differ. These differences allow sub-band compression methods to control the bit rate by quantizing bands differently and to control the decode time by removing one or more bands from the bit stream. The 2x2 Haar transform is computed by adding and subtracting adjacent image or array elements. This application note describes the use of the MMX instruction set to calculate the Haar transform.
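
The add-and-subtract structure can be sketched as follows for an image with even dimensions. The function name `haar_2x2`, the order of the four returned bands, and the absence of any scaling factor are assumptions of this sketch; the application note may normalize or order the bands differently.

```python
def haar_2x2(image):
    """Decompose an image (list of rows, even width and height) into four bands.

    For each 2x2 block [[a, b], [c, d]] the bands are:
      total      = a + b + c + d
      horizontal = a - b + c - d   (difference between columns)
      vertical   = a + b - c - d   (difference between rows)
      diagonal   = a - b - c + d
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    total, horizontal, vertical, diagonal = [], [], [], []
    for y in range(0, height, 2):
        t_row, h_row, v_row, d_row = [], [], [], []
        for x in range(0, width, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            t_row.append(a + b + c + d)
            h_row.append(a - b + c - d)
            v_row.append(a + b - c - d)
            d_row.append(a - b - c + d)
        total.append(t_row)
        horizontal.append(h_row)
        vertical.append(v_row)
        diagonal.append(d_row)
    return total, horizontal, vertical, diagonal
```

Each band is a quarter the size of the input, which is what lets a decoder drop a band to save time.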

Get Bits
This application note presents examples of using instructions such as the MMX technology shift instructions to manipulate a data stream. The performance improvement relative to Intel Architecture (IA) code can be attributed primarily to much faster shift instructions, and also to the fact that MMX technology shift instructions operate on 64-bit values, whereas IA shift instructions operate on 32-bit values.

Video Loop Filter
Filtering or smoothing operations are used to reduce noise in imagery that is often characterized by high frequency components. This application note presents the basics of a loop filter implementation using MMX instructions.

