# Intel® IPP Library Signal Processing Domain Overview

By Paul Fischer

Published: 10/31/2011, Last Updated: 10/31/2011

*If the slide images are not loading, or are loading too slowly, download and review this presentation in PDF format:*

*IPP-Signal-Processing-Overview-2010-q1.pdf*

An overview of the Intel® IPP Library signal processing domain, in slide format, follows.

**Notes for slide #4:**

FIR = Finite Impulse Response

IIR = Infinite Impulse Response

LMS = Least Mean Squares adaptive FIR

FFT = Fast Fourier Transform

DFT = Discrete Fourier Transform

DCT = Discrete Cosine Transform

**Notes for slide #12:**

This function takes real input and produces complex output in CCS (complex conjugate-symmetric) format, which stores the non-redundant half of the spectrum without further packing. See the referenced pages for more information about the function's purpose.

FFTInitAlloc_* allocates and initializes an FFT “specification structure” to contain the tables needed for executing an optimized Fourier transform and sets the variable pFFTSpec to point to this “spec” structure.

- The memory containing these tables is dynamically allocated and must be freed with ippsFFTFree_R_32f().

- There are numerous flavors of the initialization function. This version handles FFTs whose input array is real (hence “_R”) and whose data type is 32-bit floating point (hence “_32f”). Each initialization function has a corresponding function that releases its memory.

- The same “spec” structure is used for both forward and inverse transforms.

- The variable “order” is the base-2 logarithm of the signal length. FFT in IPP is defined only for lengths that are powers of two, so the order is an integer. The DFT function group (shown in a later slide) handles lengths that are not powers of two.

- Because the order and the other transform settings are stored in the pFFTSpec structure, they are not passed again to ippsFFTFwd or ippsFFTInv.

- The third argument determines how the constant scale factor is handled. Without normalization, FFTInv(FFTFwd(x)) returns N*x, where N is the signal length. IPP_FFT_DIV_INV_BY_N divides the result of the inverse FFT by N, so the round trip returns x.

- The last argument advises the library whether to use the fastest, the most accurate, or the best overall version. The effect of this flag is platform-dependent.

A forward FFT is performed by ippsFFTFwd_RToCCS_32f().

- For simplicity, 0 is passed as the temporary buffer (a NULL pointer should be used), which tells the function to allocate the buffer itself. Optimized code should allocate a single buffer of the size reported by ippsFFTGetBufSize() and keep both that buffer and the FFTSpec structure across multiple calls to the FFT function, avoiding repeated memory allocations, which are very expensive. In this example, the buffer is allocated and freed by FFTFwd, and the “spec” structure is allocated and freed by myFFT_RT0C().

**Notes for slide #13:**

Note that the ippsDFTInitAlloc() parameter list takes “len” where the FFT version takes “order.” Apart from the FFT-to-DFT name changes, this example duplicates the FFT example. The key difference between FFT and DFT is that the DFT functions can operate on data sets of any size, not just powers of two.

DFT functions also use the FFT implementation when the signal length is a power of two. The only advantage of calling the FFT functions directly is avoiding the single branch that selects that optimization.
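The branch that routes power-of-two lengths to an FFT amounts to a single length check. The plain-C sketch below, with hypothetical names, tests whether a length is a power of two and, if so, computes the order (log2 of the length) that an FFT-style interface takes instead of a length. It shows only the decision; it is not IPP's implementation.

```c
#include <stddef.h>

/* Which implementation a general-length transform routine would run. */
typedef enum { PATH_FFT, PATH_GENERAL_DFT } transform_path;

/* Nonzero if len is a power of two; 0 is not a power of two. */
int is_power_of_two(size_t len)
{
    return len != 0 && (len & (len - 1)) == 0;
}

/* log2(len) for a power-of-two len, i.e. the "order" an FFT-style
 * interface takes instead of a length; -1 for any other len. */
int fft_order_for_length(size_t len)
{
    if (!is_power_of_two(len)) return -1;
    int order = 0;
    while (len > 1) {
        len >>= 1;
        order++;
    }
    return order;
}

/* The single branch: power-of-two lengths take the FFT path. */
transform_path choose_transform_path(size_t len)
{
    return is_power_of_two(len) ? PATH_FFT : PATH_GENERAL_DFT;
}
```

For example, a length of 1024 maps to order 10 and the FFT path, while a length of 1000 takes the general-length path.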

If code size is important, note that the FFT functions are smaller.

The ippg (“gen”) domain provides alternatives to many of these functions in the form of machine-generated, unrolled implementations. These implementations are generally the fastest, at the expense of substantially larger code size.
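The speed-versus-size trade-off of unrolled code can be illustrated with a hand-unrolled 4-point DFT in plain C. For a fixed length of 4, every twiddle factor is 1, -i, -1, or i, so the loops and trigonometric calls disappear and only additions and subtractions remain; the cost is that every supported length needs its own code. The function name is hypothetical, and this is only an illustration of unrolling, not the ippg code.

```c
/* Hypothetical fully unrolled 4-point DFT of a real signal.
 * Writes the full spectrum as interleaved (re, im) pairs.
 * For N = 4 the twiddle factors are 1, -i, -1, i, so the
 * transform reduces to sums and differences of the inputs. */
void dft4_real_unrolled(const float x[4], float out[8])
{
    float s02 = x[0] + x[2], d02 = x[0] - x[2];
    float s13 = x[1] + x[3], d13 = x[1] - x[3];
    out[0] = s02 + s13; out[1] = 0.0f; /* X[0] */
    out[2] = d02;       out[3] = -d13; /* X[1] = (x0 - x2) - i(x1 - x3) */
    out[4] = s02 - s13; out[5] = 0.0f; /* X[2] */
    out[6] = d02;       out[7] = d13;  /* X[3] = conj(X[1]) */
}
```

A generic loop-based DFT handles any length with one small function; the unrolled version handles exactly one length with no loop overhead.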

- For simplicity, 0 is passed as the temporary buffer (a NULL pointer should be used), which tells ippsDFTFwd*() to allocate the buffer itself. Optimized code should allocate a single buffer of the size reported by ippsDFTGetBufSize() and keep both that buffer and the DFTSpec structure across multiple calls to the DFT function, avoiding repeated memory allocations, which are very expensive. In this example, the buffer is allocated and freed by DFTFwd, and the “spec” structure is allocated and freed by myDFT_RT0C().

**Notes for slide #17:**

Functions with the “Direct” suffix exist for compatibility with an older IPP library for Intel XScale®; FIRs and IIRs without this suffix are significantly faster and are preferred.

**Notes for slide #18:**

Functions with the “Direct” suffix exist for compatibility with an older IPP library for Intel XScale®; FIRs and IIRs without this suffix are significantly faster and are preferred.

**Notes for slide #19:**

This example implements a filter described by the tap coefficients { 0.25, 0.5, 0.25 } using Intel IPP FIR functions.

- “taps” defines the tap coefficients to be used in the FIR filter.

- “delayLine” holds the initial contents of the filter's delay line, that is, the previous (historical) input samples.

- ippsFIRInitAlloc() initializes the filter state.

- ippsFIR() performs the actual filtering on the input data.

ippsFIR() performs “len” iterations of filtering on the source and writes the results to the destination. In each iteration, one value is taken from pSrc[n] and shifted into the delay line; then the dot product of the taps and the delay line is computed and written to pDst[n]. pFIRState keeps the last “tapslen” samples in its internal delay line (in this case, three samples).
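The per-sample loop described above can be sketched in plain C. The hypothetical `fir_state` keeps its own delay line between calls, in the spirit of pFIRState: each iteration shifts one input sample into the delay line and writes the dot product of the taps and the delay line to the output. The names, the fixed tap capacity, and the delay-line ordering (newest sample first) are assumptions of this sketch, which makes no attempt at the optimizations a library FIR would use.

```c
#include <stddef.h>
#include <string.h>

#define FIR_MAX_TAPS 64 /* hypothetical fixed capacity; keeps the sketch allocation-free */

typedef struct {
    float taps[FIR_MAX_TAPS];
    float delay[FIR_MAX_TAPS]; /* delay[0] is the newest sample (sketch convention) */
    size_t taps_len;
} fir_state;

/* Copies the taps and the initial delay line (delay_line[0] is taken as the
 * most recent past sample; pass NULL for an all-zero history).
 * Returns 0 on success, -1 if taps_len is out of range. */
int fir_init(fir_state *st, const float *taps, size_t taps_len, const float *delay_line)
{
    if (taps_len == 0 || taps_len > FIR_MAX_TAPS) return -1;
    st->taps_len = taps_len;
    memcpy(st->taps, taps, taps_len * sizeof *taps);
    if (delay_line)
        memcpy(st->delay, delay_line, taps_len * sizeof *delay_line);
    else
        memset(st->delay, 0, sizeof st->delay);
    return 0;
}

/* Filters len samples: dst[n] = sum_k taps[k] * x[n - k]. The delay line
 * persists in st, so consecutive calls behave like one long call. */
void fir_filter(fir_state *st, const float *src, float *dst, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        /* Shift the history by one slot and insert the new sample. */
        memmove(&st->delay[1], &st->delay[0], (st->taps_len - 1) * sizeof st->delay[0]);
        st->delay[0] = src[n];
        float acc = 0.0f;
        for (size_t k = 0; k < st->taps_len; k++)
            acc += st->taps[k] * st->delay[k];
        dst[n] = acc;
    }
}
```

With the taps { 0.25, 0.5, 0.25 } and a zero delay line, a unit impulse produces 0.25, 0.5, 0.25 followed by zeros: the filter's impulse response is its tap vector. Splitting the input across two calls gives the same output as one call, because the delay line carries the history over.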

**Notes for slide #22:**

software.intel.com/en-us/articles/intel-ipp-kb/all/1/

software.intel.com/en-us/articles/intel-integrated-performance-primitives-documentation/

Attachment | Size |
---|---|
ipp-signal-processing-overview-2010-q1.pdf | 0 |
