Developing Intel® AVX Optimized Microsoft* Real-Time Audio (MSRTA) Codec using Intel® IPP

Introduction

This white paper describes how Intel® Integrated Performance Primitives (Intel® IPP) provides the building blocks for developing the Microsoft* Real-Time Audio (MSRTA) codec on the latest Intel® microarchitecture, code name Sandy Bridge. MSRTA is designed specifically for real-time, two-way Voice over IP (VoIP) applications. Using Intel® IPP, we developed a speech codec that is fully bitstream compliant with the Microsoft RTAudio codec and offers comparable quality.

We describe how to use Intel® IPP to build an Intel® Advanced Vector Extensions (Intel® AVX) optimized Microsoft* Real-Time Audio codec for VoIP applications, and we present performance results on Intel® microarchitecture code name Sandy Bridge compared with Intel® microarchitecture code name Nehalem.

Unified Speech Component

The Unified Speech Component (USC) interface is a C language framework for implementing speech codecs, echo cancellers, and other voice processing modules using the Intel® IPP library. Most of the speech codec standards supported by Intel® IPP (see Appendix A) use this extensible USC interface.

The purpose of the USC interface is to provide unified access to an algorithm module, independent of the algorithm internals. The USC interface also makes it easy to integrate codec binaries into existing software applications. Decoupling the interface from the algorithm details allows system components to be developed independently of the algorithm implementation.

For more information, refer to the Unified Speech Component Interface manual (uscmanual.pdf) in the speech coding sample (part of the Intel® IPP main sample).
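The decoupling described above can be illustrated with a table of function pointers: application code calls a codec only through the table, so any codec that fills in the same table can be swapped in. This is a simplified sketch with hypothetical names (sk_codec_fxns, dummy_codec, encode_one_frame) and a dummy codec; it is not the real USC header, whose structures and entry points are documented in the USC manual.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for USC-style types. The real USC
   structures have different names and more entry points (memory queries,
   control, decode, and so on). */
typedef enum { SK_OK = 0, SK_BAD_ARG = -1 } sk_status;

typedef struct {
    int bitrateBps;   /* configured bitrate in bits per second */
    int frameMs;      /* frame length in milliseconds */
} sk_state;

/* The "unified" part: every codec exposes the same table of entry points. */
typedef struct {
    const char *name;
    sk_status (*init)(sk_state *st, int bitrateBps);
    sk_status (*encode)(sk_state *st, const short *pcm, int nSamples,
                        unsigned char *bits, int *nBytes);
} sk_codec_fxns;

/* A dummy codec used only to exercise the interface: it emits a zeroed
   frame whose size is implied by the bitrate. */
static sk_status dummy_init(sk_state *st, int bitrateBps) {
    if (st == NULL || bitrateBps <= 0) return SK_BAD_ARG;
    st->bitrateBps = bitrateBps;
    st->frameMs = 20;
    return SK_OK;
}

static sk_status dummy_encode(sk_state *st, const short *pcm, int nSamples,
                              unsigned char *bits, int *nBytes) {
    (void)pcm; (void)nSamples;  /* a real codec would compress these samples */
    if (st == NULL || bits == NULL || nBytes == NULL) return SK_BAD_ARG;
    /* bytes per frame = bitrate * frameMs / 1000 / 8 */
    int n = st->bitrateBps * st->frameMs / 8000;
    for (int i = 0; i < n; ++i) bits[i] = 0;
    *nBytes = n;
    return SK_OK;
}

static const sk_codec_fxns dummy_codec = { "dummy", dummy_init, dummy_encode };

/* Application code: depends only on the table, never on codec internals.
   Returns the number of bytes written, or -1 on error. */
static int encode_one_frame(const sk_codec_fxns *codec, int bitrateBps,
                            const short *pcm, int nSamples,
                            unsigned char *out) {
    sk_state st;
    int nBytes = 0;
    if (codec->init(&st, bitrateBps) != SK_OK) return -1;
    if (codec->encode(&st, pcm, nSamples, out, &nBytes) != SK_OK) return -1;
    return nBytes;
}
```

Because encode_one_frame receives the codec as a parameter, replacing dummy_codec with another table requires no change to the application code.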

Intel® IPP functions optimized for Intel® Advanced Vector Extensions

Intel® Advanced Vector Extensions (Intel® AVX) is a 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE), designed to provide even higher performance for applications that are floating-point intensive. Intel AVX adds new functionality to the existing Intel SIMD instruction set (based on Intel® SSE), and includes a more compact SIMD encoding format.

The Intel® IPP library has been optimized for a variety of SIMD instruction sets. Automatic "dispatching" detects the SIMD instruction set that is available on the running processor and selects the optimal SIMD instructions for that processor. Refer to Understanding CPU Dispatching in the Intel® IPP Library for more information regarding dispatching.
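Dispatching can be pictured as choosing, at run time, one entry from a table of implementations of the same function. The sketch below is a simplified model, not the Intel IPP dispatcher: the feature bits (FEAT_SSE2, FEAT_AVX) are hypothetical and passed in by the caller, whereas the real library detects the processor itself, and all variants share one plain-C body so the example stays portable.

```c
#include <stddef.h>

/* Hypothetical CPU feature bits supplied by the caller. */
#define FEAT_SSE2 0x1u
#define FEAT_AVX  0x2u

typedef enum { IMPL_GENERIC, IMPL_SSE2, IMPL_AVX } impl_id;
typedef void (*add_fn)(const float *a, const float *b, float *dst, size_t n);

/* Reference loop. In a real library each variant would be compiled for its
   own instruction set; in this sketch all variants share this body. */
static void add_ref(const float *a, const float *b, float *dst, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}

typedef struct { unsigned required; impl_id id; add_fn fn; } variant;

/* Ordered from most to least capable; the generic entry requires nothing,
   so the search always finds a match. */
static const variant kAddVariants[] = {
    { FEAT_AVX | FEAT_SSE2, IMPL_AVX,     add_ref },
    { FEAT_SSE2,            IMPL_SSE2,    add_ref },
    { 0u,                   IMPL_GENERIC, add_ref },
};

/* Selects the best variant supported by the given feature set. */
static impl_id dispatch_add(unsigned features, add_fn *out) {
    size_t count = sizeof kAddVariants / sizeof kAddVariants[0];
    for (size_t i = 0; i < count; ++i) {
        if ((features & kAddVariants[i].required) == kAddVariants[i].required) {
            *out = kAddVariants[i].fn;
            return kAddVariants[i].id;
        }
    }
    *out = add_ref;
    return IMPL_GENERIC;
}
```

Ordering the table from most to least capable, with a generic entry that needs no features, guarantees that every processor gets a working implementation and that newer processors get the fastest one available.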

Intel® AVX optimization in the Intel® IPP library consists of "hand-optimized" and "compiler-tuned" functions, that is, code that has been directly optimized for the Intel® AVX instruction set.

For more information on Intel® IPP functions optimized for Intel® AVX, refer to the article Intel® IPP Functions Optimized for Intel® AVX.

Microsoft* Real-Time Audio Codec (MSRTA)

RTAudio* is the preferred Microsoft® Real-Time audio codec, and is used by Microsoft Lync Server* (formerly Microsoft Office Communications Server*) and other communications applications like Microsoft Lync* (formerly Microsoft Office Communicator*) and Microsoft LiveMeeting* Console.

For more information about the Microsoft RTAudio codec, refer to Overview of the Microsoft RTAudio* Speech Codec.

Intel® IPP Real Time Audio functions

By combining Intel® IPP RT Audio functions (see Appendix B), you can construct a speech codec that is compliant with the Microsoft RTAudio* codec. The primitives are designed primarily to implement the well-defined, computationally expensive core operations that make up the codec portion of the RTAudio system.

USC MSRTA Codec

The USC MSRTA codec supports compression and decompression of 16-bit mono PCM signals in narrowband (8000 Hz) and wideband (16000 Hz) modes, using 20 ms frames at bitrates of 8800 bps and 18000 bps respectively.
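The frame sizes follow directly from these parameters. The sketch below uses a hypothetical helper, msrta_frame_geometry (not part of the USC API), to compute the PCM samples and bytes per 20 ms frame and the compressed frame size at a constant bitrate; the results agree with the per-frame figures listed in Appendix C.

```c
/* Per-frame sizes for 16-bit mono PCM and a constant-bitrate bitstream. */
typedef struct {
    int samplesPerFrame;   /* PCM samples in one frame */
    int pcmBytesPerFrame;  /* 16-bit samples, so 2 bytes each */
    int bitsPerFrame;      /* compressed bits in one frame */
    int bytesPerFrame;     /* compressed bytes in one frame, rounded up */
} frame_geometry;

/* Hypothetical helper: derives the frame geometry from the sampling rate,
   the bitrate, and the frame length. */
static frame_geometry msrta_frame_geometry(int sampleRateHz, int bitrateBps,
                                           int frameMs) {
    frame_geometry g;
    g.samplesPerFrame  = sampleRateHz * frameMs / 1000;
    g.pcmBytesPerFrame = g.samplesPerFrame * 2;
    g.bitsPerFrame     = bitrateBps * frameMs / 1000;
    g.bytesPerFrame    = (g.bitsPerFrame + 7) / 8;
    return g;
}
```

For narrowband input, each 160-sample (320-byte) PCM frame compresses to 176 bits (22 bytes); for wideband input, each 320-sample (640-byte) frame compresses to 360 bits (45 bytes).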

To see how Intel® IPP is used to develop and measure the USC MSRTA codec, download the free Code Samples for the Intel® Integrated Performance Primitives (Intel® IPP) Library.

Extract all files in w_ipp-sample_*.zip to the desired folder, preserving the directory structure. The files for the USC Speech Codec, USC Echo Cancellation, UMC Speech RTP codec, and USC Filter and tones samples are located in the ../ipp_samples/speech-codecs folder.

How to build the source code

  1. Set the IPPROOT system environment variable to the Intel® IPP installation directory.
  2. Open the solution/project file in the corresponding version of Microsoft* Visual Studio.
  3. Select the configuration/platform you need.
  4. Build all projects in Microsoft* Visual Studio.
  5. Run the codec.

To encode or decode with the sample, use the following command line:

usc_speech_rtp_codec.exe [options] <infile> <outfile>

The operation depends on the format of the input file <infile>: a WAVE file is encoded, and an RTPDump file is decoded. When encoding, the output file <outfile> is written in RTPDump format; when decoding, it is written in WAVE format. Options:

-format <codecname>   codec name
-r<bitrate>           bitrate in bps (mandatory)
-v                    enable the Voice Activity Detector (VAD); VAD is disabled by default
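The article does not show how the sample distinguishes the two input formats. One plausible approach, sketched here purely as an assumption, is to inspect the file header: a WAVE file begins with "RIFF", a 4-byte size field, and "WAVE", while an RTPDump file (the rtptools format) begins with the text line "#!rtpplay1.0". The names detect_format and operation_for are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_WAVE, FMT_RTPDUMP } file_format;

/* Classifies the first bytes of <infile> (assumed detection logic). */
static file_format detect_format(const unsigned char *head, size_t len) {
    /* WAVE: "RIFF" <4-byte size> "WAVE" */
    if (len >= 12 && memcmp(head, "RIFF", 4) == 0 &&
        memcmp(head + 8, "WAVE", 4) == 0)
        return FMT_WAVE;
    /* rtpdump: text header starting with "#!rtpplay1.0" (12 characters) */
    if (len >= 12 && memcmp(head, "#!rtpplay1.0", 12) == 0)
        return FMT_RTPDUMP;
    return FMT_UNKNOWN;
}

/* WAVE input is encoded to RTPDump; RTPDump input is decoded to WAVE. */
static const char *operation_for(file_format f) {
    switch (f) {
    case FMT_WAVE:    return "encode";
    case FMT_RTPDUMP: return "decode";
    default:          return NULL;
    }
}
```

A caller would read the first few bytes of <infile>, pass them to detect_format, and reject the file when operation_for returns NULL.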

Codec name | Supported bitrate (bps) | Codec description
IPP_MSRTAnb_FP | 8800 | Narrowband (8000 Hz) MSRTA codec, floating-point implementation
IPP_MSRTAwb_FP | 18000 | Wideband (16000 Hz) MSRTA codec, floating-point implementation


To enable the RT Audio codecs IPP_MSRTAnb_FP and IPP_MSRTAwb_FP, define _USC_MSRTA when compiling the sample source, and link with the Intel® IPP RTA static library (ipp_rta.lib), which is provided in binary form.

For information on building the sample, see the readme file in the ipp_samples\speech-codecs directory.
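The effect of the _USC_MSRTA definition and the mandatory -r option can be modeled with a small codec registry whose MSRTA entries exist only when the macro is defined. This is a hypothetical sketch: codec_entry, kCodecs, and find_codec are illustrative names, not identifiers from the sample, and the entries simply mirror the codec table above.

```c
#include <stddef.h>
#include <string.h>

/* In a real build _USC_MSRTA comes from the compiler command line
   (for example /D_USC_MSRTA); it is defined here only so that this
   sketch is self-contained. */
#ifndef _USC_MSRTA
#define _USC_MSRTA
#endif

typedef struct {
    const char *name;     /* value accepted by -format */
    int bitrateBps;       /* the only bitrate this codec accepts */
    int sampleRateHz;     /* PCM sampling rate */
} codec_entry;

/* Hypothetical registry; the MSRTA entries are compiled in only when
   _USC_MSRTA is defined. */
static const codec_entry kCodecs[] = {
#ifdef _USC_MSRTA
    { "IPP_MSRTAnb_FP",  8800,  8000 },
    { "IPP_MSRTAwb_FP", 18000, 16000 },
#endif
    { NULL, 0, 0 }        /* terminator */
};

/* Returns the entry matching -format <name> and -r<bitrate>, or NULL if
   the codec is unknown or does not support that bitrate. */
static const codec_entry *find_codec(const char *name, int bitrateBps) {
    for (const codec_entry *e = kCodecs; e->name != NULL; ++e)
        if (strcmp(e->name, name) == 0 && e->bitrateBps == bitrateBps)
            return e;
    return NULL;
}
```

Without the macro, the registry contains only the terminator, so any request for an MSRTA codec fails the lookup.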

Refer to Appendix C for a description of the USC MSRTA codec.


Performance of USC MSRTA Codec

System configuration | Intel® microarchitecture code name Nehalem (NHM) | Intel® microarchitecture code name Sandy Bridge (SNB)
CPU | Intel® Xeon® processor X5570 @ 2.93 GHz | Genuine Intel® CPU 0 @ 3.00 GHz
Operating system | Microsoft Windows* 2003 | Microsoft Windows* x64 with SNB patch


We used Intel® IPP 7.0.2 to measure the performance of the USC MSRTA codec on Intel® microarchitecture code name Sandy Bridge at 3.0 GHz and on Intel® microarchitecture code name Nehalem at 2.93 GHz. Both systems ran 64-bit Microsoft Windows*. Because MSRTA has separate narrowband and wideband modes, two data sets were used: a 16-bit, 16000 Hz wideband stream (s_16000_16.wav) and a 16-bit, 8000 Hz narrowband stream (s_8000_16.wav).

[Table: USC MSRTA encode and decode performance, IA-32]

[Table: USC MSRTA encode and decode performance, Intel® 64]

In the performance tables above, encode and decode performance is reported in MHz. To obtain the MHz figure, multiply the measured time in seconds by the CPU frequency in MHz and divide by the duration of the input data; the narrowband data lasts 1070 sec and the wideband data 1090 sec. To measure the performance improvement of the USC MSRTA codec on Intel® microarchitecture code name Sandy Bridge compared to Intel® microarchitecture code name Nehalem, divide the Nehalem performance number by the Sandy Bridge performance number.

For example, with the 32-bit build on Intel microarchitecture code name Nehalem, narrowband encoding with USC MSRTA (VAD disabled) takes 20.42 sec; the same data set takes 16.65 sec on Intel microarchitecture code name Sandy Bridge. Dividing 20.42 by 16.65 gives a speedup of about 1.23x. USC MSRTA decoding shows similar gains.
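The arithmetic in the two paragraphs above can be sketched in C. This assumes the MHz figure is the processing time multiplied by the clock frequency and normalized by the audio duration, the usual convention for speech-codec MHz numbers; codec_mhz and speedup are hypothetical helper names.

```c
/* CPU MHz needed to process the audio in real time, assuming
   MHz = (processing time in s x clock frequency in MHz) / audio duration in s. */
static double codec_mhz(double cpuSeconds, double clockMHz, double audioSeconds) {
    return cpuSeconds * clockMHz / audioSeconds;
}

/* Improvement of the new platform over the old one: old value / new value. */
static double speedup(double oldValue, double newValue) {
    return oldValue / newValue;
}
```

Under this assumption, speedup(20.42, 16.65) is about 1.23, matching the text. Applying codec_mhz to the same times (2930 MHz and 3000 MHz clocks, 1070 sec of narrowband audio) gives roughly 55.9 MHz on Nehalem and 46.7 MHz on Sandy Bridge, a ratio of about 1.20. The ratio of MHz figures is slightly lower than the ratio of times because it factors out the higher Sandy Bridge clock frequency.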

Summary

Intel® IPP is a highly optimized library for the latest Intel architecture, including Intel® microarchitecture code name Sandy Bridge. By using Intel IPP functions, Intel® AVX optimized audio and speech codecs can be developed, including the Microsoft RTAudio* (real time audio) codec. Intel IPP offers sample code to demonstrate the development and usage of the MSRTA codec.

Appendix A: Introduction to Intel® IPP

Intel® Integrated Performance Primitives

Intel® Integrated Performance Primitives (Intel® IPP) is an extensive library of multi-core ready, highly optimized software functions for digital media and data processing applications, including multimedia and speech codecs. Intel® IPP offers thousands of optimized functions covering frequently used fundamental algorithms, and is designed to deliver performance beyond what optimizing compilers alone can deliver. For advanced performance and greater value, Intel® IPP is also available with Intel® Parallel Studio XE 2011. For more information, go to /en-us/articles/intel-ipp/.

Intel® IPP Speech Codec

Intel® IPP includes functions that can be used to implement speech codecs and related voice processing modules. These functions follow the International Telecommunication Union (ITU)* recommendations G.711 (companding functions), G.722, G.722.1, G.723.1, G.726, G.728, G.729 and G.729.1 for codecs, G.167 and G.168 for echo cancellers, and G.169 for audio level control; the European Telecommunications Standards Institute (ETSI)* specifications for the GSM-AMR and GSM-FR codecs; the 3GPP* specifications for the AMR-WB and AMR-WB+ codecs; and the Microsoft* Real-Time Audio codec.

Note: Implementations of these standards or the standard-enabled platforms may require licenses from various entities, including Intel Corporation.

Appendix B: Intel® AVX-optimized Intel® IPP RT Audio Functions

Function base name | Operation
AdaptiveCodebookSearch_RTA | Searches for the adaptive codebook index and the lag, and computes the adaptive vector
FixedCodebookSearch_RTA, FixedCodebookSearchRandom_RTA | Searches for the fixed codebook vector
HighPassFilter_RTA | Performs high-pass filtering
LSPQuant_RTA | Performs quantization of LSP coefficients
LSPToLPC_RTA | Converts LSP coefficients to LP coefficients
QMFDecode_RTA | Performs QMF synthesis
PostFilter_RTA | Restores speech signal from the residual


Other functions required to build the RT Audio codec are listed below:

Function base name | Operation
LPCToLSP_RTA | Converts LP coefficients to LSP coefficients
LevinsonDurbin_RTA | Calculates LP coefficients from the autocorrelation coefficients
QMFGetStateSize_RTA | Calculates the size of the QMF filter state memory
QMFInit_RTA | Initializes the QMF filter state memory
QMFEncode_RTA | Performs QMF analysis
PostFilterGetStateSize_RTA | Calculates the size of the post filter state memory
PostFilterInit_RTA | Initializes the post filter state memory
BandPassFilter_RTA | Performs band-pass filtering


*LSP - Line spectral pairs
*LP - Linear Prediction
*QMF- Quadrature mirror filter
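LevinsonDurbin_RTA computes LP coefficients from autocorrelation coefficients. For readers unfamiliar with the technique, the sketch below is the textbook Levinson-Durbin recursion in double precision. It does not reproduce the signature, data types, or sign convention of the Intel IPP function; levinson_durbin is a hypothetical name, and the sketch assumes x[n] is predicted as a[1]*x[n-1] + ... + a[p]*x[n-p].

```c
/* Textbook Levinson-Durbin recursion (hypothetical helper, not the IPP API).
   Input:  r[0..p], autocorrelation coefficients.
   Output: a[1..p], LP coefficients; a[0] is set to 0 and unused.
   Returns the final prediction error, or -1.0 on invalid input or a
   non-positive intermediate error. */
#define LD_MAX_ORDER 32

static double levinson_durbin(const double *r, double *a, int p) {
    double prev[LD_MAX_ORDER + 1];   /* coefficients of order m-1 */
    double err = r[0];               /* order-0 prediction error */

    if (p < 1 || p > LD_MAX_ORDER) return -1.0;
    for (int i = 0; i <= p; ++i) a[i] = 0.0;

    for (int m = 1; m <= p; ++m) {
        if (err <= 0.0) return -1.0;

        /* Reflection coefficient for order m. */
        double acc = r[m];
        for (int k = 1; k < m; ++k) acc -= a[k] * r[m - k];
        double refl = acc / err;

        /* Update a[1..m-1] from the order m-1 solution, then set a[m]. */
        for (int k = 1; k < m; ++k) prev[k] = a[k];
        for (int k = 1; k < m; ++k) a[k] = prev[k] - refl * prev[m - k];
        a[m] = refl;

        /* Prediction error shrinks by (1 - k_m^2) at each order. */
        err *= 1.0 - refl * refl;
    }
    return err;
}
```

For the autocorrelation sequence r = {1.0, 0.5, 0.25} of a first-order autoregressive process with coefficient 0.5, the recursion returns a[1] = 0.5, a[2] = 0, and a prediction error of 0.75.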

Appendix C: USC MSRTA Codec Description

For a description of the USC interface, refer to the USC manual in the sample directory. Note that in the codec API function Decode, when out->pBuffer is zero, a "fake" decode is performed and no PCM output stream is set; out->nBytes is set to the bitstream length in bytes, excluding forward error correction (FEC) data.

The USC MSRTA codec supports the following parameters:

Codec name: IPP_MSRTAnb_FP
Compression algorithms: QMF, LPC, adaptive and fixed codebook for the low band, unvoiced fixed codebook for the high band, bandwidth control for variable bitrate mode
Linkage: USC_MSRTAFP_Fxns
Signal: 16-bit linear PCM
Sampling: 8000 Hz
Frame: 20 ms
Bitrates: 8800 bps (176 bits per frame, 22 bytes)
Voice Activity Detection: variable rate support
Packet Loss Concealment: PLC supported
Frame type (value): default 0, not required
Standard: Microsoft Real-Time Audio

Codec name: IPP_MSRTAwb_FP
Compression algorithms: QMF, LPC, adaptive and fixed codebook for the low band, unvoiced fixed codebook for the high band, bandwidth control for variable bitrate mode
Linkage: USC_MSRTAFP_Fxns
Signal: 16-bit linear PCM
Sampling: 16000 Hz
Frame: 20 ms
Bitrates: 18000 bps (360 bits per frame, 45 bytes)
Voice Activity Detection: variable rate support
Packet Loss Concealment: PLC supported
Frame type (value): default 0, not required
Standard: Microsoft Real-Time Audio

About the Author

Naveen Gv is a Technical Consulting Engineer (TCE) in the performance library team. At Intel, he has specialized in multi-core programming, Intel® Integrated Performance Primitives, and Intel® Math Kernel Library. His professional interests are teaching multi-core programming methodology to the software community and implementing digital signal processing algorithms on the x86 platform. Naveen has worked with several universities across Asia Pacific to bring multi-core programming into academia. His e-mail address is naveen.gv at intel.com

For more complete information about compiler optimizations, see our Optimization Notice.