This white paper describes how Intel® Integrated Performance Primitives (Intel® IPP) can provide the building blocks to develop the Microsoft* Real Time Audio (MSRTA) codec on the latest Intel® microarchitecture, code name Sandy Bridge. MSRTA is specifically designed for real-time two-way Voice over IP (VoIP) applications. We developed a speech codec that is fully bitstream compliant with the Microsoft RTAudio codec with comparable quality. It has been implemented using Intel® IPP.
We describe how to use Intel® IPP to build the Intel® Advanced Vector Extensions (Intel® AVX) optimized Microsoft* Real Time Audio Codec for VoIP applications. We provide performance results for Intel® microarchitecture code name Sandy Bridge.
Unified Speech Component
The Unified Speech Component (USC) interface is a C language framework designed for implementing speech codecs, echo cancellers, and other voice processing modules using the Intel® IPP library. Most of the speech codec standards listed in Appendix A use this extensible USC interface.
The purpose of the USC interface is to provide unified access to an algorithm module, independent of the algorithm internals. The USC interface also lets binaries be integrated easily into existing software applications. Decoupling the interface from the algorithm details allows system components to be developed independently of the algorithm implementation.
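In C, this kind of decoupling is commonly achieved with a table of function pointers. The sketch below illustrates the idea only. `SpeechCodecFxns`, `round_trip`, and the pass-through codec are hypothetical names, not the actual USC API, which is documented in the USC manual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, USC-inspired codec interface (NOT the real USC API).
 * The application sees only this table of function pointers. */
typedef struct {
    const char *name;
    /* Encode n 16-bit PCM samples into out; returns bytes written. */
    size_t (*encode)(const short *pcm, size_t n, unsigned char *out);
    /* Decode nBytes of bitstream into pcm; returns samples written. */
    size_t (*decode)(const unsigned char *in, size_t nBytes, short *pcm);
} SpeechCodecFxns;

/* Toy "codec" that copies PCM unchanged, standing in for a real algorithm. */
static size_t passthrough_encode(const short *pcm, size_t n, unsigned char *out) {
    memcpy(out, pcm, n * sizeof(short));
    return n * sizeof(short);
}

static size_t passthrough_decode(const unsigned char *in, size_t nBytes, short *pcm) {
    memcpy(pcm, in, nBytes);
    return nBytes / sizeof(short);
}

static const SpeechCodecFxns passthroughCodec = {
    "PASSTHROUGH", passthrough_encode, passthrough_decode
};

#define MAX_FRAME_SAMPLES 320  /* 20 ms at 16000 Hz */

/* Application code: encodes and decodes one frame through any codec table,
 * without knowing anything about the codec's internals. */
static size_t round_trip(const SpeechCodecFxns *codec, const short *in,
                         size_t n, short *out) {
    unsigned char bitstream[MAX_FRAME_SAMPLES * sizeof(short)];
    assert(n <= MAX_FRAME_SAMPLES);
    size_t nBytes = codec->encode(in, n, bitstream);
    return codec->decode(bitstream, nBytes, out);
}
```

Using a different codec means passing a different table; `round_trip` itself does not change.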
For more information, refer to the Unified Speech Component Interface manual (uscmanual.pdf) in the speech coding sample (part of the Intel® IPP main sample).
Intel® IPP functions optimized for Intel® Advanced Vector Extensions
Intel® Advanced Vector Extensions (Intel® AVX) is a 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE), designed to provide even higher performance for applications that are floating-point intensive. Intel AVX adds new functionality to the existing Intel SIMD instruction set (based on Intel® SSE), and includes a more compact SIMD encoding format.
The Intel® IPP library has been optimized for a variety of SIMD instruction sets. Automatic "dispatching" detects the SIMD instruction set that is available on the running processor and selects the optimal SIMD instructions for that processor. Refer to Understanding CPU Dispatching in the Intel® IPP Library for more information regarding dispatching.
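Intel IPP performs this dispatching internally, so applications do not write it themselves. To show the general technique, here is a minimal sketch assuming GCC or Clang on x86: `__builtin_cpu_supports("avx")` checks the running CPU, and a function compiled with `__attribute__((target("avx")))` uses 256-bit AVX registers that hold eight floats each. The function names are hypothetical, and this is not IPP's dispatcher.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

typedef void (*add_f32_fn)(const float *, const float *, float *, size_t);

/* Portable fallback: out[i] = a[i] + b[i], one element at a time. */
static void add_f32_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#if defined(__x86_64__) || defined(__i386__)
/* AVX path: each 256-bit register holds 8 floats. Only this function is
 * compiled for AVX, so the binary still runs on CPUs without it. */
__attribute__((target("avx")))
static void add_f32_avx(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)  /* remaining 0-7 elements */
        out[i] = a[i] + b[i];
}
#endif

/* Choose an implementation for the CPU the program is running on. */
static add_f32_fn select_add_f32(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return add_f32_avx;
#endif
    return add_f32_scalar;
}

/* Public entry point: resolves the implementation on first use
 * (not thread-safe as written). */
static void add_f32(const float *a, const float *b, float *out, size_t n) {
    static add_f32_fn impl = NULL;
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    if (impl == NULL)
        impl = select_add_f32();
    impl(a, b, out, n);
}
```

Callers always use `add_f32`; which code path runs is decided once, at run time, from the capabilities of the processor.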
Intel® AVX optimization in the Intel® IPP library consists of "hand-optimized" and "compiler-tuned" functions: code that has been directly optimized for the Intel® AVX instruction set.
For more information on Intel® IPP functions optimized for Intel® AVX, refer to the article Intel® IPP Functions Optimized for Intel® AVX.
Microsoft* Real-Time Audio Codec (MSRTA)
RTAudio* is the preferred Microsoft® Real-Time audio codec, and is used by Microsoft Lync Server* (formerly Microsoft Office Communications Server*) and other communications applications like Microsoft Lync* (formerly Microsoft Office Communicator*) and Microsoft LiveMeeting* Console.
To get more information about the Microsoft RTAudio Codec, refer to Overview of the Microsoft RTAudio* Speech Codec.
Intel® IPP Real Time Audio functions
By combining Intel® IPP RT Audio functions (refer to Appendix B), it is possible to construct a speech codec compliant with the Microsoft RTAudio* codec. The primitives are designed primarily to implement the well-defined, computationally expensive core operations that make up the codec portion of the RTAudio system.
USC MSRTA Codec
The USC MSRTA codec supports compression and decompression of 16-bit mono PCM signals, narrowband (8000 Hz) and wideband (16000 Hz), with 20 ms frames at bitrates of 8800 bps and 18000 bps respectively.
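The per-frame sizes follow directly from these parameters: a 20 ms frame holds (sample rate × 0.02) samples and (bitrate × 0.02) bits. The helper below, a hypothetical name not taken from the sample, computes them in integer arithmetic.

```c
#include <assert.h>

#define FRAME_MS 20  /* MSRTA frame length in milliseconds */

typedef struct {
    unsigned samples_per_frame;   /* PCM samples per 20 ms frame */
    unsigned pcm_bytes_per_frame; /* 16-bit mono: 2 bytes per sample */
    unsigned bits_per_frame;      /* encoded bits per frame */
    unsigned bytes_per_frame;     /* encoded bytes per frame, rounded up */
} FrameSizes;

static FrameSizes frame_sizes(unsigned sample_rate_hz, unsigned bitrate_bps) {
    FrameSizes f;
    /* A 20 ms frame must contain a whole number of samples. */
    assert((sample_rate_hz * FRAME_MS) % 1000 == 0);
    f.samples_per_frame   = sample_rate_hz * FRAME_MS / 1000;
    f.pcm_bytes_per_frame = f.samples_per_frame * 2;
    f.bits_per_frame      = bitrate_bps * FRAME_MS / 1000;
    f.bytes_per_frame     = (f.bits_per_frame + 7) / 8;
    return f;
}
```

`frame_sizes(8000, 8800)` yields 160 samples, 320 PCM bytes, 176 bits, and 22 bytes per frame; `frame_sizes(16000, 18000)` yields 320 samples, 640 PCM bytes, 360 bits, and 45 bytes, matching the bitrates listed in Appendix C.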
To understand the usage of Intel® IPP in developing and measuring the USC MSRTA codec, please download the following free code samples: Code Samples for the Intel® Integrated Performance Primitives (Intel® IPP) Library
Extract all files in w_ipp-sample_*.zip to the desired folder, preserving the directory structure. The files for the USC Speech Codec, USC Echo Cancellation, UMC Speech RTP codec, and USC Filter and tones samples are in the ../ipp_samples/speech-codecs folder.
How to build the source code
- Set the system environment variable IPPROOT
- Open the solution/project file in the corresponding version of Microsoft* Visual Studio
- Select the configuration/platform you need
- Build all projects in Microsoft Visual Studio*
- Run the codec
To run the sample for encode or for decode, use the following command line:
usc_speech_rtp_codec.exe [options] <infile> <outfile>
The operation is determined by the format of the input file <infile>: a WAVE input file is encoded, and an RTPDump input file is decoded. For an encode operation, the output file <outfile> is stored in RTPDump format; for a decode operation, it is stored in WAVE format. Option list:
|-format <codecname>||Codec name (see the table below)|
|-r<bitrate>||Bitrate in bps (mandatory)|
|-v||Enable the Voice Activity Detector (VAD). Default: VAD disabled|
|Codec name||Supported bitrate, in bps||Codec description|
|IPP_MSRTAnb_FP||8800||Narrowband (8000 Hz) MSRTA codec, floating-point implementation|
|IPP_MSRTAwb_FP||18000||Wideband (16000 Hz) MSRTA codec, floating-point implementation|
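The sample's own format-detection code is not reproduced here. As a sketch of the idea, assuming the standard file signatures (a RIFF header with "RIFF" at offset 0 and "WAVE" at offset 8, and the rtptools RTPDump text header beginning "#!rtpplay1.0"), an input file can be classified from its first bytes. `detect_format` and `operation_for` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_WAVE, FMT_RTPDUMP } InputFormat;

/* Classify a file from its first len bytes (assumed signatures:
 * RIFF/WAVE container, or rtptools RTPDump text header). */
static InputFormat detect_format(const unsigned char *hdr, size_t len) {
    assert(hdr != NULL || len == 0);
    if (len >= 12 && memcmp(hdr, "RIFF", 4) == 0 && memcmp(hdr + 8, "WAVE", 4) == 0)
        return FMT_WAVE;
    if (len >= 12 && memcmp(hdr, "#!rtpplay1.0", 12) == 0)
        return FMT_RTPDUMP;
    return FMT_UNKNOWN;
}

/* WAVE input -> encode to RTPDump; RTPDump input -> decode to WAVE. */
static const char *operation_for(InputFormat f) {
    switch (f) {
    case FMT_WAVE:    return "encode";
    case FMT_RTPDUMP: return "decode";
    default:          return "unsupported";
    }
}
```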
To enable the RT Audio codecs IPP_MSRTAnb_FP and IPP_MSRTAwb_FP, compile the sample source with the _USC_MSRTA macro defined (for example, /D_USC_MSRTA on the Microsoft compiler command line) and link with the IPP RTA static library (ipp_rta.lib), which is provided in binary form.
For information on building the sample, see the readme file in the ipp_samples\speech-codecs directory.
Refer to Appendix C for a description of the USC MSRTA codec.
Performance of USC MSRTA Codec
|System Configuration||Intel® microarchitecture code name Nehalem (NHM)||Intel® microarchitecture code name Sandy Bridge (SNB)|
|CPU||Intel® Xeon® processor X5570 @ 2.93GHz||Genuine Intel® CPU 0 @ 3.00GHz|
|Operating System||Microsoft Windows* 2003||Microsoft Windows* x64 with SNB patch|
We used Intel® IPP 7.0.2 to measure the performance of the USC MSRTA codec on Intel® microarchitecture code name Sandy Bridge at 3.0 GHz and on Intel® microarchitecture code name Nehalem at 2.93 GHz. Both systems had 64-bit Microsoft Windows* installed. Two data sets were used, matching the two input formats the MSRTA codec supports: a 16-bit, 16000 Hz wideband stream (s_16000_16.wav) and a 16-bit, 8000 Hz narrowband stream (s_8000_16.wav).
In the performance table, encode and decode performance is reported in MHz. To convert a measured processing time in seconds to MHz, multiply it by the CPU frequency in MHz and divide by the duration of the input audio in seconds. The narrowband data set is 1070 sec long, and the wideband data set is 1090 sec long. To compute the speedup of the USC MSRTA codec on Intel® microarchitecture code name Sandy Bridge over Intel® microarchitecture code name Nehalem, divide the Nehalem performance number by the Sandy Bridge performance number.
For example, on 32-bit Intel microarchitecture code name Nehalem, narrowband encode performance of USC MSRTA without VAD is 20.42 sec. Performance of the same data set on Intel microarchitecture code name Sandy Bridge is 16.65 sec. To measure the speedup, divide 20.42 by 16.65 to get 1.23x. USC MSRTA decode performance exhibits similar benefits.
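Assuming the MHz figure means the clock rate needed to run the codec in real time (processing time multiplied by CPU frequency, divided by audio duration), the arithmetic can be sketched as follows. The function names are hypothetical; the inputs are taken from the narrowband encode example above.

```c
#include <assert.h>

/* CPU load in MHz needed for real-time operation, assuming
 * MHz = processing time (s) * CPU frequency (MHz) / audio duration (s). */
static double codec_mhz(double proc_time_s, double cpu_mhz, double audio_s) {
    assert(audio_s > 0.0);
    return proc_time_s * cpu_mhz / audio_s;
}

/* Speedup of the new platform over the old one: old / new. */
static double speedup(double old_value, double new_value) {
    assert(new_value > 0.0);
    return old_value / new_value;
}
```

With these inputs, `codec_mhz(20.42, 2930.0, 1070.0)` gives roughly 55.9 MHz for the Nehalem system, `codec_mhz(16.65, 3000.0, 1070.0)` gives roughly 46.7 MHz for the Sandy Bridge system, and `speedup(20.42, 16.65)` gives the 1.23x quoted above. Because the two clock rates differ, the ratio of the MHz figures (about 1.20x) is slightly smaller than the ratio of the raw times.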
Intel® IPP is a highly optimized library for the latest Intel architecture, including Intel® microarchitecture code name Sandy Bridge. By using Intel IPP functions, Intel® AVX optimized audio and speech codecs can be developed, including the Microsoft RTAudio* (real time audio) codec. Intel IPP offers sample code to demonstrate the development and usage of the MSRTA codec.
Appendix A: Introduction to Intel® IPP
Intel® Integrated Performance Primitives (Intel® IPP) is an extensive library of multi-core ready, highly optimized software functions for digital media and data processing applications. Intel® IPP offers thousands of optimized functions covering frequently used fundamental algorithms, including functions for multimedia and speech codecs, and is designed to deliver performance beyond what optimizing compilers alone can deliver. For advanced performance and greater value, Intel® IPP is also available with Intel® Parallel Studio XE 2011. For more information, go to /en-us/articles/intel-ipp/.
Intel® IPP Speech Codec
Intel® IPP includes functions that can be used to implement speech codecs and related voice processing modules. These functions follow the International Telecommunication Union (ITU)* recommendations G.711 (companding functions), G.722, G.722.1, G.723.1, G.726, G.728, G.729, and G.729.1 for codecs, G.167 and G.168 for echo cancellers, and G.169 for audio level control; the European Telecommunications Standards Institute (ETSI)* specifications for the GSM-AMR and GSM-FR codecs; the 3GPP* specifications for the AMR-WB and AMR-WB+ codecs; and the Microsoft* Real-Time Audio codec.
Note: Implementations of these standards or the standard-enabled platforms may require licenses from various entities, including Intel Corporation.
Appendix B: Intel® AVX-optimized Intel® IPP RT Audio Functions
|Function Base Name||Operation|
|AdaptiveCodebookSearch_RTA||Searches for the adaptive codebook index and the lag, and computes the adaptive vector|
|FixedCodebookSearch_RTA, FixedCodebookSearchRandom_RTA||Searches for the fixed codebook vector|
|HighPassFilter_RTA||Performs high-pass filtering|
|LSPQuant_RTA||Performs quantization of LSP coefficients|
|LSPToLPC_RTA||Converts LSP coefficients to LP coefficients|
|QMFDecode_RTA||Performs QMF synthesis|
|PostFilter_RTA||Restores speech signal from the residual|
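In the RTAudio codec, QMF analysis splits the signal into a low band and a high band, which Appendix C shows are coded with different methods, and QMF synthesis recombines them in the decoder. The sketch below uses the simplest QMF pair, the 2-tap Haar filters, to show the split-and-recombine structure with perfect reconstruction. It is an illustration only: the filters used by RTAudio and the signatures of QMFEncode_RTA and QMFDecode_RTA are not reproduced here, and practical codecs use much longer filters.

```c
#include <assert.h>
#include <stddef.h>

/* 1/sqrt(2): gain of the 2-tap Haar QMF pair
 * h0 = [1, 1]/sqrt(2) (low-pass), h1 = [1, -1]/sqrt(2) (high-pass). */
#define HAAR_G 0.70710678118654752f

/* Analysis: split n samples (n even) into n/2 low-band and n/2 high-band
 * samples (filter, then keep every second output). */
static void qmf_analysis(const float *x, size_t n, float *low, float *high) {
    assert(n % 2 == 0);
    for (size_t k = 0; k < n / 2; k++) {
        low[k]  = HAAR_G * (x[2 * k] + x[2 * k + 1]);
        high[k] = HAAR_G * (x[2 * k] - x[2 * k + 1]);
    }
}

/* Synthesis: upsample both bands and recombine into n output samples. */
static void qmf_synthesis(const float *low, const float *high, size_t n, float *y) {
    assert(n % 2 == 0);
    for (size_t k = 0; k < n / 2; k++) {
        y[2 * k]     = HAAR_G * (low[k] + high[k]);
        y[2 * k + 1] = HAAR_G * (low[k] - high[k]);
    }
}
```

Because HAAR_G squared is 0.5, synthesis exactly undoes analysis (up to float rounding), and a locally constant input produces zero in the high band.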
Other functions required to implement the RT Audio codec are listed below:
|Function Base Name||Operation|
|LPCToLSP_RTA||Converts LP coefficients to LSP coefficients|
|LevinsonDurbin_RTA||Calculates LP coefficients from the autocorrelation coefficients|
|QMFGetStateSize_RTA||Calculates the size of the QMF filter state memory|
|QMFInit_RTA||Initializes the QMF filter state memory|
|QMFEncode_RTA||Performs QMF analysis|
|PostFilterGetStateSize_RTA||Calculates the size of the post filter state memory|
|PostFilterInit_RTA||Initializes the post filter state memory|
|BandPassFilter_RTA||Performs band pass filtering|
*LSP - Line spectral pairs
*LP - Linear prediction
*QMF - Quadrature mirror filter
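LevinsonDurbin_RTA calculates LP coefficients from autocorrelation coefficients. The textbook Levinson-Durbin recursion that performs this step is sketched below in double precision, using the convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p. It is a reference illustration under that assumed sign convention, not the IPP function's signature or its exact numerics.

```c
#include <assert.h>
#include <stddef.h>

#define LD_MAX_ORDER 32

/* Levinson-Durbin recursion (textbook form).
 * Input:  r[0..order], autocorrelation of the signal (assumed positive definite).
 * Output: a[0..order], coefficients of A(z) = 1 + a[1]z^-1 + ... + a[order]z^-order.
 * Returns the final prediction-error energy, or -1.0 on invalid input. */
static double levinson_durbin(const double *r, double *a, size_t order) {
    double prev[LD_MAX_ORDER + 1];
    assert(r != NULL && a != NULL);
    if (order > LD_MAX_ORDER || r[0] <= 0.0)
        return -1.0;

    double err = r[0];
    a[0] = 1.0;
    for (size_t i = 1; i <= order; i++)
        a[i] = 0.0;

    for (size_t i = 1; i <= order; i++) {
        /* Correlation between the current prediction error and r. */
        double acc = r[i];
        for (size_t j = 1; j < i; j++)
            acc += a[j] * r[i - j];
        double k = -acc / err;  /* i-th reflection coefficient */

        /* Update a[1..i-1] from the previous order's coefficients. */
        for (size_t j = 1; j < i; j++)
            prev[j] = a[j];
        for (size_t j = 1; j < i; j++)
            a[j] = prev[j] + k * prev[i - j];
        a[i] = k;
        err *= 1.0 - k * k;
    }
    return err;
}
```

For r = {1.0, 0.5, 0.25} (the autocorrelation of a first-order process) and order 2, the function returns a = {1, -0.5, 0} with a prediction-error energy of 0.75.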
Appendix C: USC MSRTA Codec Description
For a description of the USC interface, refer to the USC manual in the sample directory. Note that for the Decode codec API function, when out->pBuffer is zero (NULL), a fake decode is performed and no PCM stream output is produced; out->nBytes is set to the bitstream length in bytes, excluding forward error correction (FEC) data.
The USC MSRTA codec is supported with the following parameters:
|Codec names||: IPP_MSRTAnb_FP|
|Compression algorithms||: QMF, LPC, Adaptive and Fixed codebook for low band, Unvoiced Fixed codebook for high band, bandwidth control for variable bitrate mode|
|Signal||: 16-bit linear PCM|
|Sampling||: 8000 Hz|
|Bitrates||: 8800 bps (176 bits per frame, 22 bytes)|
|Voice Activity Detection||: variable rate support|
|Packet Loss Concealment||: PLC supported|
|Frame type (value)||: default 0, not required|
|Standard||: Microsoft Real-Time Audio|
|Codec names||: IPP_MSRTAwb_FP|
|Compression algorithms||: QMF, LPC, Adaptive and Fixed codebook for low band, Unvoiced Fixed codebook for high band, bandwidth control for variable bitrate mode|
|Signal||: 16-bit linear PCM|
|Sampling||: 16000 Hz|
|Bitrates||: 18000 bps (360 bits per frame, 45 bytes)|
|Voice Activity Detection||: variable rate support|
|Packet Loss Concealment||: PLC supported|
|Frame type (value)||: default 0, not required|
|Standard||: Microsoft Real-Time Audio|
- Intel® Integrated Performance Primitives (Intel® IPP) Functions Optimized for Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® AVX: New Frontiers in Performance Improvements and Energy Efficiency
- Intel® AVX and CPU Instructions Forum
- Intel® Integrated Performance Primitives Forum
About the Author
Naveen Gv is a Technical Consulting Engineer (TCE) in the performance library team. At Intel he has specialized in multi-core programming, Intel Integrated Performance Primitives, and Intel Math Kernel Library. His professional interests are teaching multi-core programming methodology to the software community and implementing digital signal processing algorithms on the x86 platform. Naveen has worked with several universities across Asia Pacific to bring multi-core programming into academia. His e-mail address is naveen.gv at intel.com