Last Modified On: October 23, 2008 4:03 PM PDT
by Karthik Krishnan
Discover how Intel® Integrated Performance Primitives (Intel® IPP) can provide the building blocks to develop a VoIP application with advanced features. Get the building blocks to build a complete softphone application.
Voice-over Internet Protocol (VoIP) is revolutionizing the telecommunications industry by merging voice and data onto one IP network. Intel offers a range of products, services, and building blocks to enable VoIP solutions over various domains. Intel® Integrated Performance Primitives (Intel® IPP) is a software library that offers various highly optimized functions including multimedia and speech codecs. This article provides reference points on using Intel® IPP for speech codecs along with a complete implementation of a VoIP softphone. The sample application has been built using Windows Sockets* for network communication, DirectSound* for audio capture and playback, and Wide Band Codecs (GSM-AMR adaptive multi-rate) using Intel IPP.
Intel IPP is a highly optimized cross-platform library that includes various functionalities related to multimedia and communication software. The G.168, G.167, G.711, G.722, G.722.1, G.722.2, AMRWB, G.723.1, G.726, G.728, G.729, GSM-AMR, and GSM-FR are international standards promoted by International Telecommunication Union (ITU)*, European Telecommunications Standards Institute (ETSI)*, 3GPP* and other organizations. Below is a list of speech coding samples built with Intel Integrated Performance Primitives as the building blocks that are bit-exact with the standard.
| Speech Coding Samples | Windows* | Linux* |
| G.722.1 | ||
| GSM/AMR WB / G.722.2 | ||
| G.723.1 | ||
| G.726 | ||
| G.728 | ||
| G.729 | ||
| GSM-AMR | ||
| GSM-FR |
Note that implementations of these standards or the standard-enabled platforms may require licenses from various entities, including Intel Corporation. This paper uses GSM-AMR (adaptive multi-rate) as the reference codec for the VoIP call.
Intel IPP provides various mechanisms to link an application code with the library such as Static Linkage, Dynamic Linkage, and Automatic Dispatching. For detailed information please refer to Linkage Models (PDF 231 KB). The softphone application included (see link in Additional Resources section) uses dynamic linkage with automatic dispatching.
The wideband GSM-AMR codec (AMR-WB) operates on 16-bit samples at a 16 kHz sampling rate and supports several output bit rates (6.6 kbps, 8.85 kbps, and so on). The table below lists the bit rates supported by Intel IPP and the corresponding output size per frame. Each frame encodes 20 ms of audio input, which is 320 samples or 640 bytes of raw PCM.
| Frame Type | GSM AMR-WB (bitrate in kbps) | Output bits per frame |
| 0 | 6.6 | 132 |
| 1 | 8.85 | 177 |
| 2 | 12.65 | 253 |
| 3 | 14.25 | 285 |
| 4 | 15.85 | 317 |
| 5 | 18.25 | 365 |
| 6 | 19.85 | 397 |
| 7 | 23.05 | 461 |
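The output sizes in the table follow directly from the bit rate and the 20 ms frame duration. The sketch below reproduces that arithmetic. The helper names (`bitsPerFrame`, `packedBytesPerFrame`, `pcmBytesPerFrame`) are hypothetical and are not part of Intel IPP or the USC interface, and the byte count assumes simple rounding up to whole octets rather than any particular storage format.

```cpp
#include <cassert>

// AMR-WB bit rates in bits per second, indexed by frame type (0..7),
// taken from the table above.
static const int kBitRatesBps[8] = {6600, 8850, 12650, 14250,
                                    15850, 18250, 19850, 23050};

const int kFrameMs = 20;          // frame duration in milliseconds
const int kSampleRateHz = 16000;  // wideband sampling rate
const int kBytesPerSample = 2;    // 16-bit linear PCM

// Encoded bits produced for one 20 ms frame of the given frame type.
int bitsPerFrame(int frameType) {
    assert(frameType >= 0 && frameType < 8);
    return kBitRatesBps[frameType] * kFrameMs / 1000;
}

// Bytes needed to hold those bits when packed into whole octets
// (an assumption for illustration only).
int packedBytesPerFrame(int frameType) {
    return (bitsPerFrame(frameType) + 7) / 8;
}

// Raw PCM bytes consumed by the encoder for one frame.
int pcmBytesPerFrame() {
    return kSampleRateHz * kFrameMs / 1000 * kBytesPerSample;
}
```

For frame type 2 (12.65 kbps), for example, `bitsPerFrame(2)` gives 253 bits of encoded output for 320 samples of 16-bit PCM input.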
The speech codec samples use Intel IPP as building blocks and include a complete implementation of all the supported codecs that are bit-exact per the standard. The sample code in Intel IPP 5.0 also includes a unified approach that facilitates the integration of all the codecs. The following section provides some pointers on using the Unified Speech Codec (USC) approach to integrate the encoding and decoding functionality of GSM-AMR codec.
#ifdef __cplusplus
/* assumes initialization is complete. Sample softphone does not change the bit rate once the VoIP
int DecodeOneFrame(char *src,char *dst,int frametype)
The encoder takes 16-bit linear PCM data as input, which is the uncompressed binary representation of an analog signal (for example, voice) after digitization. The decoder takes the compressed data from the encoder as input and outputs raw PCM data. This section explains how to use DirectSound* to capture and play back raw PCM data at the desired sampling frequency (16 kHz for GSM-AMR).
Microsoft DirectSound provides various APIs to capture and play audio content. The softphone application enclosed uses the sample code available from Microsoft Platform SDK for audio capture and playback. This section explains high-level implementation details.
Audio capture works by creating a circular buffer to hold the captured audio data in raw PCM format. The sampling rate and sample size (16 kHz, 16-bit) and the total size of the capture buffer are set and allocated during the initialization phase. DirectSound also provides a way to trigger event objects every time a certain amount of audio data gets captured in the buffer. Audio capture is typically handled in a separate thread, and the audio extraction thread periodically waits on these event objects to extract the captured audio data. The following provides the control flow of audio capture using DirectSound.
The captured audio data needs to be extracted periodically (typically every frame) and passed on to the encoder. The extraction routine can be scheduled to run at a fixed interval (for example, every 20 ms) using the timeSetEvent() API. The following provides an overview of the extraction functionality.
Note that extracting the raw PCM data is done in two phases since the buffer is circular, and the captured data might have wrapped around the end of the allocated memory. The capture and extraction functionality run in separate threads. It is also possible that some of these notification events may not be waited upon and could be "missed." In such cases, the corresponding PCM data is extracted on the next signal.
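The two-phase extraction can be sketched independently of DirectSound. The example below is a minimal illustration that uses a plain byte array as the circular buffer. `extractCaptured`, `readPos`, and `writePos` are hypothetical names; in the actual application the buffer regions would come from locking the DirectSound capture buffer rather than from a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies every byte between readPos and writePos out of a circular buffer,
// then advances readPos. Returns the number of bytes copied into dst.
// If the captured region wraps past the end of the buffer, the copy is done
// in two phases: readPos..end, then start..writePos.
std::size_t extractCaptured(const std::vector<unsigned char>& ring,
                            std::size_t& readPos,
                            std::size_t writePos,
                            std::vector<unsigned char>& dst) {
    const std::size_t size = ring.size();
    assert(size > 0 && readPos < size && writePos < size);

    // Bytes available, accounting for wrap-around.
    const std::size_t available = (writePos + size - readPos) % size;
    dst.resize(available);
    if (available == 0) return 0;

    // Phase 1: from readPos up to the end of the buffer (or up to writePos).
    const std::size_t first =
        (readPos + available <= size) ? available : size - readPos;
    std::memcpy(dst.data(), ring.data() + readPos, first);

    // Phase 2: the remainder that wrapped around to the start of the buffer.
    const std::size_t second = available - first;
    if (second > 0) std::memcpy(dst.data() + first, ring.data(), second);

    readPos = (readPos + available) % size;
    return available;
}
```

Because the function drains everything up to the current write position, data belonging to a missed notification is simply picked up on the next call, which matches the behavior described above.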
The playback code works in a way similar to that of the audio capture. The playback buffer needs to be filled with the raw PCM audio content from the other participants. Once the encoded packets are received, they are passed on to the decoder to recover the source PCM data. The playback buffer is then locked, and the PCM data is copied into it and played. Jitter can be taken into account by buffering a threshold amount of PCM data before playback starts.
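One simple way to apply such a threshold is to hold decoded frames in a queue and start playback only once a minimum number of frames has accumulated. The sketch below illustrates that idea. The class name `PlaybackQueue` and the threshold value are hypothetical and are not taken from the sample application, which may handle buffering differently.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

typedef std::vector<short> PcmFrame;  // one decoded frame of 16-bit PCM samples

class PlaybackQueue {
public:
    explicit PlaybackQueue(std::size_t startThreshold)
        : threshold_(startThreshold), playing_(false) {
        assert(startThreshold > 0);
    }

    // Called when a decoded frame becomes available.
    void push(const PcmFrame& frame) {
        frames_.push_back(frame);
        if (frames_.size() >= threshold_) playing_ = true;
    }

    // Called once per frame period by the playback side. Returns true and
    // fills `out` if a frame should be played; returns false while the queue
    // is still buffering or has just run dry.
    bool pop(PcmFrame& out) {
        if (!playing_) return false;
        if (frames_.empty()) {
            playing_ = false;  // underrun: go back to buffering
            return false;
        }
        out = frames_.front();
        frames_.pop_front();
        return true;
    }

private:
    std::deque<PcmFrame> frames_;
    std::size_t threshold_;
    bool playing_;
};
```

A larger threshold tolerates more variation in packet arrival time at the cost of added end-to-end delay, so the value is a trade-off rather than a fixed constant.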
The sample application uses Windows Sockets over TCP/IP as the network transport to transfer the voice packets between nodes. The softphone allows multi-user conferencing and supports the GSM-AMR codec at various bit rates. The initiator (or host) functions like a server, listening for incoming calls at a specified port. The other VoIP speakers connect to the host at that port. The host waits for incoming connections until it times out, and then broadcasts the IP addresses of all the connected nodes. A star network connecting all the VoIP participants with each other is established after this.
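Because TCP delivers a byte stream rather than discrete messages, the receiver needs some way to find the boundaries between encoded frames. The sketch below shows one possible framing scheme, independent of the socket calls themselves. The wire format (a one-byte frame type followed by a two-byte length) and the function names are assumptions made for illustration; the sample application may use a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical wire format:
// [frame type: 1 byte][payload length: 2 bytes, big-endian][payload].
const std::size_t kHeaderSize = 3;

// Appends one framed voice packet to the outgoing byte stream.
void appendPacket(Bytes& stream, unsigned char frameType, const Bytes& payload) {
    assert(payload.size() <= 0xFFFF);
    stream.push_back(frameType);
    stream.push_back(static_cast<unsigned char>(payload.size() >> 8));
    stream.push_back(static_cast<unsigned char>(payload.size() & 0xFF));
    stream.insert(stream.end(), payload.begin(), payload.end());
}

// Tries to remove one complete packet from the front of the received bytes.
// Returns false (leaving `stream` untouched) if more data is still needed,
// which is expected because a single receive call may deliver a partial packet.
bool extractPacket(Bytes& stream, unsigned char& frameType, Bytes& payload) {
    if (stream.size() < kHeaderSize) return false;
    const std::size_t length =
        (static_cast<std::size_t>(stream[1]) << 8) | stream[2];
    if (stream.size() < kHeaderSize + length) return false;
    frameType = stream[0];
    payload.assign(stream.begin() + kHeaderSize,
                   stream.begin() + kHeaderSize + length);
    stream.erase(stream.begin(), stream.begin() + kHeaderSize + length);
    return true;
}
```

On the receiving side, bytes returned by each receive call would be appended to a per-connection buffer, and `extractPacket` would be called in a loop until it returns false, passing each payload to the decoder.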
The following flow chart provides the high-level implementation details of the complete application.
The code was developed using C++ on an Intel® Pentium® M processor-based system with Intel® Centrino® mobile technology running the Windows XP Professional* operating system. The UI of the softphone is shown below.
Fig 1. User Interface of the softphone application
The white paper discussed here demonstrates how to use the Intel® Integrated Performance Primitives to develop a VoIP application. The intention has been to provide building blocks that could be used to build a complete softphone application with advanced features. VoIP features such as Jitter Buffering, Frame Compaction and RTP transport have not been discussed, but given the framework provided by the sample code, it may be useful to experiment with these next.
Karthik Krishnan is an applications engineer working for Intel's Software and Solutions group. He joined Intel in 2001 and has been working with various software vendors to optimize their products on Intel® mobile and desktop platforms. Prior to joining Intel, he worked for Fluent Inc. as a software developer dealing with parallel programming.
