Threading and the Intel® IPP Library - part 1 of 3

Introduction to Threading in IPP

There is no universal threading solution that works for every application. Fortunately, the Intel® Integrated Performance Primitives (the Intel IPP library) are designed to be thread-safe. Applications that use the IPP library can implement threading at the primitive level (i.e., within the IPP library using OpenMP), at the operating system level (e.g., native threads), or somewhere in between (e.g., Threading Building Blocks, aka TBB).

For a quick summary of the differences between OpenMP, TBB, and native threads, please read "Intel® Threading Building Blocks, OpenMP, or native threads?"

The IPP library is available as a standalone product or as a component in Intel® Parallel Studio, a threading development environment designed specifically to help you design and debug threaded applications on multi-core platforms.

Saying the IPP library is thread-safe means that functions within the library can be called simultaneously from multiple threads within your application. The primitives are independent of the underlying operating system; they do not use locks, semaphores, or static memory; they rely only on the standard C library memory allocation routines (malloc/realloc/calloc/free) for temporary and state memory storage. To further reduce dependency on external functions you can use the i_malloc interface to substitute your own memory allocation routines for the standard C memory allocation routines (but that’s fodder for another blog :-).
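To make "called simultaneously from multiple threads" a bit more concrete, here is a minimal C++ sketch. It does not call the IPP library: scale_in_place is a hypothetical stand-in for a stateless primitive, and scale_all is an illustrative helper invented for this example. The point is only the calling pattern, in which several threads invoke the same function at the same time, each on its own buffer, without any locks.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stateless primitive (NOT an IPP function).
// It scales a buffer in place and touches no global or static state,
// which is what makes concurrent calls on separate buffers safe.
void scale_in_place(float* data, std::size_t len, float factor) {
    assert(data != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i) {
        data[i] *= factor;
    }
}

// Illustrative helper: runs the same primitive concurrently, one thread
// per buffer. No locks are needed because no two threads share a buffer.
void scale_all(std::vector<std::vector<float>>& buffers, float factor) {
    std::vector<std::thread> workers;
    workers.reserve(buffers.size());
    for (auto& buf : buffers) {
        workers.emplace_back(scale_in_place, buf.data(), buf.size(), factor);
    }
    // Wait for every worker before the caller reads the results.
    for (auto& t : workers) {
        t.join();
    }
}
```

If two threads were handed the same buffer, the application would still have to coordinate access itself; thread safety of the function does not protect data that the caller shares between threads.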

Three variants of the library are available: two have multi-threading built in (using OpenMP) and one is single-threaded. All three variants are thread-safe. Of course, the simplest way to add some threading to your IPP-enabled application is to link with one of the multi-threaded variants of the library. However, that may not always give you the optimum results.

Performance Possibilities

Even without threading, the IPP library provides a significant performance boost by giving your application easy access to SIMD (Single Instruction, Multiple Data) instructions (MMX, SSE, AES, AVX, et al.). The primitive functions in the library are designed to meet the needs of numeric-intensive algorithms such as image and video processing, digital filtering, string operations, and data compression.

The chart below indicates the level of performance improvement that is possible using the IPP primitives in a single-threaded application. It shows the relative performance improvement measured for the various IPP product domains, compared to equivalent functions implemented without the aid of MMX/SSE instructions.

[Chart: relative performance improvement by IPP product domain. Intel® Xeon® 4 Processor, 2.8GHz, 2GB using Windows* XP]

YMMV! (Your Mileage May Vary!) The performance improvement you see is a function of which primitives you use, where they are used in your application, how often they are used, how your program is structured, the type of data you operate on, the processor you use, etc., etc., etc… See Benchmark Limitations for the legalese. :-)

The following table highlights performance improvements achieved by using multiple threads of execution in some IPP-enabled applications. In these examples threading was implemented within the application and threading built into the primitives was disabled (single-threaded library).

Application | Threading Technology | Threading Technique | Performance Gain
H.264 – decoding HD stream | native threads | on-slice parallelization; inside-slice parallelization; inter-frame parallelization | ~2x on dual-core; ~3.2-3.7x on quad-core
JPEG – decoding medium sized image | OpenMP* | each MCU row is processed in parallel | ~1.9x on dual-core
GZIP data | native threads | split file into four equal chunks | ~10x on quad-core

YMMV! See Benchmark Limitations for the legalese. :-)
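As a rough illustration of the "split into equal chunks" technique from the GZIP row, here is a minimal C++ sketch using native threads via std::thread. It is not taken from the IPP samples. The names split_ranges, process_chunk, and process_in_chunks are hypothetical, and process_chunk simply sums bytes as a placeholder for real per-chunk work such as compression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Splits [0, total) into `parts` contiguous (offset, length) ranges of
// near-equal size; the first `total % parts` ranges get one extra element.
std::vector<std::pair<std::size_t, std::size_t>> split_ranges(std::size_t total,
                                                              std::size_t parts) {
    assert(parts > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = total / parts;
    const std::size_t extra = total % parts;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        ranges.emplace_back(offset, len);
        offset += len;
    }
    return ranges;
}

// Hypothetical per-chunk work (a byte sum), standing in for a real
// single-threaded operation applied to one chunk.
std::uint64_t process_chunk(const std::uint8_t* data, std::size_t len) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}

// Processes each chunk on its own thread and returns one result per chunk.
// Each thread writes only to its own slot in `results`, so no locking is needed.
std::vector<std::uint64_t> process_in_chunks(const std::vector<std::uint8_t>& input,
                                             std::size_t parts) {
    const auto ranges = split_ranges(input.size(), parts);
    std::vector<std::uint64_t> results(parts, 0);
    std::vector<std::thread> workers;
    workers.reserve(parts);
    for (std::size_t i = 0; i < parts; ++i) {
        workers.emplace_back([&, i] {
            results[i] = process_chunk(input.data() + ranges[i].first, ranges[i].second);
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return results;
}
```

Calling process_in_chunks(input, 4) mirrors the four-chunk split in the table. How the per-chunk results are combined afterwards depends on the operation, so that step is left out here.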

The IPP applications used to illustrate the parallelization results in the table above are part of the free IPP samples. The full source to these samples is provided in the download. If you want to evaluate the IPP library go here and click the Evaluate button.

Part 2: Threading Within Your IPP Application

Part 3: OpenMP Threading and Intel IPP

About IPP

The Intel® Integrated Performance Primitives (the Intel® IPP library) is a collection of highly optimized functions for frequently used fundamental algorithms found in a variety of domains including signal processing, image/audio/video encode/decode, data compression, string processing, and encryption. The library takes advantage of the extensive SIMD (single instruction multiple data) instructions and multiple hardware execution threads available in modern Intel processors. SIMD instructions are ideal for optimizing algorithms that operate on arrays and vectors of data.

The IPP library is available for use with applications built for the Windows, Linux, Mac OS X, and QNX operating systems and is compatible with the Intel C and Fortran Compilers, the Microsoft Visual Studio C/C++ compilers, and the gcc compilers found in most Linux distributions. The library is validated for use with multiple generations of Intel and compatible AMD* processors, including the Intel® Core™ and Intel® Atom™ processors. Both 32-bit and 64-bit operating systems and architectures are supported.

The Intel® IPP library is available as a standalone product or as a component in the Intel® Professional Edition compilers and Intel® Parallel Studio. Parallel Studio brings comprehensive parallelism to C/C++ Microsoft Visual Studio* application development and was created to ease the development of parallelism in your applications. Parallel Studio is interoperable with common parallel programming libraries and API standards, such as Intel® Threading Building Blocks (Intel® TBB) and OpenMP*, and provides an immediate opportunity to realize the benefits of multicore platforms.

* Other names and brands may be claimed as the property of others.

For more complete information about compiler optimizations, see our Optimization Notice.