OpenMP Threading and Intel IPP
The low-level primitives within the IPP library generally represent basic atomic operations, which limits threading within the library to roughly 15-20% of the primitives. Intel OpenMP is used to implement the internal threading and is enabled by default when you use one of the multi-threaded variants of the library. Multi-threaded versions of the library are supported only on Linux, Windows, and Mac OS X.
A list of the threaded primitives in the IPP library is provided in the ThreadedFunctionsList.txt file located in the library’s doc directory.
Note: the fact that the Intel IPP library is built with the Intel C compiler and OpenMP does not mean your application must also be built with these tools. The IPP library is compatible with the C/C++ compiler for your OS platform and is ready to link with your application; you can build an IPP application with either your preferred development tools or the Intel tools for that OS.
The parallel development tools that are part of Intel Parallel Studio can be used in an OpenMP environment.
Controlling OpenMP Threading in the Intel IPP Primitives
The default maximum number of OpenMP threads used by the multi-threaded IPP primitives is equal to the number of hardware threads in the system, which is determined by the number and type of CPUs in your system. For example, a quad-core processor with Intel® Hyper-Threading Technology (Intel® HT) has eight hardware threads (four cores with two threads per core), and a dual-core CPU without Intel HT has two hardware threads.
There are two IPP primitives for control and status of the OpenMP threading used within the library: ippSetNumThreads() and ippGetNumThreads(). You call ippGetNumThreads to determine the current thread cap and ippSetNumThreads to change the thread cap. ippSetNumThreads will not allow you to set the thread cap beyond the number of available hardware threads. This thread cap is an upper bound on the number of threads that can be used within a multi-threaded primitive. Some IPP functions may use fewer threads than specified by the thread cap, but they will never use more than the thread cap.
To disable OpenMP threading within the library you need to call ippSetNumThreads(1) near the beginning of your application. Or, you can link your application with the single-threaded variant of the library.
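The thread-cap calls described above can be sketched as follows. This is a minimal illustration, assuming the multi-threaded IPP library is linked and that the ippGetNumThreads/ippSetNumThreads declarations from ipp.h are available; compile with the include and library paths for your IPP installation.

```c
/* Sketch: query and cap the number of OpenMP threads used inside IPP.
 * Assumes linking against a multi-threaded variant of the IPP library. */
#include <stdio.h>
#include <ipp.h>

int main(void)
{
    int cap = 0;

    ippInit();                 /* select the optimal CPU-specific code path */

    ippGetNumThreads(&cap);    /* current thread cap; defaults to the number
                                  of hardware threads in the system */
    printf("default IPP thread cap: %d\n", cap);

    ippSetNumThreads(1);       /* disable OpenMP threading inside IPP;
                                  call this near the start of main() */

    ippGetNumThreads(&cap);
    printf("IPP thread cap is now: %d\n", cap);

    return 0;
}
```

Requests above the number of available hardware threads are clamped by the library, so a call such as ippSetNumThreads(1024) will not oversubscribe the machine.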
The OpenMP library used by the IPP library references several configuration environment variables. In particular, OMP_NUM_THREADS sets the default number of threads (the thread cap) to be used by the OpenMP library at run time. However, the IPP library overrides this setting by limiting the number of OpenMP threads it uses to either the number of hardware threads in the system, as described above, or the value specified by a call to ippSetNumThreads, whichever is lower. OpenMP applications on your system that do not use the Intel IPP library might still be affected by the OMP_NUM_THREADS environment variable; likewise, any such OpenMP applications will not be affected by a call to the ippSetNumThreads function within your Intel IPP application.
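As a concrete illustration, OMP_NUM_THREADS can be set in the launching shell before the application starts; the application name below is hypothetical.

```shell
# Ask the OpenMP runtime for at most 4 threads before launching the program.
# IPP will still clamp this to the number of hardware threads, or to any
# lower value passed to ippSetNumThreads().
export OMP_NUM_THREADS=4
./my_ipp_app    # hypothetical application linked against multi-threaded IPP
```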
If your application that uses the Intel IPP library also implements multi-threading via OpenMP, the threaded Intel IPP primitives it calls may execute as single-threaded primitives. This happens when an IPP primitive is called from within an OpenMP parallel region and nested parallelism is disabled, which is the default.
By nesting parallel OpenMP regions you risk creating a large number of threads that oversubscribe the available hardware threads. Creating a parallel region always incurs overhead, and the overhead associated with nested parallel OpenMP regions may outweigh any performance benefit.
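The oversubscription risk can be seen in a plain OpenMP sketch (not IPP-specific, an illustration only): with nesting explicitly enabled, an outer team of 4 threads each spawning an inner team of 4 yields up to 16 threads, regardless of how many hardware threads exist. Build with an OpenMP-capable compiler (e.g., -fopenmp).

```c
/* Illustration: nested parallel regions multiply the thread count.
 * Nesting is disabled by default, which is why threaded IPP primitives
 * called from inside a parallel region run single-threaded. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_nested(1);          /* enable nesting (OFF by default) */
    omp_set_num_threads(4);

    #pragma omp parallel        /* outer team: 4 threads */
    {
        #pragma omp parallel num_threads(4)   /* each outer thread spawns 4 */
        {
            #pragma omp critical
            printf("outer %d / inner %d\n",
                   omp_get_ancestor_thread_num(1), omp_get_thread_num());
        }
    }
    return 0;   /* up to 4 x 4 = 16 threads were active at once */
}
```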
In general, OpenMP threaded applications that use the IPP library should disable multi-threading within the library, either by calling ippSetNumThreads(1) or by using the single-threaded static Intel IPP library.
Some of the Intel IPP primitives in the signal processing domain are designed to execute parallel threads that exploit a merged L2 cache. These functions (single and double precision FFT, Div, Sqrt, etc.) need a shared cache in order to achieve their maximum multi-threaded performance. In other words, the threads within these primitives should execute on CPU cores located on a single die with a shared cache. To ensure this condition is met, the OpenMP thread-affinity environment variable should be set appropriately before an application using the Intel IPP library runs.
On processors with two or more cores on a single die, this condition is satisfied automatically and the environment variable is superfluous. However, on systems with two or more dies whose caches are not shared (e.g., a Pentium D processor or a multi-socket motherboard), failing to set this OpenMP environment variable can actually result in performance degradation for this class of multi-threaded Intel IPP primitives.
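A hedged sketch of such a setting: KMP_AFFINITY is the Intel OpenMP runtime's thread-affinity control, and its "compact" policy packs threads onto adjacent hardware threads, keeping them on cores that share a cache. Whether this is the exact variable and value intended here should be confirmed against the release notes for your IPP version; the application name is hypothetical.

```shell
# Assumption: KMP_AFFINITY=compact places OpenMP threads on adjacent
# hardware threads, so threads within a primitive share a cache.
export KMP_AFFINITY=compact
./my_fft_app    # hypothetical application using threaded IPP FFT primitives
```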
The Intel® Integrated Performance Primitives (the Intel® IPP library) is a collection of highly optimized functions for frequently used fundamental algorithms found in a variety of domains, including signal processing, image/audio/video encode/decode, data compression, string processing, and encryption. The library takes advantage of the extensive SIMD (single instruction multiple data) instructions and multiple hardware execution threads available in modern Intel processors. These instructions are ideal for optimizing algorithms that operate on arrays and vectors of data.
The IPP library is available for use with applications built for the Windows, Linux, Mac OS X, and QNX operating systems and is compatible with the Intel C and Fortran Compilers, the Microsoft Visual Studio C/C++ compilers, and the gcc compilers found in most Linux distributions. The library is validated for use with multiple generations of Intel and compatible AMD* processors, including the Intel® Core™ and Intel® Atom™ processors. Both 32-bit and 64-bit operating systems and architectures are supported.
The Intel® IPP library is available as a standalone product or as a component in the Intel® Professional Edition compilers and Intel® Parallel Studio. Parallel Studio brings comprehensive parallelism to C/C++ Microsoft Visual Studio* application development and was created to ease the development of parallelism in your applications. Parallel Studio is interoperable with common parallel programming libraries and API standards, such as Intel® Threading Building Blocks (Intel® TBB) and OpenMP*, and provides an immediate opportunity to realize the benefits of multicore platforms.
* Other names and brands may be claimed as the property of others.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804