This article describes the Intel® Integrated Performance Primitives (Intel® IPP) optimization layers present in the 7.0 version of the library. The article titled Understanding CPU Dispatching in the Intel® IPP Library describes the same features for previous versions of the library (5.3 through 6.1).
IMPORTANT! The minimum SIMD instruction level supported by version 7.0 of the Intel IPP library has changed! Applications built with this version of the library require a processor that supports at least the Intel® Streaming SIMD Extensions 2 (Intel® SSE2) instruction set when built for IA-32 processors (ia32), and at least the Intel® Streaming SIMD Extensions 3 (Intel® SSE3) instruction set when built for Intel® 64 processors (intel64). The non-optimized layers of the library (px on ia32 and mx on intel64) have been removed; the w7 and m7 optimization layers are now the default optimization layers.
The standard distribution of the Intel IPP library contains multiple, functionally-identical, SIMD-specific, optimized libraries (or layers) that are automatically “dispatched” at run-time. The “dispatcher” directs your calls to the appropriate optimized library layer based on SIMD capabilities discovered during library initialization. This is done to maximize each function’s use of the runtime processor's underlying SIMD instructions and other architecture-specific features.
Note: you can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. Please read this IPP linkage models article for information on how to build custom versions of the IPP library.
Dispatching selects the Intel IPP optimized library layer that corresponds to the runtime CPU's SIMD instruction set. For example, on a Windows installation, the $(IPPROOT)\..\redist\intel64\ipp directory contains a file named ippiu8-7.0.dll, which holds version 7.0 of the optimized image processing library for 64-bit processors that support the Intel SSSE3 instructions: ‘ippi’ denotes the image processing domain, ‘u8’ denotes the SSSE3 instruction set for 64-bit processors, and ‘7.0’ denotes the library’s version number.
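The naming pattern described above can be taken apart mechanically. The following is a minimal sketch, assuming the Windows DLL naming shown in the example (domain prefix, two-character layer code, a dash, the version, then an extension). The helper parse_ipp_library_name() is hypothetical and is not part of the Intel IPP library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Intel IPP function. Splits a name such as
   "ippiu8-7.0.dll" into its domain ("ippi"), layer code ("u8") and
   version ("7.0"). Returns 0 on success, -1 if the name does not fit
   the assumed <domain><layer>-<version>.<ext> pattern. */
static int parse_ipp_library_name(const char *file,
                                  char *domain, size_t domain_size,
                                  char layer[3],
                                  char *version, size_t version_size)
{
    const char *dash = strchr(file, '-');  /* separates layer code and version */
    const char *ext = strrchr(file, '.');  /* start of the file extension */
    size_t domain_len, version_len;

    if (dash == NULL || ext == NULL || ext <= dash + 1)
        return -1;
    if (dash - file < 3)                   /* need a domain plus a 2-char code */
        return -1;

    domain_len = (size_t)(dash - file) - 2;   /* text before the layer code */
    version_len = (size_t)(ext - dash) - 1;   /* text between '-' and extension */
    if (domain_len >= domain_size || version_len >= version_size)
        return -1;

    memcpy(domain, file, domain_len);
    domain[domain_len] = '\0';
    memcpy(layer, dash - 2, 2);               /* the two characters before '-' */
    layer[2] = '\0';
    memcpy(version, dash + 1, version_len);
    version[version_len] = '\0';
    return 0;
}
```

Passing "ippiu8-7.0.dll" fills the three buffers with "ippi", "u8" and "7.0". Names that follow a different convention (for example, shared objects on Linux) would need their own parsing rules.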
In the general case, the “dispatcher” identifies the run-time processor only once, at library initialization time, and sets up a variable internal to the library that directs your calls to the SIMD-specific functions that match the runtime processor. For example, ippsCopy_8u() has multiple implementations stored in the library, each optimized for a specific SIMD instruction set. The u8_ippsCopy_8u() version of ippsCopy_8u() is called by the dispatcher when running on an Intel® Core™ 2 Duo processor in 64-bit addressing mode, because u8_ippsCopy_8u() is optimized for the SSSE3 instruction set architecture supported by that processor in 64-bit addressing mode.
Note: IPP architectures generally correspond to SIMD (MMX, SSE, AES, etc.) instruction sets, with some minor variations (see the p8 and y8 optimization layers).
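The "identify once, then redirect every call" idea can be illustrated with a function pointer. This is only a sketch of the general technique, not the library's actual implementation: every name below (layer_t, dispatcher_init(), copy_8u(), and the m7_/u8_ stand-ins) is hypothetical, and both "layers" are plain C rather than SIMD code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layer identifiers for a 64-bit build. */
typedef enum { LAYER_M7, LAYER_U8 } layer_t;

typedef void (*copy_8u_fn)(const unsigned char *src, unsigned char *dst, size_t len);

/* Stand-in for the SSE3 (m7) implementation. */
static void m7_copy_8u(const unsigned char *src, unsigned char *dst, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i)
        dst[i] = src[i];
}

/* Stand-in for the SSSE3 (u8) implementation. */
static void u8_copy_8u(const unsigned char *src, unsigned char *dst, size_t len)
{
    memcpy(dst, src, len);
}

/* Internal variable that routes calls; starts at the default layer. */
static copy_8u_fn g_copy_8u = m7_copy_8u;

/* Runs once at initialization, after the processor has been identified. */
static void dispatcher_init(layer_t detected)
{
    g_copy_8u = (detected == LAYER_U8) ? u8_copy_8u : m7_copy_8u;
}

/* Public entry point: callers never name a specific layer. */
static void copy_8u(const unsigned char *src, unsigned char *dst, size_t len)
{
    g_copy_8u(src, dst, len);
}
```

After dispatcher_init(LAYER_U8), every call to copy_8u() lands in u8_copy_8u() with no further processor checks, which is why the detection cost is paid only once.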
Initializing the IPP Dispatcher
Identifying the runtime processor and initializing the dispatcher should be the first action you take with the Intel IPP library. If you are using the standard dynamic link library, this process is handled automatically when the Intel IPP shared library is initialized. If you are using a static library, you must perform this step manually. See this article on the ipp*Init*() functions for more information on how to do this.
Because the minimum SIMD instruction set is SSE2 on IA-32 and SSE3 on Intel 64 processors, it is recommended that you ALWAYS call the ippInit() function before making any other calls to the Intel IPP library. This advice applies regardless of whether you are linking against the static or dynamic form of the library (even though the dynamic library will also perform this call).
Calling the ippInit() function with the shared libraries (DLL and SO) will generate an error message to a dialog box or error console if the ippInit() function detects that the runtime CPU is not supported by the Intel IPP library. Calling the ippInit() function in the static versions of the library will not generate a console or dialog message. Both versions of the ippInit() function will return an error code when a non-supported CPU is detected.
It is important that you call the ippInit() function at the beginning of your application to ensure that the processor on which your application is running supports the Intel IPP library. If the ippInit() function returns an error code, you should close your application gracefully. Otherwise, your application may be terminated unexpectedly by an invalid instruction fault because it is running on an unsupported processor.
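The startup check described above might look like the following sketch. It is written so that it compiles without the library installed: unless the hypothetical macro USE_REAL_IPP is defined, a stand-in ippInit() is used in place of the one declared in ipp.h. The helper ipp_startup_ok() and the variable fake_init_status are illustrative names only. The comparison assumes the usual Intel IPP convention that error status codes are negative and warnings are positive; check the status codes documented for your version before relying on it.

```c
#include <stdio.h>

#ifdef USE_REAL_IPP
#include "ipp.h"   /* real library: declares ippInit(), IppStatus, ippStsNoErr */
#else
/* Stand-ins so the sketch builds without Intel IPP. These are assumptions
   for illustration; the real definitions come from ipp.h. */
typedef int IppStatus;
#define ippStsNoErr 0
static IppStatus fake_init_status = ippStsNoErr;   /* hypothetical test hook */
static IppStatus ippInit(void) { return fake_init_status; }
#endif

/* Hypothetical helper: call first, before any other Intel IPP function.
   Returns 1 if the application may continue, 0 if it should shut down. */
static int ipp_startup_ok(void)
{
    IppStatus status = ippInit();

    /* Assumption: negative values are errors, positive values are warnings. */
    if (status < ippStsNoErr) {
        fprintf(stderr, "Intel IPP initialization failed (status %d)\n",
                (int)status);
        return 0;
    }
    return 1;
}
```

An application would typically call ipp_startup_ok() at the top of main() and exit cleanly when it returns 0, rather than continue and risk an invalid instruction fault later.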
The following table lists the SIMD architecture codes supported by version 7.0 of the Intel IPP library.
|Platform||Architecture||SIMD Requirements||Processor / µarchitecture||Notes|
|IA-32||w7||SSE2||P4, Xeon, Centrino||SSE2 default|
|IA-32||v8||Supplemental SSE3||Core 2, Xeon® 5100, Atom|| |
|IA-32||s8||Supplemental SSE3 (compiled for Atom)||Atom|| |
|IA-32||p8||SSE4.1, SSE4.2 and AES-NI||Penryn, Nehalem, Westmere||see next section|
|IA-32||g9||AVX||Sandy Bridge µarchitecture|| |
|Intel® 64 (EM64T)||m7||SSE3||Prescott||SSE3 default|
|Intel® 64 (EM64T)||u8||Supplemental SSE3||Core 2, Xeon® 5100, Atom|| |
|Intel® 64 (EM64T)||n8||Supplemental SSE3 (compiled for Atom)||Atom|| |
|Intel® 64 (EM64T)||y8||SSE4.1, SSE4.2 and AES-NI||Penryn, Nehalem, Westmere||see next section|
|Intel® 64 (EM64T)||e9||AVX||Sandy Bridge µarchitecture|| |
For information on support for non-Intel processors, please read Use Intel® IPP on Intel or Compatible AMD* Processors.
If you compare the dispatch table above with the dispatch table for versions 5.3 through 6.1, you will note that the Intel SSE3 optimization layer (t7) has been removed from the 32-bit edition (ia32) of the library. 32-bit applications built with the 7.0 version of the library that execute on an SSE3 processor will automatically use the Intel SSE2 optimization layer (w7). In most cases, the impact of this change is minor, since the performance difference between the Intel SSE3 (t7) and Intel SSE2 (w7) optimization layers in the Intel IPP library is minimal. Processors that support the Intel SSSE3 instruction set (v8 and s8 optimization layers) are not affected by this change. (Note: this change does not impact applications built using the 64-bit edition of the library, which now uses the Intel SSE3 optimization layer (m7) as its default path.)
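The mapping in the 7.0 dispatch table, including the fallback of 32-bit SSE3 processors to w7, can be written down as a small lookup. This is a sketch that only encodes the table; the library's real selection logic is internal and may consider more than is shown here. The simd_level_t type and select_layer() function are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical ordering of SIMD levels, lowest to highest. */
typedef enum {
    SIMD_SSE = 1, SIMD_SSE2, SIMD_SSE3, SIMD_SSSE3, SIMD_SSE41, SIMD_AVX
} simd_level_t;

/* Hypothetical helper: returns the 7.0 layer code for a processor, or NULL
   when the processor is below the minimum level for that platform. */
static const char *select_layer(int is_intel64, simd_level_t level, int is_atom)
{
    if (level >= SIMD_AVX)
        return is_intel64 ? "e9" : "g9";
    if (level >= SIMD_SSE41)
        return is_intel64 ? "y8" : "p8";
    if (level >= SIMD_SSSE3) {
        if (is_atom)                       /* Atom-compiled SSSE3 variants */
            return is_intel64 ? "n8" : "s8";
        return is_intel64 ? "u8" : "v8";
    }
    if (is_intel64)                        /* intel64 minimum is SSE3 */
        return (level >= SIMD_SSE3) ? "m7" : NULL;
    /* ia32: there is no t7 layer in 7.0, so SSE3 processors use w7 */
    return (level >= SIMD_SSE2) ? "w7" : NULL;
}
```

For example, select_layer(0, SIMD_SSE3, 0) yields "w7", while select_layer(1, SIMD_SSE2, 0) yields NULL because SSE2-only processors are below the intel64 minimum.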
P8/Y8 Internal Run-Time Dispatcher
Within the 32-bit p8 and equivalent 64-bit y8 architectures there is an additional "runtime dispatcher," a mini-dispatcher. The Nehalem and Westmere processor microarchitectures add SIMD instructions beyond those defined by SSE4.1. The Nehalem processor microarchitecture added the SSE4.2 SIMD instructions and the Westmere processor microarchitecture added Intel® AES-NI.
Creating two separate optimization layers within the IPP library for the small set of instructions added by SSE4.2 and AES-NI would waste considerable space, so they are bundled into the SSE4.1 library (p8/y8) as minor variants of that optimization layer. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library if your runtime processor supports these instructions. Because the enhancements affect the optimization of only a small number of Intel IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor that supports these extra instructions.
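The "additional jump" inside the p8/y8 layer can be pictured as a second, much smaller dispatch step that exists only in the few functions that have a variant. The sketch below illustrates that idea under stated assumptions: the names are hypothetical, the functions return labels instead of doing real work, and the actual mechanism inside the library is not documented in this article.

```c
/* Hypothetical flag, set once at initialization on a p8/y8 processor. */
static int g_has_aesni = 0;

/* Baseline SSE4.1 variant of a function that also has an AES-NI variant. */
static const char *p8_aes_block_sse41(void) { return "sse4.1"; }

/* AES-NI variant bundled into the same p8 layer. */
static const char *p8_aes_block_aesni(void) { return "aes-ni"; }

/* Entry point within the p8 layer: pays for one extra branch. */
static const char *p8_aes_block(void)
{
    return g_has_aesni ? p8_aes_block_aesni() : p8_aes_block_sse41();
}

/* A function with no SSE4.2 or AES-NI variant has no extra branch at all. */
static const char *p8_copy(void) { return "sse4.1"; }
```

Only p8_aes_block() carries the second check; p8_copy() behaves exactly as it would in a layer with no mini-dispatcher, which matches the point that the overhead applies to a small number of functions.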
S8/N8 (Atom) Dispatch
Unlike preceding versions of the library, the 7.0 version of the Intel IPP library includes Atom-optimized variants in all formats (static and dynamic) of the library. For this reason, the Linux distribution of the 7.0 version of the Intel IPP library no longer includes a separate Atom-specific version of the library: the Atom-specific optimizations have been fully merged into all formats of the standard library files.
Please read Intel® Atom™ Processors Support in the Intel® Integrated Performance Primitives (Intel® IPP) Library for more information regarding Atom optimizations in the IPP library.
Processor Architecture Table
The following table was copied from an Intel Compiler Pro options article describing some compiler architecture options. It contains a list of Intel processors showing which processors support which SIMD instructions. For the latest table please refer to the original article; it gets updated on a regular basis. Please note that the behavior of the Intel Compiler SIMD dispatcher described in that article does not apply to the Intel IPP library.
The Intel IPP library dispatching mechanism behaves differently than that found in the Intel Compiler products, and may also behave differently than other Intel library products.
Additional information regarding dispatching and how it relates to non-Intel processors can be found here. How to identify your specific processor is described here. To correlate a processor family name with an Intel CPU brand name, use the following web site: ark.intel.com.
Intel® Core™ i7 processors
Intel® Core™ i5 processors
Intel® Core™ i3 processors
Intel® Xeon® 55XX series
Intel® Xeon® 74XX series
Quad-Core Intel® Xeon® 54XX, 33XX series
Dual-Core Intel® Xeon® 52XX, 31XX series
Intel® Core™ 2 Extreme 9XXX series
Intel® Core™ 2 Quad 9XXX series
Intel® Core™ 2 Duo 8XXX series
Intel® Core™ 2 Duo E7200
Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
Intel® Core™ 2 Extreme 7XXX, 6XXX series
Intel® Core™ 2 Quad 6XXX series
Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
Intel® Core™ 2 Solo 2XXX series
Intel® Pentium® dual-core processor E2XXX, T23XX series
Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series
Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
Dual-Core Intel® Xeon® 2.8
Intel® Xeon® processors with SSE3 instruction set support
Intel® Core™ Duo
Intel® Core™ Solo
Intel® Pentium® dual-core processor T21XX, T20XX series
Intel® Pentium® processor Extreme Edition
Intel® Pentium® D
Intel® Pentium® 4 processors with SSE3 instruction set support
Intel® Xeon® processors
Intel® Pentium® 4 processors
Intel® Pentium® M
Intel® Pentium® III Processor
Intel® Pentium® II Processor
Intel® Pentium® Processor
*Other names and brands may be claimed as the property of others.