Understanding CPU Dispatching in the Intel® IPP Libraries versions 5.3 through 6.1

Note: this article describes the SIMD support in versions 5.3 through 6.1 of the Intel IPP library. The minimum SIMD requirements changed with release 7.0 of the Intel IPP library. For more information regarding the SIMD optimization layers present in the Intel IPP 7.0 library, please read the article titled Understanding SIMD Optimization Layers and Dispatching in the Intel® IPP 7.0 Library.

The Intel IPP library contains a collection of functionally identical processor-specific optimized libraries that are “dispatched” at run-time. The “dispatcher” chooses which of these processor-specific optimized libraries to use when your application makes a call into the IPP library. This is done to maximize each function’s use of the underlying SIMD instructions and other architecture-specific features.

Note: you can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. Please read the article on IPP linkage models for information on how to build custom versions of the IPP library.

Dispatching refers to the process of detecting CPU features at run-time and then selecting the Intel IPP optimized library set that corresponds to your CPU. For example, in the \ia32\bin directory, the ippiv8-x.x.dll library file contains version ‘x.x’ of the optimized image processing libraries for Intel® Core™ 2 Duo processors; ‘ippi’ refers to the image processing library, ‘v8’ refers to the Core 2 architecture, and ‘x.x’ refers to the library’s major and minor version numbers.

In the general case, the “dispatcher” identifies the run-time processor only once, at library initialization time. It sets an internal table or variable that directs your calls to the internal functions that match your architecture. For example, ippsCopy_8u() may have multiple implementations stored in the library, each optimized for a specific Intel® processor architecture. Thus, the dispatcher calls the p8_ippsCopy_8u() version of ippsCopy_8u() when running on an SSE4.1-capable (Penryn-based) Intel Core 2 Duo processor on IA-32, because that version is optimized for this processor architecture.
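To make the mechanism concrete, here is a minimal C sketch of a function-pointer dispatch table. All names (demo_Copy_8u(), px_demo_copy_8u(), p8_demo_copy_8u(), demo_dispatch_init()) are hypothetical; this illustrates the general technique, not IPP's actual internals, and the "optimized" routine is a plain loop standing in for SIMD code.

```c
/* Illustrative sketch only: these names are hypothetical, not IPP's
 * internal symbols. Both routines produce identical results; the "p8"
 * one stands in for an SSE4.1-optimized build of the same function. */

typedef void (*copy_8u_fn)(const unsigned char *src, unsigned char *dst, int len);

/* Generic C implementation (plays the role of the px layer). */
static void px_demo_copy_8u(const unsigned char *src, unsigned char *dst, int len)
{
    for (int i = 0; i < len; ++i)
        dst[i] = src[i];
}

/* Stand-in for the p8 layer; a real version would use SSE4.1 instructions. */
static void p8_demo_copy_8u(const unsigned char *src, unsigned char *dst, int len)
{
    for (int i = 0; i < len; ++i)
        dst[i] = src[i];
}

/* The dispatch table: one pointer per exported function, set once. */
static copy_8u_fn demo_copy_8u_impl = px_demo_copy_8u;

/* Runs once at initialization; cpu_has_sse41 would come from CPUID. */
static void demo_dispatch_init(int cpu_has_sse41)
{
    demo_copy_8u_impl = cpu_has_sse41 ? p8_demo_copy_8u : px_demo_copy_8u;
}

/* Exported entry point: no CPU check per call, just an indirect call. */
void demo_Copy_8u(const unsigned char *src, unsigned char *dst, int len)
{
    demo_copy_8u_impl(src, dst, len);
}
```

After demo_dispatch_init() has run once, each call to demo_Copy_8u() costs only an indirect call; the CPU is not re-examined on every call, which matches the "identify once at initialization" behavior described above.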

Note: IPP architectures generally correspond to SIMD instruction sets (MMX, SSE, AES-NI, etc.).

Initializing the IPP Dispatcher

Identifying the specific processor being used and initializing the dispatcher should happen before you make any calls into the IPP library. If you are using the dynamic link libraries, this is handled automatically when the dynamic link library is initialized. However, if you are using the static libraries, you must perform this step manually. See the article on the ipp*Init*() functions for more information on how to do this.
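The sketch below models, in plain C, why the static case needs an explicit call. It is a hypothetical illustration, not IPP code: demo_static_init() stands in for the ipp*Init*() call (ippStaticInit() in these versions), and the assumption that an uninitialized library stays on the generic C layer is made for this sketch only. A DLL can run the same initializer from its load-time hook (for example DllMain on Windows), which is why the dynamic case needs no action from you.

```c
/* Hypothetical model, not IPP's implementation.
 * Assumption for this sketch: until initialization runs, the dispatch
 * level stays at the generic C code (px/mx). A static library has no
 * load-time hook of its own, so the application calls the initializer. */

enum demo_level { LEVEL_GENERIC = 0, LEVEL_OPTIMIZED = 1 };

static enum demo_level dispatch_level = LEVEL_GENERIC;

/* Stand-in for CPUID-based detection of the best available layer. */
static enum demo_level detect_best_level(void)
{
    return LEVEL_OPTIMIZED;
}

/* What a statically linked application would call once, early in main(),
 * before any other library call. */
void demo_static_init(void)
{
    dispatch_level = detect_best_level();
}

/* Reports which layer subsequent calls would be routed to. */
enum demo_level demo_dispatch_level(void)
{
    return dispatch_level;
}
```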

The following table lists all the architecture codes defined by the Intel IPP library through version 6.1 of the product. Note that some of these IPP architectures have been deprecated and are no longer supported in the current version of the product. Deprecated architectures are identified in the “Notes” column of the table.

| Platform | Architecture | SIMD Requirements | Processor / µarchitecture | Notes |
|---|---|---|---|---|
| IA-32 | px | C-optimized for all IA-32 processors | i386+ | |
| | a6 | SSE | Pentium III | thru 5.3 only |
| | w7 | SSE2 | P4, Xeon, Centrino | |
| | t7 | SSE3 | Prescott, Yonah | |
| | v8 | Supplemental SSE3 | Core 2, Xeon® 5100, Atom | |
| | s8 | Supplemental SSE3 (compiled for Atom) | Atom | new in 6.0 |
| | p8 | SSE4.1, SSE4.2, AES-NI | Penryn, Nehalem, Westmere | see notes below |
| | g9 | AVX | Sandy Bridge µarchitecture | new in 6.1 |
| Intel® 64 (EM64T) | mx | C-optimized for all Intel® 64 platforms | P4 | SSE2 minimum |
| | m7 | SSE3 | Prescott | |
| | u8 | Supplemental SSE3 | Core 2, Xeon® 5100, Atom | |
| | n8 | Supplemental SSE3 (compiled for Atom) | Atom | new in 6.0 |
| | y8 | SSE4.1, SSE4.2, AES-NI | Penryn, Nehalem, Westmere | see notes below |
| | e9 | AVX | Sandy Bridge µarchitecture | new in 6.1 |
| Itanium® | i7 | Intel® Itanium® processor family | Itanium | |
| IXP4xx | sx | C-optimized for IXP4xx processors | IXP/XScale | thru 5.3 only |
| | s2 | IXP4xx optimized | IXP/XScale | thru 5.3 only |
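The table can be read as a lookup from the highest detected SIMD level to an architecture code. The sketch below decodes the feature flags returned in ECX and EDX by CPUID leaf 1 and maps the result to a 6.1 code. It is an illustration under stated assumptions, not IPP's dispatcher: it ignores the Atom-specific s8/n8 layers and the a6 layer (present only through 5.3), and it treats the AVX flag as sufficient, whereas real code must also confirm operating-system support through OSXSAVE and XGETBV.

```c
#include <stdint.h>

/* CPUID leaf 1 feature flags (bit positions per Intel's CPUID documentation). */
#define EDX_SSE2   (1u << 26)
#define ECX_SSE3   (1u << 0)
#define ECX_SSSE3  (1u << 9)
#define ECX_SSE41  (1u << 19)
#define ECX_AVX    (1u << 28)

/* One entry per SIMD requirement in the table; LEVEL_C is the C-optimized code. */
enum simd_level { LEVEL_C, LEVEL_SSE2, LEVEL_SSE3, LEVEL_SSSE3, LEVEL_SSE41, LEVEL_AVX };

/* Architecture codes indexed by [is_64bit][level]. On Intel 64 the
 * C-optimized mx layer already assumes SSE2, so LEVEL_C and LEVEL_SSE2
 * both map to mx. */
static const char *const arch_codes[2][6] = {
    { "px", "w7", "t7", "v8", "p8", "g9" },  /* IA-32    */
    { "mx", "mx", "m7", "u8", "y8", "e9" },  /* Intel 64 */
};

/* Highest SIMD level reported by CPUID leaf 1, checked from newest down. */
enum simd_level simd_level_from_cpuid(uint32_t ecx, uint32_t edx)
{
    if (ecx & ECX_AVX)   return LEVEL_AVX;   /* real code must also check OS support */
    if (ecx & ECX_SSE41) return LEVEL_SSE41; /* SSE4.2/AES-NI are handled inside p8/y8 */
    if (ecx & ECX_SSSE3) return LEVEL_SSSE3;
    if (ecx & ECX_SSE3)  return LEVEL_SSE3;
    if (edx & EDX_SSE2)  return LEVEL_SSE2;
    return LEVEL_C;                          /* includes SSE-only CPUs, since a6 is gone */
}

/* Architecture code for a given level on IA-32 (0) or Intel 64 (1). */
const char *arch_code(enum simd_level level, int is_64bit)
{
    return arch_codes[is_64bit ? 1 : 0][level];
}
```

For example, a Merom-based Core 2 Duo reports SSE2, SSE3, and SSSE3, so this sketch maps it to v8 on IA-32 and u8 on Intel 64.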


For information on support for non-Intel processors, please see the article titled Use Intel® IPP on Intel or Compatible AMD* Processors.

For more information regarding Intel IPP library support for XScale* processors, please see the following article:
PXA9xx / PXA27x / XScale -- How to get Developer Support

P8/Y8 Internal Run-Time Dispatcher

Within the 32-bit p8 and equivalent 64-bit y8 architectures there is an additional "run-time" dispatching mechanism, a kind of mini-dispatcher. The Nehalem (Intel Core i7) and Westmere processor families add SIMD instructions beyond those defined by SSE4.1: the Nehalem processor family adds the SSE4.2 instructions and the Westmere family adds AES-NI.

Creating two additional internal versions of the IPP library for the SSE4.2 and AES-NI instructions would be very space inefficient, so they are bundled as part of the SSE4.1 library. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library. Because the enhancements affect the optimization of only a small number of IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor.
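A hypothetical C sketch of the idea: the main dispatcher has already chosen the p8 layer, and only the functions that have SSE4.2 or AES-NI variants pay for one extra test of a feature flag captured at initialization. The function names, the return codes, and the use of a cached CPUID value are assumptions made for illustration; they do not describe how IPP actually implements its internal jump.

```c
#include <stdint.h>

/* Hypothetical sketch of the p8/y8 mini-dispatcher; not IPP internals. */

#define ECX_SSE42 (1u << 20)  /* CPUID leaf 1, ECX bit 20: SSE4.2 */
#define ECX_AES   (1u << 25)  /* CPUID leaf 1, ECX bit 25: AES-NI */

/* Which code path a call ended up in. */
enum code_path { PATH_SSE41, PATH_SSE42, PATH_AESNI };

/* Captured once when the library initializes (stand-in for a CPUID call). */
static uint32_t p8_cpuid1_ecx;

/* Most p8 functions: plain SSE4.1 code with no extra check at all. */
enum code_path p8_demo_add(void)
{
    return PATH_SSE41;
}

/* A hypothetical string-search function with an SSE4.2 variant. */
enum code_path p8_demo_string_find(void)
{
    return (p8_cpuid1_ecx & ECX_SSE42) ? PATH_SSE42 : PATH_SSE41;
}

/* A hypothetical AES function with an AES-NI variant: one extra branch per call. */
enum code_path p8_demo_aes_encrypt(void)
{
    return (p8_cpuid1_ecx & ECX_AES) ? PATH_AESNI : PATH_SSE41;
}
```

On a Penryn processor every call stays on the SSE4.1 path; on Nehalem only the SSE4.2-aware function branches away, and on Westmere the AES function does as well, which is why the extra overhead is confined to a small number of functions.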

S8/N8 (Atom) Dispatch

The s8/n8 (Atom-optimized) library is present only in the dynamic libraries, not in the static libraries. However, IPP applications built with the static library will still run on an Atom processor with very good, often equivalent, performance using the v8/u8 library. The dispatcher selects that library automatically; you do not need to do anything special for the Atom processor.

The Linux distributions of the IPP library include a separate Atom-specific version of the library. However, you do not need to use this Atom-specific library if you are building an IPP application that will be run on multiple processor platforms, including Atom processors. This Atom-only version of the library is provided as a convenience for building IPP applications that will run ONLY on an Atom, as opposed to IPP applications that may run on a variety of processor platforms.

The fundamental difference between the s8/n8 and v8/u8 libraries is the compiler options used to build them, which accommodate the differences between the instruction pipelines of the Atom and those of other SSSE3 processors. Both are Supplemental SSE3 libraries, and the s8/n8 version of the IPP library does not use any Atom-unique instructions, so no features are lost by running the v8/u8 slice of the library on an Atom processor. Also, the two variations of the library (s8/n8 and v8/u8) give nearly identical performance on an Atom.
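The selection rule for the Supplemental SSE3 layers can be summarized in a few lines of C. This is a hypothetical sketch of the behavior described above, not IPP code; ssse3_layer() and its parameters are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical sketch, not IPP code: picks the Supplemental SSE3 layer.
 * The Atom-tuned s8/n8 build exists only in the dynamic libraries, so a
 * statically linked application on an Atom runs v8/u8, which uses the
 * same SSSE3 instructions and performs nearly identically there. */
const char *ssse3_layer(bool is_atom, bool dynamic_link, bool is_64bit)
{
    if (is_atom && dynamic_link)
        return is_64bit ? "n8" : "s8";  /* Atom-tuned compile of the SSSE3 code */
    return is_64bit ? "u8" : "v8";      /* general SSSE3 build, also fine on Atom */
}
```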

Processor Architecture Table

The following table was copied from an Intel Compiler Pro options article describing some compiler architecture options. It contains a list of Intel processors showing which processors support which SIMD instructions. For the latest table please refer to the original article; it gets updated on a regular basis. Please note that the behavior of the Intel Compiler SIMD dispatcher described in that article does not apply to the Intel IPP library.

The Intel IPP library dispatching mechanism behaves differently than that found in the Intel Compiler products, and may also behave differently than other Intel library products.

Additional information regarding dispatching and how it relates to non-Intel processors is available in a separate article, as is a description of how to identify your specific processor. To correlate a processor family name with an Intel CPU brand name, use the ark.intel.com web site.

SSE4.2
Intel® Core™ i7 processors
Intel® Core™ i5 processors
Intel® Core™ i3 processors
Intel® Xeon® 55XX series

SSE4.1
Intel® Xeon® 74XX series
Quad-Core Intel® Xeon 54XX, 33XX series
Dual-Core Intel® Xeon 52XX, 31XX series
Intel® Core™ 2 Extreme 9XXX series
Intel® Core™ 2 Quad 9XXX series
Intel® Core™ 2 Duo 8XXX series
Intel® Core™ 2 Duo E7200

SSSE3
Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
Intel® Core™ 2 Extreme 7XXX, 6XXX series
Intel® Core™ 2 Quad 6XXX series
Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
Intel® Core™ 2 Solo 2XXX series
Intel® Pentium® dual-core processor E2XXX, T23XX series

SSE3
Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series
Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
Dual-Core Intel® Xeon® 2.8
Intel® Xeon® processors with SSE3 instruction set support
Intel® Core™ Duo
Intel® Core™ Solo
Intel® Pentium® dual-core processor T21XX, T20XX series
Intel® Pentium® processor Extreme Edition
Intel® Pentium® D
Intel® Pentium® 4 processors with SSE3 instruction set support

SSE2
Intel® Xeon® processors
Intel® Pentium® 4 processors
Intel® Pentium® M

IA32
Intel® Pentium® III Processor
Intel® Pentium® II Processor
Intel® Pentium® Processor



*Other names and brands may be claimed as the property of others.

 

For details on compiler optimization, see our Optimization Notice.