Understanding CPU Dispatching in the Intel® IPP Libraries

The Intel® IPP library contains a collection of functionally identical, processor-specific optimized libraries that are “dispatched” at run-time. The “dispatcher” chooses which of these processor-specific optimized libraries to use when your application makes a call into the IPP library. This is done to maximize each function’s use of the underlying SIMD instructions and other architecture-specific features.

Note: you can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. See the IPP linkage models article for information on how to build custom versions of the IPP library.

Dispatching refers to the process of detecting CPU features at run-time and then selecting the Intel IPP optimized library set that corresponds to your CPU. For example, in the \ia32\bin directory, the ippiv8-x.x.dll library file contains version ‘x.x’ of the optimized image processing libraries for Intel® Core™ 2 Duo processors; ‘ippi’ refers to the image processing library, ‘v8’ refers to the Core 2 architecture, and ‘x.x’ refers to the library’s major version numbers.
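The naming convention above can be illustrated with a small parser. This is only a sketch: `IppDllName` and `parse_ipp_dll_name()` are hypothetical names invented for this example and are not part of the IPP API. It assumes the pattern `<domain><2-character CPU code>-<version>.dll` described in the paragraph above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three parts of a dispatched DLL name. */
typedef struct {
    char domain[16];   /* e.g. "ippi" (image processing) */
    char cpu[3];       /* e.g. "v8"   (architecture code) */
    char version[16];  /* e.g. "x.x"  (library version)   */
} IppDllName;

/* Splits a name such as "ippiv8-x.x.dll" into its parts.
 * Returns 1 on success, 0 if the name does not fit the assumed pattern. */
static int parse_ipp_dll_name(const char *file, IppDllName *out)
{
    const char *dash, *ext;
    size_t prefix_len, ver_len;

    assert(file != NULL && out != NULL);

    dash = strchr(file, '-');    /* separates "<domain><cpu>" from version */
    ext  = strrchr(file, '.');   /* start of the ".dll" extension          */
    if (dash == NULL || ext == NULL || ext < dash || strcmp(ext, ".dll") != 0)
        return 0;

    prefix_len = (size_t)(dash - file);
    ver_len    = (size_t)(ext - dash - 1);

    /* Need "ipp" + at least one domain char is not enforced here; we only
     * require room for a domain plus the two-character CPU code. */
    if (prefix_len < 3 || prefix_len - 2 >= sizeof out->domain)
        return 0;
    if (ver_len == 0 || ver_len >= sizeof out->version)
        return 0;

    memcpy(out->domain, file, prefix_len - 2);
    out->domain[prefix_len - 2] = '\0';
    memcpy(out->cpu, dash - 2, 2);
    out->cpu[2] = '\0';
    memcpy(out->version, dash + 1, ver_len);
    out->version[ver_len] = '\0';
    return 1;
}
```

Passing "ippiv8-x.x.dll" to this helper would yield the domain "ippi", the CPU code "v8", and the version string "x.x".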

In the general case, the “dispatcher” identifies the run-time processor only once, at library initialization time. It sets an internal table or variable that directs your calls to the internal functions that match your architecture. For example, ippsCopy_8u() may have multiple implementations stored in the library, with each version optimized for a specific Intel® processor architecture. Thus, the p8_ippsCopy_8u() version of ippsCopy_8u() is called by the dispatcher when running on an Intel Core 2 Duo processor on IA-32, because it is optimized for this processor architecture.
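The idea of a table that is set once and then consulted on every call can be sketched with a function pointer. This is not the actual IPP implementation; `demo_init()`, `demo_copy_8u()`, `px_copy_8u()`, `p8_copy_8u()`, and `arch_t` are hypothetical names, and the detected architecture is passed in by the caller rather than read from the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char u8;
typedef void (*copy_fn)(const u8 *src, u8 *dst, size_t len);

/* Two functionally identical stand-ins. A real library would hand-tune
 * each variant for its target instruction set. */
static void px_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i)
        dst[i] = src[i];
}

static void p8_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    memcpy(dst, src, len);
}

typedef enum { ARCH_PX, ARCH_P8 } arch_t;

/* The "internal variable": this sketch falls back to the generic variant
 * until demo_init() has run (an assumption of the example). */
static copy_fn g_copy = px_copy_8u;

/* Called once at start-up with the architecture that was detected. */
static void demo_init(arch_t detected)
{
    g_copy = (detected == ARCH_P8) ? p8_copy_8u : px_copy_8u;
}

/* Public entry point: one indirect call, no per-call CPU detection. */
static void demo_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    g_copy(src, dst, len);
}
```

The caller only ever sees demo_copy_8u(); which variant runs is decided by the single earlier call to demo_init().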

Note: IPP architectures generally correspond to SIMD instruction sets (MMX, SSE, AES, etc.).

Initializing the IPP Dispatcher

Identifying the specific processor being used and initializing the dispatcher should be done before you make any calls into the IPP library. If you are using a dynamic link library, this is handled automatically when the dynamic link library is initialized. However, if you are using a static library, you must perform this step manually. See the article on the ipp*Init*() functions for more information on how to do this.

The following table lists all the architecture codes defined by the Intel IPP library through version 8.2 of the product. Note that some of these IPP architectures have been deprecated and are no longer supported in the current version of the product. Deprecated architectures are identified in the “Notes” column of the table.

Platform           Architecture  SIMD Requirements                        Processor / µarchitecture    Notes
IA-32              px            C optimized for all IA-32 processors     i386+
                   w7            SSE2                                     P4, Xeon, Centrino
                   v8            Supplemental SSE3                        Core 2, Xeon® 5100, Atom
                   p8            SSE4.1, SSE4.2, AES-NI                   Penryn, Nehalem, Westmere    see notes below
                   g9            AVX                                      Sandy Bridge µarchitecture   new since IPP v.6.1
                   h9            AVX2                                     Haswell µarchitecture
Intel® 64 (EM64T)  mx            C-optimized for all Intel® 64 platforms  P4                           SSE2 minimum
                   m7            SSE3                                     Prescott
                   u8            Supplemental SSE3                        Core 2, Xeon® 5100, Atom
                   y8            SSE4.1, SSE4.2, AES-NI                   Penryn, Nehalem, Westmere    see notes below
                   e9            AVX                                      Sandy Bridge µarchitecture   new in 6.1
                   l9            AVX2                                     Haswell µarchitecture
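
One way to read the table is as a ladder: the dispatcher picks the most capable architecture code whose SIMD requirement the processor meets. The following sketch encodes only the rows listed above. The `FEAT_*` bits and `select_arch_code()` are hypothetical names for this example; a real program would fill the feature mask from CPUID, and the real dispatcher may apply additional checks (for instance, operating system support for AVX state) that are not modeled here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature bits, not IPP constants. */
enum {
    FEAT_SSE2  = 1 << 0,
    FEAT_SSE3  = 1 << 1,
    FEAT_SSSE3 = 1 << 2,   /* "Supplemental SSE3" in the table */
    FEAT_SSE41 = 1 << 3,
    FEAT_AVX   = 1 << 4,
    FEAT_AVX2  = 1 << 5
};

/* Returns the two-character architecture code from the table above,
 * choosing the highest tier that the feature mask satisfies. */
static const char *select_arch_code(unsigned features, int is_64bit)
{
    assert((features & ~0x3Fu) == 0);   /* reject unknown feature bits */

    if (features & FEAT_AVX2)  return is_64bit ? "l9" : "h9";
    if (features & FEAT_AVX)   return is_64bit ? "e9" : "g9";
    if (features & FEAT_SSE41) return is_64bit ? "y8" : "p8";
    if (features & FEAT_SSSE3) return is_64bit ? "u8" : "v8";

    /* The table lists an SSE3 tier (m7) only for Intel 64 and an
     * SSE2 tier (w7) only for IA-32. */
    if (is_64bit)
        return (features & FEAT_SSE3) ? "m7" : "mx";
    return (features & FEAT_SSE2) ? "w7" : "px";
}
```

For example, a 32-bit process on a processor reporting SSE2 through SSE4.1 would map to "p8", while the same feature set in a 64-bit process would map to "y8".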


For information on support for non-Intel processors, please see the article titled Use Intel® IPP on Intel or Compatible AMD* Processors.

P8/Y8 Internal Run-Time Dispatcher

Within the 32-bit p8 and equivalent 64-bit y8 architectures there is an additional "run-time" dispatching mechanism, a kind of mini-dispatcher. The Nehalem (Intel Core i7) and Westmere processor families add SIMD instructions beyond those defined by SSE4.1: the Nehalem family adds the SSE4.2 instructions and the Westmere family adds AES-NI.

Creating two additional internal versions of the IPP library for the SSE4.2 and AES-NI instructions would waste a great deal of space, so those optimizations are bundled into the SSE4.1 library. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library. Because the enhancements affect the optimization of only a small number of IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor.
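
The second-level jump described above can be pictured as a check inside the p8/y8 entry point itself. This is a conceptual sketch only, not the library's internal code: `g_has_aesni`, `g_last_path`, `sum_sse41()`, `sum_aesni()`, and `p8_sum_8u()` are hypothetical, and a simple byte sum stands in for a function that would really use AES-NI instructions.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

enum { PATH_NONE, PATH_SSE41, PATH_AESNI };

static int g_has_aesni = 0;          /* set once by hypothetical start-up code */
static int g_last_path = PATH_NONE;  /* records which variant ran (demo only)  */

/* Baseline variant bundled in the p8 library (placeholder body). */
static unsigned sum_sse41(const u8 *p, size_t n)
{
    unsigned s = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        s += p[i];
    g_last_path = PATH_SSE41;
    return s;
}

/* Functionally identical variant that would use the newer instructions. */
static unsigned sum_aesni(const u8 *p, size_t n)
{
    unsigned s = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        s += p[i];
    g_last_path = PATH_AESNI;
    return s;
}

/* p8 entry point: the main dispatcher has already selected "p8";
 * this extra branch is the mini-dispatcher. */
static unsigned p8_sum_8u(const u8 *p, size_t n)
{
    assert(p != NULL);
    return g_has_aesni ? sum_aesni(p, n) : sum_sse41(p, n);
}
```

Only functions that have such a specialized variant pay for the extra branch; every other p8/y8 function is reached directly through the main dispatcher.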

Processor Architecture Table

The following table was copied from an Intel Compiler Pro options article describing some compiler architecture options. It contains a list of Intel processors showing which processors support which SIMD instructions. For the latest table please refer to the original article; it gets updated on a regular basis. Please note that the behavior of the Intel Compiler SIMD dispatcher described in that article does not apply to the Intel IPP library.

The Intel IPP library dispatching mechanism behaves differently than that found in the Intel Compiler products, and may also behave differently than other Intel library products.

Additional information regarding dispatching and how it relates to non-Intel processors can be found here. How to identify your specific processor is described here. To correlate a processor family name with an Intel CPU brand name, use the ark.intel.com web site.

SSE4.2
Intel® Core™ i7 processors
Intel® Core™ i5 processors
Intel® Core™ i3 processors
Intel® Xeon® 55XX series

SSE4.1
Intel® Xeon® 74XX series
Quad-Core Intel® Xeon® 54XX, 33XX series
Dual-Core Intel® Xeon® 52XX, 31XX series
Intel® Core™ 2 Extreme 9XXX series
Intel® Core™ 2 Quad 9XXX series
Intel® Core™ 2 Duo 8XXX series
Intel® Core™ 2 Duo E7200

SSSE3
Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
Intel® Core™ 2 Extreme 7XXX, 6XXX series
Intel® Core™ 2 Quad 6XXX series
Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
Intel® Core™ 2 Solo 2XXX series
Intel® Pentium® dual-core processor E2XXX, T23XX series

SSE3
Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series
Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
Dual-Core Intel® Xeon® 2.8
Intel® Xeon® processors with SSE3 instruction set support
Intel® Core™ Duo
Intel® Core™ Solo
Intel® Pentium® dual-core processor T21XX, T20XX series
Intel® Pentium® processor Extreme Edition
Intel® Pentium® D
Intel® Pentium® 4 processors with SSE3 instruction set support

SSE2
Intel® Xeon® processors
Intel® Pentium® 4 processors
Intel® Pentium® M

IA32
Intel® Pentium® III Processor
Intel® Pentium® II Processor
Intel® Pentium® Processor

*Other names and brands may be claimed as the property of others.

 
