Understanding CPU Dispatching in the Intel® IPP Libraries

Introduction

The Intel® Integrated Performance Primitives (Intel® IPP) is a cross-architecture software library that provides a broad range of library functions for image processing, signal processing, data compression, cryptography, and computer vision, as well as math support routines for such processing capabilities. Intel® IPP is optimized for the wide range of Intel microprocessors.

One of the key advantages of Intel® IPP is performance. That advantage comes from functions that are optimized for each processor architecture and compiled into a single library. Intel® IPP functions are “dispatched” at run-time: the “dispatcher” chooses which of the processor-specific optimized implementations to use when the application makes a call into the Intel® IPP library. This is done to maximize each function’s use of the underlying vector instructions and other architecture-specific features.

This article covers CPU dispatching in the Intel® IPP library in more detail. After reading it you will understand how CPU dispatching works and which libraries are needed for which processor architecture. Further documentation on Intel® IPP can be found at Intel® Integrated Performance Primitives – Documentation.


Dispatcher

Dispatching refers to the process of detecting CPU features at run-time and then selecting the Intel® IPP optimized library set that corresponds to your CPU. For example, in the <ipp directory>\ia32\ipp directory, the ippip8.dll library file contains the 32-bit optimized image processing functions for processors with Intel® SSE4.2. In that file name, ‘ippi’ refers to the image processing library and ‘p8’ refers to the 32-bit SSE4.2 architecture.
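The naming convention in the example above can be illustrated with a small sketch. The helper below is hypothetical and not part of Intel® IPP; it assumes a file name laid out as a domain prefix, a two-character CPU code, and an optional extension, which matches the ippip8.dll example but may not hold for every platform or product version.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Intel IPP function. Splits a dispatched
 * library file name such as "ippip8.dll" into its domain prefix ("ippi")
 * and its two-character CPU code ("p8").
 * Assumed layout: <domain prefix><2-char CPU code>[.extension]
 * Returns 0 on success, -1 if the name is too short or a buffer is too small. */
int split_ipp_library_name(const char *file_name,
                           char *domain, size_t domain_size,
                           char *cpu_code, size_t cpu_code_size)
{
    /* Length of the name without its extension, if any. */
    const char *dot = strrchr(file_name, '.');
    size_t stem_len = dot ? (size_t)(dot - file_name) : strlen(file_name);

    /* Need at least one prefix character plus the two-character code,
     * and room for both NUL-terminated outputs. */
    if (stem_len < 3 || cpu_code_size < 3 || domain_size < stem_len - 1)
        return -1;

    size_t domain_len = stem_len - 2;
    memcpy(domain, file_name, domain_len);
    domain[domain_len] = '\0';

    memcpy(cpu_code, file_name + domain_len, 2);
    cpu_code[2] = '\0';
    return 0;
}
```

Calling split_ipp_library_name() on "ippip8.dll" with suitably sized buffers yields the domain "ippi" and the CPU code "p8".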

Note: You can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. Please read this IPP linkage models article for information on how to build custom versions of the Intel® IPP library.

In the general case, the “dispatcher” identifies the run-time processor only once, at library initialization time. It sets an internal table or variable that directs your calls to the internal functions that match your architecture. For example, ippsCopy_8u() may have multiple implementations stored in the library, each optimized for a specific Intel® processor architecture. Thus, the dispatcher calls the p8_ippsCopy_8u() version of ippsCopy_8u() when running on an IA-32 Intel processor with Intel® SSE4.2, because that version is optimized for this processor architecture.
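The idea of setting an internal variable once and routing every later call through it can be sketched in a few lines of C. This is only an illustration of the general technique, not the actual Intel® IPP implementation: the demo_* names, the two-entry architecture enum, and the choice to default to the generic version before initialization are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative architecture codes; the names mirror the article's prefixes. */
typedef enum { ARCH_PX, ARCH_P8 } demo_arch;

typedef void (*copy_fn)(const unsigned char *src, unsigned char *dst, size_t len);

/* Stand-in for the generic implementation. */
static void px_demoCopy_8u(const unsigned char *src, unsigned char *dst, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        dst[i] = src[i];
}

/* Stand-in for an architecture-specific implementation; a real one
 * would use the vector instructions of that architecture. */
static void p8_demoCopy_8u(const unsigned char *src, unsigned char *dst, size_t len)
{
    memcpy(dst, src, len);
}

/* The "internal variable": set once, then used by every call.
 * Defaulting to the generic version is a choice made for this sketch. */
static copy_fn g_copy = px_demoCopy_8u;
static demo_arch g_arch = ARCH_PX;

/* Initialization step. A real dispatcher would detect the CPU itself;
 * here the caller passes the detected architecture in. */
void demo_init(demo_arch detected)
{
    g_arch = detected;
    g_copy = (detected == ARCH_P8) ? p8_demoCopy_8u : px_demoCopy_8u;
}

/* Reports which implementation set is currently selected. */
demo_arch demo_active_arch(void)
{
    return g_arch;
}

/* Public entry point: the application always calls this name, and the
 * call is forwarded to whichever implementation was selected. */
void demoCopy_8u(const unsigned char *src, unsigned char *dst, size_t len)
{
    g_copy(src, dst, len);
}
```

The application only ever calls demoCopy_8u(); after demo_init(ARCH_P8) that call lands in p8_demoCopy_8u() with the cost of one indirect call, and no CPU detection is repeated per call.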

Note: IPP architectures generally correspond to SIMD (MMX, SSE, AES, etc.) instruction sets.


Initializing the IPP Dispatcher

Identifying the specific processor and initializing the dispatcher should be done before making any calls into the Intel® IPP library. If you are using a dynamic link library, this process is handled automatically when the dynamic link library is initialized. However, if you are using a static library, you must perform this step manually. See this article on the ipp*Init*() functions for more information on how to do this.

The following table lists all the architecture codes defined by the Intel® IPP library through version 8.2 of the product. Note that some of these IPP architectures have been deprecated and are no longer supported in the current version of the product. Deprecated architectures are identified in the “Notes” column of the table.

| IA-32 Intel® architecture | Intel® 64 architecture | Meaning |
|---|---|---|
| px | mx | Generic code optimized for processors with Intel® Streaming SIMD Extensions (Intel® SSE) |
| w7 | my | Optimized for processors with Intel SSE2 |
| s8 | n8 | Optimized for processors with Supplemental Streaming SIMD Extensions 3 (SSSE3) |
| - | m7 | Optimized for processors with Intel SSE3 |
| p8 | y8 | Optimized for processors with Intel SSE4.2 |
| g9 | e9 | Optimized for processors with Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) |
| h9 | l9 | Optimized for processors with Intel® Advanced Vector Extensions 2 (Intel® AVX2) |
| - | k0 | Optimized for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) |
| - | n0 | Optimized for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) for Intel® Many Integrated Core Architecture (Intel® MIC Architecture) |

Table 1: CPU Identification Codes Associated with Processor-Specific Libraries
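As a convenience, the codes in Table 1 can be captured in a small lookup table. The sketch below simply transcribes the table; the cpu_code_row type and the cpu_code_meaning() function are hypothetical names introduced for illustration and are not part of the Intel® IPP API.

```c
#include <stddef.h>
#include <string.h>

/* One row of Table 1. A NULL ia32 entry means the table lists no IA-32
 * code for that row. */
typedef struct {
    const char *ia32;
    const char *intel64;
    const char *meaning;
} cpu_code_row;

/* Transcribed from Table 1; the meanings are shortened for the example. */
static const cpu_code_row kCpuCodes[] = {
    { "px", "mx", "Intel SSE (generic code)" },
    { "w7", "my", "Intel SSE2" },
    { "s8", "n8", "SSSE3" },
    { NULL, "m7", "Intel SSE3" },
    { "p8", "y8", "Intel SSE4.2" },
    { "g9", "e9", "Intel AVX and Intel AES-NI" },
    { "h9", "l9", "Intel AVX2" },
    { NULL, "k0", "Intel AVX-512" },
    { NULL, "n0", "Intel AVX-512 for Intel MIC Architecture" },
};

/* Hypothetical helper: returns the meaning of a two-character CPU code,
 * or NULL if the code does not appear in Table 1. */
const char *cpu_code_meaning(const char *code)
{
    for (size_t i = 0; i < sizeof kCpuCodes / sizeof kCpuCodes[0]; ++i) {
        const cpu_code_row *row = &kCpuCodes[i];
        if ((row->ia32 != NULL && strcmp(row->ia32, code) == 0) ||
            strcmp(row->intel64, code) == 0)
            return row->meaning;
    }
    return NULL;
}
```

Combined with the code taken from a library file name, this gives a quick way to tell which instruction set a given processor-specific library targets.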

For information on support for non-Intel processors, please see the article titled Use Intel® IPP on Intel or Compatible AMD* Processors.


P8/Y8 Internal Run-Time Dispatcher

Within the 32-bit 'p8' and equivalent 64-bit 'y8' architectures there is an additional "run-time" dispatching mechanism, a kind of mini-dispatcher. The Nehalem (Intel® Core i7) and Westmere processor families introduce SIMD instructions beyond those defined by SSE4.1: the Nehalem processor family adds the SSE4.2 SIMD instructions and the Westmere family adds AES-NI.

Creating two additional internal versions of the IPP library for the SSE4.2 and AES-NI instructions would use space very inefficiently, so they are bundled as part of the SSE4.1 library. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library. Because the enhancements affect the optimization of only a small number of IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor.
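The "additional jump" described above can be pictured as a second, much smaller dispatch step that exists only inside the few functions that have a special variant. The sketch below illustrates that idea under stated assumptions: the p8_demo* names, the g_has_aesni flag, and the way the flag is set are all invented for the example and do not reflect the library's internal layout.

```c
#include <stdbool.h>

/* Which code path a call ended up in; used only to make the sketch observable. */
typedef enum { PATH_P8_BASE, PATH_P8_AESNI } demo_path;

/* Set once at initialization from CPU detection (simulated here). */
static bool g_has_aesni = false;

void demo_set_aesni(bool available)
{
    g_has_aesni = available;
}

/* Two internal variants bundled inside the same p8 library. */
static demo_path p8_demoEncrypt_base(void)  { return PATH_P8_BASE; }
static demo_path p8_demoEncrypt_aesni(void) { return PATH_P8_AESNI; }

/* p8 entry for a function that has an AES-NI variant: the mini-dispatcher
 * costs one extra branch on top of the normal dispatch. */
demo_path p8_demoEncrypt(void)
{
    return g_has_aesni ? p8_demoEncrypt_aesni() : p8_demoEncrypt_base();
}

/* p8 entry for a function with no special variant: no extra branch at all. */
demo_path p8_demoCopy(void)
{
    return PATH_P8_BASE;
}
```

Only p8_demoEncrypt() pays for the extra check; functions such as p8_demoCopy() are unaffected, which is why the overhead is limited to the small set of functions with bundled variants.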


Processor Architecture Table

The following table was copied from the article Intel® Compiler Options for Intel® SSE and Intel® AVX generation (SSE2, SSE3, SSSE3, ATOM_SSSE3, SSE4.1, SSE4.2, ATOM_SSE4.2, AVX, AVX2, AVX-512) and processor-specific optimizations, which describes some compiler architecture options. It lists Intel processors and shows which processors support which vector instructions. For the latest table please refer to the original article, which is updated on a regular basis. Please note that the behavior of the Intel Compiler SIMD dispatcher described in that article does not apply to the Intel® IPP library.

Note: The Intel® IPP library dispatching mechanism behaves differently from the one in the Intel Compiler products, and may also behave differently from other Intel library products.

Additional information regarding dispatching and how it relates to non-Intel processors can be found here. How to identify your specific processor is described here. To correlate a processor family name with an Intel CPU brand name, use the ark.intel.com web site.

COMMON-AVX512
  A future Intel® Processor

MIC-AVX512
  The Intel® Xeon Phi™ processor x200 product family

CORE-AVX512
  A future Intel® Processor

CORE-AVX2
  4th Generation Intel® Core™ Processors
  5th Generation Intel® Core™ Processors
  6th Generation Intel® Core™ Processors
  Intel® Xeon® Processor E7 v3 Family
  Intel® Xeon® Processor E5 v3 Family
  Intel® Xeon® Processor E3 v3 Family
  Intel® Xeon® Processor E7 v4 Family
  Intel® Xeon® Processor E5 v4 Family
  Intel® Xeon® Processor E3 v4 Family

CORE-AVX-I
  3rd Generation Intel® Core™ i7 Processors
  3rd Generation Intel® Core™ i5 Processors
  3rd Generation Intel® Core™ i3 Processors
  Intel® Xeon® Processor E7 v2 Family
  Intel® Xeon® Processor E5 v2 Family
  Intel® Xeon® Processor E3 v2 Family

AVX
  2nd Generation Intel® Core™ i7 Processors
  2nd Generation Intel® Core™ i5 Processors
  2nd Generation Intel® Core™ i3 Processors
  Intel® Xeon® Processor E5 Family
  Intel® Xeon® Processor E3 Family

SSE4.2
  Previous Generation Intel® Core™ i7 Processors
  Previous Generation Intel® Core™ i5 Processors
  Previous Generation Intel® Core™ i3 Processors
  Intel® Xeon® 55XX series
  Intel® Xeon® 56XX series
  Intel® Xeon® 75XX series
  Intel® Xeon® Processor E7 Family

ATOM_SSE4.2
  Intel® Atom™ processors that support Intel® SSE4.2 instructions

SSE4.1
  Intel® Xeon® 74XX series
  Quad-Core Intel® Xeon 54XX, 33XX series
  Dual-Core Intel® Xeon 52XX, 31XX series
  Intel® Core™ 2 Extreme 9XXX series
  Intel® Core™ 2 Quad 9XXX series
  Intel® Core™ 2 Duo 8XXX series
  Intel® Core™ 2 Duo E7200

SSSE3
  Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
  Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
  Intel® Core™ 2 Extreme 7XXX, 6XXX series
  Intel® Core™ 2 Quad 6XXX series
  Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
  Intel® Core™ 2 Solo 2XXX series
  Intel® Pentium® dual-core processor E2XXX, T23XX series

ATOM_SSSE3
  Intel® Atom™ processors

SSE3
  Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series
  Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
  Dual-Core Intel® Xeon® 2.8
  Intel® Xeon® processors with SSE3 instruction set support
  Intel® Core™ Duo
  Intel® Core™ Solo
  Intel® Pentium® dual-core processor T21XX, T20XX series
  Intel® Pentium® processor Extreme Edition
  Intel® Pentium® D
  Intel® Pentium® 4 processors with SSE3 instruction set support

SSE2
  Intel® Xeon® processors
  Intel® Pentium® 4 processors
  Intel® Pentium® M

IA32
  Intel® Pentium® III Processor
  Intel® Pentium® II Processor
  Intel® Pentium® Processor

Table 2:  Intel Processors Associated with Specific CPU Vector Instructions
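To find out which vector instructions the machine you are running on supports, one option is to query the CPU directly instead of looking up the processor name. The sketch below is one possible approach and is not how Intel® IPP itself detects the CPU: it relies on the __builtin_cpu_supports() built-in available in GCC and Clang on x86 targets, and the detect_vector_level() function name is hypothetical. On other compilers or architectures it simply reports "unknown".

```c
#include <stddef.h>

/* Hypothetical helper: returns the name of the highest vector instruction
 * set, among those checked, that the running CPU reports. Uses GCC/Clang
 * x86 built-ins; returns "unknown" when they are not available. */
const char *detect_vector_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* Make sure the compiler's CPU feature data is populated. */
    __builtin_cpu_init();

    /* Check from newest to oldest and report the first match. */
    if (__builtin_cpu_supports("avx512f")) return "AVX-512";
    if (__builtin_cpu_supports("avx2"))    return "AVX2";
    if (__builtin_cpu_supports("avx"))     return "AVX";
    if (__builtin_cpu_supports("sse4.2"))  return "SSE4.2";
    if (__builtin_cpu_supports("sse4.1"))  return "SSE4.1";
    if (__builtin_cpu_supports("ssse3"))   return "SSSE3";
    if (__builtin_cpu_supports("sse3"))    return "SSE3";
    if (__builtin_cpu_supports("sse2"))    return "SSE2";
    return "IA32";
#else
    return "unknown";
#endif
}
```

The returned name can then be matched against the instruction sets in Table 2. Keep in mind that this reflects only what the CPU reports and says nothing about which processor-specific code the Intel® IPP dispatcher will actually select.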


* Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

 

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Copyright © 2002-2016, Intel Corporation. All rights reserved.
