Understanding SIMD Optimization Layers and Dispatching in the Intel® IPP

This article describes the Intel® Integrated Performance Primitives (Intel® IPP) optimization layers present in the 8.2* version of the library. The article titled Understanding CPU Dispatching in the Intel® IPP Library describes the same features for previous versions of the library (5.3 through 6.1**).

The standard distribution of the Intel IPP library contains multiple, functionally-identical, SIMD-specific, optimized libraries (or layers) that are automatically “dispatched” at run-time. The “dispatcher” directs your calls to the appropriate optimized library layer based on SIMD capabilities discovered during library initialization. This is done to maximize each function’s use of the runtime processor's underlying SIMD instructions and other architecture-specific features.

Note: you can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. Please read this IPP linkage models article for information on how to build custom versions of the IPP library.

Dispatching selects the Intel IPP optimized library layer that corresponds to the runtime CPU's SIMD instruction set. For example, on a Windows installation, the $(IPPROOT)\..\redist\intel64\ipp directory contains a file named ippil9-8.2.dll, which contains version ‘8.2’ of the optimized image processing library for 64-bit processors that support the Intel AVX2 instructions: ‘ippi’ denotes the image processing domain, ‘l9’ denotes the AVX2 instruction set on Intel 64 processors (see the table of architecture codes below), and ‘8.2’ denotes the library’s version number.
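The naming scheme described above can be illustrated with a small parser. This is a minimal sketch, assuming every layer file follows the layout <domain><two-character layer code>-<version>.<extension> that the single example suggests; the LayerFileName type and the parse_layer_file_name() function are hypothetical helpers and are not part of Intel IPP.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper type for this sketch; not part of Intel IPP. */
typedef struct {
    char domain[16];   /* e.g. "ippi" (image processing domain) */
    char layer[3];     /* two-character architecture code        */
    char version[16];  /* e.g. "8.2"                             */
} LayerFileName;

/* Splits a name shaped like <domain><layer>-<version>.<ext>.
   Returns 0 on success and -1 if the name does not fit that shape.
   The layout is an assumption generalized from one file name. */
int parse_layer_file_name(const char *name, LayerFileName *out)
{
    const char *dash;
    const char *dot;
    size_t domain_len;
    size_t version_len;

    if (name == NULL || out == NULL) return -1;

    dash = strchr(name, '-');          /* separates layer code and version */
    if (dash == NULL) return -1;
    dot = strrchr(name, '.');          /* last dot starts the extension    */
    if (dot == NULL || dot <= dash + 1) return -1;
    if (dash - name < 3) return -1;    /* need 1+ domain chars and 2 layer chars */

    domain_len  = (size_t)(dash - name) - 2;
    version_len = (size_t)(dot - dash) - 1;
    assert(domain_len >= 1 && version_len >= 1);
    if (domain_len >= sizeof out->domain) return -1;
    if (version_len >= sizeof out->version) return -1;

    memcpy(out->domain, name, domain_len);
    out->domain[domain_len] = '\0';
    memcpy(out->layer, dash - 2, 2);
    out->layer[2] = '\0';
    memcpy(out->version, dash + 1, version_len);
    out->version[version_len] = '\0';
    return 0;
}
```

For the name ippil9-8.2.dll this yields the domain ippi, the layer code l9 and the version 8.2. Names without a dash or without an extension are rejected rather than guessed at.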

In the general case, the “dispatcher” identifies the run-time processor only once, at library initialization time, and sets up a variable internal to the library that directs your calls to the SIMD-specific functions that match the runtime processor. For example, ippsCopy_8u() has multiple implementations stored in the library, each optimized for a specific SIMD instruction set. The l9_ippsCopy_8u() version of ippsCopy_8u() is called by the dispatcher when running on an Intel® Haswell processor in 64-bit addressing mode, because l9_ippsCopy_8u() is optimized for the AVX2 instruction set supported by that processor in 64-bit addressing mode (l9 is the Intel 64 AVX2 code in the table of architecture codes below).
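The idea of identifying the processor once and then routing every call through an internal variable can be sketched with a table of function pointers. This is only an illustration of the general technique, not the library's actual implementation: the layer indices, sketch_init(), sketch_copy_8u() and both copy routines are hypothetical, and the "AVX2" routine is a plain memcpy() stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char u8;
typedef void (*copy_fn)(const u8 *src, u8 *dst, size_t len);

/* Hypothetical layer indices for this sketch. */
enum { LAYER_BASELINE = 0, LAYER_AVX2 = 1, LAYER_COUNT = 2 };

/* Instrumentation for the sketch only: remembers which variant ran. */
static const char *g_last_impl = "none";

static void baseline_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i) dst[i] = src[i];
    g_last_impl = "baseline";
}

static void avx2_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    memcpy(dst, src, len);  /* stand-in; a real layer would use AVX2 code */
    g_last_impl = "avx2";
}

/* One entry per layer, all with the same signature and behavior. */
static const copy_fn copy_8u_table[LAYER_COUNT] = {
    baseline_copy_8u, avx2_copy_8u
};

/* The "internal variable": written once at init, read on every call.
   Defaulting to the baseline layer is a choice made for this sketch. */
static int g_active_layer = LAYER_BASELINE;

/* Returns 0 on success, -1 if the detected layer is not known. */
int sketch_init(int detected_layer)
{
    if (detected_layer < 0 || detected_layer >= LAYER_COUNT) return -1;
    g_active_layer = detected_layer;
    return 0;
}

/* Public entry point: no CPU detection here, only one indirect call. */
void sketch_copy_8u(const u8 *src, u8 *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    copy_8u_table[g_active_layer](src, dst, len);
}

const char *sketch_last_impl(void) { return g_last_impl; }
```

The design point is that detection cost is paid once: after sketch_init(), each call costs a table lookup and an indirect call, whichever layer was selected.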

Initializing the IPP Dispatcher

Identifying the runtime processor and initializing the dispatcher should be the first action you take with the Intel IPP library. If you are using the standard dynamic link library this process is handled automatically when the Intel IPP shared library is initialized. If you are using a static library you must perform this step manually. See this article on the ipp*Init*() functions for more information on how to do this.

Because the minimum SIMD instruction set is SSE2 on IA-32 processors and SSE3 on Intel 64 processors (see the table of architecture codes below), it is recommended that you ALWAYS call the ippInit() function before making any other calls to the Intel IPP library. This advice applies regardless of whether you are linking against the static or dynamic form of the library (even though the dynamic library will also perform this call).

Calling the ippInit() function with the shared libraries (DLL and SO) will generate an error message to a dialog box or error console if the ippInit() function detects that the runtime CPU is not supported by the Intel IPP library. Calling the ippInit() function in the static versions of the library will not generate a console or dialog message. Both versions of the ippInit() function will return an error code when a non-supported CPU is detected.

It is important that you call the ippInit() function at the beginning of your application to ensure that the processor on which your application is running supports the Intel IPP library. If the ippInit() function returns an error code, you should close your application gracefully. Otherwise, your application may be terminated unexpectedly by an invalid instruction fault because it is running on an unsupported processor.
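The start-up pattern recommended above (initialize first, exit cleanly on failure) might look like the following. To keep the sketch self-contained it does not include the Intel IPP headers: the init_fn type, the two simulated initializers and start_application() are hypothetical, and the convention that 0 means success is an assumption of this sketch. In a real application the initializer would be ippInit() and its return value would be checked against the status codes documented for the library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an initializer such as ippInit():
   returns 0 on success and a nonzero code on failure. */
typedef int (*init_fn)(void);

/* Simulated outcomes so the control flow can be exercised. */
static int init_on_supported_cpu(void)   { return 0;  }
static int init_on_unsupported_cpu(void) { return -1; }

/* Initializes the library before anything else and refuses to
   continue if initialization reports a problem. */
int start_application(init_fn init_library)
{
    int status;

    assert(init_library != NULL);
    status = init_library();
    if (status != 0) {
        /* Stop here instead of risking an invalid instruction fault later. */
        fprintf(stderr, "Library initialization failed (status %d); exiting.\n",
                status);
        return EXIT_FAILURE;
    }

    /* From this point on it is safe to call the optimized functions. */
    return EXIT_SUCCESS;
}
```

The value returned by start_application() is meant to be returned from main(), so an unsupported processor leads to an ordinary nonzero exit code rather than a crash.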

The following table lists the SIMD architecture codes supported by the Intel IPP library.


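| Platform | Architecture code | SIMD requirements | Processor / µarchitecture |
|---|---|---|---|
| IA-32 | w7 | SSE2 | P4, Xeon, Centrino |
| IA-32 | v8 | Supplemental SSE3 | Core 2, Xeon® 5100, Atom |
| IA-32 | s8 | Supplemental SSE3 (compiled for Atom) | Atom |
| IA-32 | p8 | SSE4.1, SSE4.2 and AES-NI | Penryn, Nehalem, Westmere |
| IA-32 | g9 | AVX | Sandy Bridge µarchitecture |
| IA-32 | h9 | AVX2 | Haswell µarchitecture |
| Intel® 64 (EM64T) | m7 | SSE3 | Prescott |
| Intel® 64 (EM64T) | u8 | Supplemental SSE3 | Core 2, Xeon® 5100, Atom |
| Intel® 64 (EM64T) | n8 | Supplemental SSE3 (compiled for Atom) | Atom |
| Intel® 64 (EM64T) | y8 | SSE4.1, SSE4.2 and AES-NI | Penryn, Nehalem, Westmere |
| Intel® 64 (EM64T) | e9 | AVX | Sandy Bridge µarchitecture |
| Intel® 64 (EM64T) | l9 | AVX2 | Haswell µarchitecture |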
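The table can be read as a lookup from platform and available SIMD features to an architecture code. The sketch below encodes it that way, assuming the dispatcher picks the most capable layer whose SIMD requirement is met; that selection rule, the feature flags and the select_layer() function are assumptions made for illustration. The Atom codes (s8/n8) are left out because the table gives them the same SIMD requirement as v8/u8, so a feature mask alone cannot tell them apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature flags for this sketch. */
enum {
    F_SSE2 = 1, F_SSE3 = 2, F_SSSE3 = 4, F_SSE41 = 8, F_AVX = 16, F_AVX2 = 32
};

typedef struct {
    const char *code;      /* architecture code from the table    */
    unsigned    required;  /* SIMD feature the layer depends on   */
} Layer;

/* Ordered from most to least capable, so the first match wins. */
static const Layer ia32_layers[] = {
    { "h9", F_AVX2 }, { "g9", F_AVX }, { "p8", F_SSE41 },
    { "v8", F_SSSE3 }, { "w7", F_SSE2 }
};
static const Layer intel64_layers[] = {
    { "l9", F_AVX2 }, { "e9", F_AVX }, { "y8", F_SSE41 },
    { "u8", F_SSSE3 }, { "m7", F_SSE3 }
};

static const char *pick(const Layer *layers, size_t count, unsigned features)
{
    size_t i;
    assert(layers != NULL);
    for (i = 0; i < count; ++i) {
        if ((features & layers[i].required) == layers[i].required)
            return layers[i].code;
    }
    return NULL;  /* no layer fits: the CPU is unsupported in this sketch */
}

/* Returns the architecture code for the platform, or NULL if the
   feature mask does not satisfy even the minimum layer. */
const char *select_layer(int is_intel64, unsigned features)
{
    if (is_intel64)
        return pick(intel64_layers,
                    sizeof intel64_layers / sizeof intel64_layers[0], features);
    return pick(ia32_layers,
                sizeof ia32_layers / sizeof ia32_layers[0], features);
}
```

With this encoding, a 64-bit processor reporting every flag maps to l9, a 32-bit processor with only SSE2 maps to w7, and a 64-bit processor without SSE3 gets NULL, which corresponds to the unsupported-CPU case discussed earlier.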


For information on support for non-Intel processors, please read Use Intel® IPP on Intel or Compatible AMD* Processors.

P8/Y8 Internal Run-Time Dispatcher

Within the 32-bit p8 and equivalent 64-bit y8 architectures there is an additional "runtime dispatcher," a mini-dispatcher. The Nehalem and Westmere processor microarchitectures add SIMD instructions beyond those defined by SSE4.1. The Nehalem processor microarchitecture added the SSE4.2 SIMD instructions and the Westmere processor microarchitecture added Intel® AES-NI.

Creating two separate optimization layers within the IPP library for the small set of instructions added by SSE4.2 and AES-NI would be very space inefficient, so they are bundled into the SSE4.1 library (p8/y8) as minor variants to that optimization layer. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library if your runtime processor supports these instructions. Because the enhancements affect the optimization of only a small number of Intel IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor that supports these extra instructions.
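The "additional jump" inside the p8/y8 layer can be pictured as a second, much smaller dispatch step inside one function. The following is a minimal sketch of that idea only: the feature flags, the sketch_set_extra_features() setter, the variant enum and the byte-masking routine are all hypothetical, and the "AES-NI" variant is an ordinary C stand-in that produces the same result as the base variant.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/* Hypothetical flags for instructions added after SSE4.1. */
enum { FEATURE_SSE42 = 1, FEATURE_AESNI = 2 };

/* Instrumentation for the sketch only: which variant ran last. */
typedef enum { VARIANT_NONE, VARIANT_SSE41, VARIANT_AESNI } Variant;

static unsigned g_extra_features = 0;        /* set once at initialization */
static Variant  g_last_variant   = VARIANT_NONE;

void sketch_set_extra_features(unsigned mask) { g_extra_features = mask; }
Variant sketch_last_variant(void) { return g_last_variant; }

/* Base variant of the layer: needs SSE4.1 only. */
static void mask_sse41(u8 *buf, size_t n, u8 key)
{
    size_t i;
    for (i = 0; i < n; ++i) buf[i] = (u8)(buf[i] ^ key);
    g_last_variant = VARIANT_SSE41;
}

/* Minor variant bundled into the same layer (stand-in for AES-NI code). */
static void mask_aesni_standin(u8 *buf, size_t n, u8 key)
{
    size_t i;
    for (i = 0; i < n; ++i) buf[i] = (u8)(buf[i] ^ key);
    g_last_variant = VARIANT_AESNI;
}

/* Entry point of the layer: the extra jump happens only in the few
   functions that actually have a variant for the newer instructions. */
void y8_sketch_mask(u8 *buf, size_t n, u8 key)
{
    assert(buf != NULL || n == 0);
    if (g_extra_features & FEATURE_AESNI) {
        mask_aesni_standin(buf, n, key);
        return;
    }
    mask_sse41(buf, n, key);
}
```

Functions without such a variant contain no feature check at all, which matches the point above that the extra overhead is limited to a small number of functions.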

S8/N8 (Atom) Dispatch

Unlike preceding versions, the 7.0 version of the Intel IPP library includes Atom-optimized variants in all formats (static and dynamic) of the library. For this reason, the Linux distribution of the 7.0 version no longer includes a separate Atom-specific version of the library: the Atom-specific optimizations have been fully merged into all formats of the standard library files.

Please read Intel® Atom™ Processors Support in the Intel® Integrated Performance Primitives (Intel® IPP) Library for more information regarding Atom optimizations in the IPP library.

Processor Architecture Table

The following table was copied from an Intel Compiler Pro options article describing some compiler architecture options. It contains a list of Intel processors showing which processors support which SIMD instructions. For the latest table please refer to the original article; it gets updated on a regular basis. Please note that the behavior of the Intel Compiler SIMD dispatcher described in that article does not apply to the Intel IPP library.

The Intel IPP library dispatching mechanism behaves differently than that found in the Intel Compiler products, and may also behave differently than other Intel library products.

Additional information regarding dispatching and how it relates to non-Intel processors can be found here. How to identify your specific processor is described here. To correlate a processor family name with an Intel CPU brand name, use the following web site: ark.intel.com.

AVX2
4th Generation Intel® Core™ Processors
Intel® Xeon® Processor E3 v3 Family

AVX
3rd Generation Intel® Core™ i7 Processors
3rd Generation Intel® Core™ i5 Processors
3rd Generation Intel® Core™ i3 Processors
Intel® Xeon® Processor E7 v2 Family
Intel® Xeon® Processor E5 v2 Family
Intel® Xeon® Processor E3 v2 Family
2nd Generation Intel® Core™ i7 Processors
2nd Generation Intel® Core™ i5 Processors
2nd Generation Intel® Core™ i3 Processors
Intel® Xeon® Processor E5 Family
Intel® Xeon® Processor E3 Family

SSE4.2
Intel® Core™ i7 processors
Intel® Core™ i5 processors
Intel® Core™ i3 processors
Intel® Xeon® 55XX series

SSE4.1
Intel® Xeon® 74XX series
Quad-Core Intel® Xeon 54XX, 33XX series
Dual-Core Intel® Xeon 52XX, 31XX series
Intel® Core™ 2 Extreme 9XXX series
Intel® Core™ 2 Quad 9XXX series
Intel® Core™ 2 Duo 8XXX series
Intel® Core™ 2 Duo E7200

SSSE3
Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
Intel® Core™ 2 Extreme 7XXX, 6XXX series
Intel® Core™ 2 Quad 6XXX series
Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
Intel® Core™ 2 Solo 2XXX series
Intel® Pentium® dual-core processor E2XXX, T23XX series

SSE3
Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series
Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
Dual-Core Intel® Xeon® 2.8
Intel® Xeon® processors with SSE3 instruction set support
Intel® Core™ Duo
Intel® Core™ Solo
Intel® Pentium® dual-core processor T21XX, T20XX series
Intel® Pentium® processor Extreme Edition
Intel® Pentium® D
Intel® Pentium® 4 processors with SSE3 instruction set support

SSE2
Intel® Xeon® processors
Intel® Pentium® 4 processors
Intel® Pentium® M

IA32
Intel® Pentium® III Processor
Intel® Pentium® II Processor
Intel® Pentium® Processor

Notes:

* The latest version of Intel® IPP is version 8.2 update 2, bundled with Intel® Composer XE 2015 (released in September 2014).

** IPP versions 5.3, 6.* and 7.* are no longer supported



*Other names and brands may be claimed as the property of others.

 

For detailed information about compiler optimization capabilities, please refer to our Optimization Notice.