Intel® Advanced Vector Extensions 2 (Intel® AVX2) optimization in Intel® Math Kernel Library (Intel® MKL)

Published: 06/28/2012   Last Updated: 06/28/2012

Haswell is the code name for the next-generation x86 processor microarchitecture (a "tock"). This architecture is expected in 2013. Haswell's new instructions accelerate a broad category of applications and usage models. The full instruction descriptions are available in the Intel® Advanced Vector Extensions (Intel® AVX) Programming Reference (319433). The new instruction set builds upon the instructions of Intel® microarchitecture code name Ivy Bridge, including the digital random number generator, half-float (float16) accelerators, and an extended set of Intel® AVX instructions.
The instructions fit into the following categories:

Intel® Advanced Vector Extensions 2 (Intel® AVX2) - Integer data types expanded to 256-bit SIMD. Intel AVX2's integer support is particularly useful for processing visual data commonly encountered in consumer imaging and video processing workloads. With Haswell, we have both Intel AVX for floating point data types as well as Intel AVX2 for integer data types.
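
To make the imaging use case concrete, here is a minimal sketch in portable C of what one 256-bit integer operation does per lane. It models a saturating add over 32 unsigned 8-bit values (for example, brightening 32 grayscale pixels at once). The function name add_saturate_u8 and the constant BYTE_LANES are hypothetical names chosen for this illustration; the code is a scalar reference model of the lane-wise behavior, not an Intel AVX2 intrinsic and not code from Intel MKL.

```c
#include <stddef.h>
#include <stdint.h>

/* Thirty-two 8-bit lanes fill one 256-bit register. */
#define BYTE_LANES 32

/* Scalar model of a lane-wise unsigned saturating add:
   each result is clamped to 255 instead of wrapping around. */
void add_saturate_u8(uint8_t dst[BYTE_LANES],
                     const uint8_t a[BYTE_LANES],
                     const uint8_t b[BYTE_LANES]) {
    for (size_t lane = 0; lane < BYTE_LANES; ++lane) {
        unsigned sum = (unsigned)a[lane] + (unsigned)b[lane];
        dst[lane] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

On hardware with 256-bit integer SIMD, the whole loop corresponds to a single vector operation, which is where the benefit for pixel data comes from.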

Bit manipulation instructions are useful for compressed databases, hashes, large number arithmetic, and a variety of general purpose codes.
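
As an illustration of the kind of work these instructions replace, the sketch below models two typical bit-manipulation operations in portable C: clearing the lowest set bit, and packing the bits selected by a mask into the low end of a word (a parallel bit extract). The function names are hypothetical and chosen for this example; the loops are scalar reference models, assuming 32-bit operands, and are not the hardware instructions themselves.

```c
#include <stdint.h>

/* Scalar model of "reset lowest set bit". */
uint32_t reset_lowest_set_bit(uint32_t x) {
    return x & (x - 1u);
}

/* Scalar model of a parallel bit extract: the bits of src selected
   by mask are packed, in order, into the low bits of the result. */
uint32_t parallel_bit_extract(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    uint32_t out_bit = 1;
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);  /* isolate lowest set mask bit */
        if (src & lowest) {
            result |= out_bit;
        }
        out_bit <<= 1;
        mask = reset_lowest_set_bit(mask);
    }
    return result;
}
```

Extracting a packed field this way takes one loop iteration per mask bit in software, which is why a single-instruction form matters for compressed databases and hashing.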

Gather instructions are useful for vectorized code that accesses non-adjacent data elements. Haswell gather operations are mask-based for safety (like the conditional loads and stores introduced in Intel AVX). Gather operations are well suited to clipping values, clamping boundaries, and similar conditional operations.
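
The following sketch shows, in portable C, what a masked gather of eight 32-bit elements computes. It assumes the convention that a lane is active when the top bit of its mask element is set, and that inactive lanes keep the value from a pass-through source and perform no memory access. The names masked_gather_i32 and GATHER_LANES are hypothetical; this is a scalar reference model for illustration, not the intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Eight 32-bit lanes fill one 256-bit register. */
#define GATHER_LANES 8

/* Scalar model of a masked gather. For each lane:
   - mask top bit set:   load base[index[lane]]
   - mask top bit clear: keep src[lane] and do not touch memory */
void masked_gather_i32(int32_t dst[GATHER_LANES],
                       const int32_t src[GATHER_LANES],
                       const int32_t *base,
                       const int32_t index[GATHER_LANES],
                       const int32_t mask[GATHER_LANES]) {
    for (size_t lane = 0; lane < GATHER_LANES; ++lane) {
        if (mask[lane] < 0) {
            dst[lane] = base[index[lane]];
        } else {
            dst[lane] = src[lane];
        }
    }
}
```

Because inactive lanes are never dereferenced, an out-of-range index in a masked-off lane is harmless. That is the safety property the mask provides, and it is what makes boundary clamping straightforward to vectorize.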

Any-to-any permutes are highly useful shuffle operations. Haswell adds support for DWORD and QWORD granularity and allows permuting across an entire 256-bit register.
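
A DWORD-granularity any-to-any permute can be modeled in a few lines of portable C: every destination lane may pick any of the eight source lanes. The sketch assumes that only the low three bits of each index are used, so any index value selects a valid lane. The name permute_dwords is hypothetical, and dst must not overlap src in this model.

```c
#include <stddef.h>
#include <stdint.h>

/* Eight DWORD lanes fill one 256-bit register. */
#define PERMUTE_LANES 8

/* Scalar model of an any-to-any DWORD permute:
   dst lane i receives the src lane named by the low 3 bits of index[i]. */
void permute_dwords(uint32_t dst[PERMUTE_LANES],
                    const uint32_t src[PERMUTE_LANES],
                    const uint32_t index[PERMUTE_LANES]) {
    for (size_t lane = 0; lane < PERMUTE_LANES; ++lane) {
        dst[lane] = src[index[lane] & 7u];
    }
}
```

With the index vector {7, 6, 5, 4, 3, 2, 1, 0} this reverses the register, and with all indices equal it broadcasts one element. Both patterns cross the 128-bit halves of the register, which is the capability the paragraph above describes.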

Vector-vector shifts shift each vector element by an amount taken from the corresponding element of a second vector. These are critical in vectorized loops with variable shifts.
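
The per-lane behavior of a variable logical left shift can be sketched as follows. The model assumes 32-bit lanes and assumes that a shift count of 32 or more produces zero, rather than the undefined behavior such a shift has in plain C. The name shift_left_variable is hypothetical; this is a scalar reference model, not the intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Eight 32-bit lanes fill one 256-bit register. */
#define SHIFT_LANES 8

/* Scalar model of a vector-vector logical left shift:
   each lane is shifted by its own count; counts >= 32 yield 0. */
void shift_left_variable(uint32_t dst[SHIFT_LANES],
                         const uint32_t src[SHIFT_LANES],
                         const uint32_t count[SHIFT_LANES]) {
    for (size_t lane = 0; lane < SHIFT_LANES; ++lane) {
        dst[lane] = (count[lane] < 32u) ? (src[lane] << count[lane]) : 0u;
    }
}
```

Without such an instruction, a loop whose shift amount changes per element has to fall back to scalar shifts, which blocks vectorization of the surrounding loop.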

Floating Point Multiply Accumulate - The floating-point multiply-accumulate instructions significantly increase peak flops and provide improved precision, which further improves transcendental mathematics. They are broadly usable in high performance computing, professional quality imaging, and face detection. They operate on scalars, 128-bit packed single- and double-precision data types, and 256-bit packed single- and double-precision data types. [These instructions were described previously, in the initial Intel AVX specification.]
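
The precision claim can be made concrete with the standard floating-point rounding model. The notation below is an assumption introduced for this illustration: fl(.) denotes rounding to the nearest representable value, u is the unit roundoff of the format, and overflow and underflow are ignored.

```latex
\begin{aligned}
\text{separate multiply, then add:}\quad
  & \operatorname{fl}\bigl(\operatorname{fl}(ab) + c\bigr)
    = \bigl(ab\,(1+\delta_1) + c\bigr)(1+\delta_2) \\
\text{fused multiply-add:}\quad
  & \operatorname{fl}(ab + c) = (ab + c)(1+\delta_3) \\
  & |\delta_i| \le u
\end{aligned}
```

The fused form rounds once, so its relative error is bounded by u. The separate form rounds the product first, and when ab and c nearly cancel, that first rounding error can dominate the result. Polynomial evaluation in transcendental functions is a chain of such multiply-add steps, which is why it benefits.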


Intel MKL 11.0 fully supports Intel AVX2; additional optimizations are available in the following functions.

Basic Linear Algebra Subprograms (BLAS):

• xHER2K

Discrete Fourier transform (DFT):

• 1D, power-of-2

• 2D, power-of-2

• 3D, power-of-2

• 1D, non-power-of-2

• 2D, non-power-of-2

• 3D, non-power-of-2

Sparse BLAS

• dcsrmm

• scsrmm

• dcoomm

• scoomm

Vector Statistical Library (VSL)

• MRG32k3a


Intel® AVX optimization in Intel® MKL

Haswell New Instruction Descriptions Now Available!

Intel® Advanced Vector Extensions Programming Reference
