Haswell New Instruction Descriptions Now Available!

By Mark J. Buxton

Published: 06/13/2011   Last Updated: 06/13/2011

Intel just released public details on the next generation of the x86 architecture. Arriving first in our 2013 Intel® microarchitecture code name "Haswell," the new instructions accelerate a broad category of applications and usage models. Download the full Intel® Advanced Vector Extensions Programming Reference (319433-011).

These build upon the instructions coming in the Intel® microarchitecture code name Ivy Bridge (including the digital random number generator and the half-float (float16) accelerators), and extend Intel® Advanced Vector Extensions (Intel® AVX), which launched in 2011.

The instructions fit into the following categories:

AVX2 - Integer data types expanded to 256-bit SIMD. AVX2’s integer support is particularly useful for processing visual data commonly encountered in consumer imaging and video processing workloads. With Haswell, we have both Intel® Advanced Vector Extensions (Intel® AVX) for floating point, and AVX2 for integer data types.

Bit manipulation instructions are useful for compressed databases, hashing, large-number arithmetic, and a variety of general-purpose code.
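One example from this group is PEXT (parallel bit extract, part of BMI2). A scalar C model of its documented semantics, the kind of bit-packing step that shows up in compressed databases and hashing:

```c
#include <stdint.h>

/* Scalar model of BMI2's PEXT (parallel bit extract): the bits of src
 * at positions where mask is set are gathered, in order, into the low
 * bits of the result. Haswell does this in a single instruction. */
static uint64_t pext_sketch(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    int out = 0;
    while (mask) {
        uint64_t lowest = mask & (0 - mask);   /* lowest set bit of the mask */
        if (src & lowest)
            result |= 1ULL << out;             /* pack that src bit next */
        ++out;
        mask &= mask - 1;                      /* clear the bit just handled */
    }
    return result;
}
```

On Haswell hardware this is a single instruction, which compilers expose as the `_pext_u64` intrinsic.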

Gather – useful for vectorizing code with nonadjacent data elements. Haswell gathers are masked for safety (like the conditional loads and stores introduced in Intel® AVX), which favors their use in code with clipping or other conditionals.
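A scalar sketch of the masked-gather behavior (VPGATHERDD-style DWORD semantics, per the programming reference): a lane is loaded only when its mask element's sign bit is set, masked-off lanes keep their previous destination value, and the mask is cleared when the gather completes. The function name and layout are illustrative.

```c
#include <stdint.h>

/* Scalar model of a masked DWORD gather (VPGATHERDD-style semantics).
 * For each of the 8 lanes: if the mask sign bit is set, load
 * base[idx[lane]]; otherwise keep the old dst value. The hardware
 * clears the mask register as the gather completes. */
static void gather_sketch(const int32_t *base, const int32_t idx[8],
                          int32_t mask[8], int32_t dst[8]) {
    for (int lane = 0; lane < 8; ++lane) {
        if (mask[lane] < 0)            /* sign bit set => lane is live */
            dst[lane] = base[idx[lane]];
        mask[lane] = 0;                /* mask is zeroed on completion */
    }
}
```

The masking is what makes gathers safe in clipped loops: a lane whose condition is false never touches memory, so an out-of-range index in a dead lane cannot fault.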

Any-to-Any permutes – incredibly useful shuffling operations. Haswell adds support for DWORD and QWORD granularity permutes across an entire 256-bit register.
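A scalar sketch of the DWORD case (VPERMD-style semantics): every destination lane may pick any of the eight source lanes, selected by the low three bits of the corresponding index element. The function name is illustrative.

```c
#include <stdint.h>

/* Scalar model of an any-to-any DWORD permute across a 256-bit
 * register (VPERMD-style): each destination lane selects any source
 * lane via the low 3 bits of its index element. */
static void permd_sketch(const int32_t src[8], const int32_t idx[8],
                         int32_t dst[8]) {
    for (int lane = 0; lane < 8; ++lane)
        dst[lane] = src[idx[lane] & 7];   /* any source lane -> any dest lane */
}
```

This crosses the 128-bit halves of the register, which earlier in-lane shuffles could not do in one step.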

Vector-Vector Shifts: We added shifts whose per-element shift counts come from another vector register. These are critical in vectorizing loops with variable shift counts.
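A scalar sketch of a variable left shift on DWORDs (VPSLLVD-style semantics): each lane shifts by the count in the matching lane of the control vector, and a count of 32 or more yields zero rather than being taken modulo 32. The function name is illustrative.

```c
#include <stdint.h>

/* Scalar model of a vector-vector shift (VPSLLVD-style): each 32-bit
 * lane is shifted left by the count in the matching lane of the
 * control vector. Counts >= 32 produce zero (not count mod 32). */
static void psllv_sketch(const uint32_t a[8], const uint32_t count[8],
                         uint32_t dst[8]) {
    for (int lane = 0; lane < 8; ++lane)
        dst[lane] = (count[lane] < 32) ? (a[lane] << count[lane]) : 0;
}
```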

Floating Point Multiply Accumulate – Our floating-point multiply-accumulate instructions significantly increase peak FLOPS and provide improved precision to further improve transcendental mathematics. They are broadly usable in high performance computing, professional-quality imaging, and face detection. They operate on scalar, 128-bit packed single- and double-precision, and 256-bit packed single- and double-precision data types. [These instructions were described previously, in the initial Intel® AVX specification.]

The vector instructions build upon the expanded (256-bit) register state added in Intel® AVX, and as such are supported by any operating system that supports Intel® AVX.
For developers, please note that the instructions span multiple CPUID leaves. You should be careful to check all applicable bits before using these instructions.
Please check out the specification and stay tuned for supporting tools over the next couple of months.

Mark Buxton
Software Engineer
Intel Corporation

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804