ISA Extensions

Intel’s Instruction Set Architecture (ISA) continues to evolve, expanding in functionality, enriching the user experience, and creating synergy across industries.


Intel® Advanced Vector Extensions (Intel® AVX)

The need for greater computing performance continues to grow across industry segments. To support rising demand and evolving usage models, we continue our history of innovation with Intel® Advanced Vector Extensions (Intel® AVX), available in products today.

Intel® AVX is a 256-bit instruction set extension to Intel® SSE, designed for applications that are floating-point (FP) intensive. It was released in early 2011 as part of the second generation Intel® Core™ processor family and is present in platforms ranging from notebooks to servers. Intel AVX improves performance through wider vectors, a new extensible syntax, and rich functionality. Intel AVX2 was released in 2013 with the fourth generation Intel® Core™ processor family and extends vector processing across both floating-point and integer data domains. The result is higher performance and more efficient data management across a wide range of applications, such as image and audio/video processing, scientific simulations, financial analytics, and 3D modeling and analysis.


Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

In the future, some new products will feature a significant leap to 512-bit SIMD support. Programs can pack eight double-precision or sixteen single-precision floating-point numbers into a 512-bit vector, or eight 64-bit or sixteen 32-bit integers. This allows a single instruction to process twice as many data elements as Intel AVX/AVX2 and four times as many as Intel SSE.
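The element counts quoted above follow from dividing the register width by the element width. The following is a minimal C sketch of that arithmetic only. The names `lanes`, `SSE_BITS`, `AVX_BITS`, and `AVX512_BITS` are hypothetical, chosen for illustration, and are not part of any Intel API.

```c
#include <stddef.h>
#include <stdint.h>

/* Vector register widths in bits, as stated in the text. */
enum { SSE_BITS = 128, AVX_BITS = 256, AVX512_BITS = 512 };

/* Number of elements of elem_bytes bytes that fit in one register
   of register_bits bits. */
size_t lanes(size_t register_bits, size_t elem_bytes)
{
    return register_bits / (elem_bytes * 8);
}
```

For example, `lanes(AVX512_BITS, sizeof(int32_t))` gives sixteen 32-bit lanes, twice the eight lanes of a 256-bit register and four times the four lanes of a 128-bit register.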

Intel AVX-512 instructions are important because they open up higher performance capabilities for the most demanding computational tasks. They also offer the highest degree of compiler support, thanks to an unprecedented level of richness in the design of the instruction capabilities.

Intel AVX-512 features include 32 vector registers, each 512 bits wide, and eight dedicated mask registers. It is a flexible instruction set that includes support for broadcast, embedded masking to enable predication, embedded floating-point rounding control, embedded floating-point fault suppression, scatter instructions, high-speed math instructions, and compact representation of large displacement values.
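To make broadcast and masked predication concrete, here is a scalar C model of the two ideas, assuming sixteen 32-bit lanes per 512-bit register. It is a sketch of the semantics only, not vectorized code. The names `LANES`, `broadcast_f32`, and `masked_add_f32` are hypothetical. The masked add models merge-masking, where lanes whose mask bit is clear keep their previous value.

```c
#include <stdint.h>

#define LANES 16 /* sixteen 32-bit lanes in one 512-bit register */

/* Model of broadcast: replicate one scalar into every lane. */
void broadcast_f32(float dst[LANES], float value)
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = value;
}

/* Model of a merge-masked add: lane i of dst becomes a[i] + b[i] when
   bit i of mask is set, and is left unchanged otherwise. */
void masked_add_f32(float dst[LANES], const float a[LANES],
                    const float b[LANES], uint16_t mask)
{
    for (int i = 0; i < LANES; ++i) {
        if ((mask >> i) & 1u)
            dst[i] = a[i] + b[i];
    }
}
```

With hardware support, a compiler or an intrinsic such as `_mm512_mask_add_ps` can express the whole loop as a single masked instruction, so a conditional update does not need a branch per element.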

Intel AVX-512 offers a level of compatibility with Intel AVX that is stronger than prior transitions to new SIMD widths. Unlike Intel SSE and Intel AVX, which cannot be mixed without performance penalties, Intel AVX and Intel AVX-512 instructions can be mixed without penalty. Intel AVX registers YMM0–YMM15 map into Intel AVX-512 registers ZMM0–ZMM15 (in x86-64 mode), much as Intel SSE registers map into Intel AVX registers. Therefore, in processors with Intel AVX-512 support, Intel AVX and Intel AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.
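The register mapping described above can be pictured as three views of the same storage. The C sketch below is a hypothetical software model, not how hardware is programmed: the type `vreg_model` and the functions `write_zmm`, `read_ymm`, and `read_xmm` are invented for illustration. It shows only that the 128-bit and 256-bit views read the low bytes of the 512-bit register.

```c
#include <stdint.h>
#include <string.h>

enum { XMM_BYTES = 16, YMM_BYTES = 32, ZMM_BYTES = 64 };

/* Hypothetical model of one vector register: 64 bytes, lowest byte first. */
typedef struct {
    uint8_t bytes[ZMM_BYTES];
} vreg_model;

/* Write all 512 bits of the modeled register. */
void write_zmm(vreg_model *r, const uint8_t in[ZMM_BYTES])
{
    memcpy(r->bytes, in, ZMM_BYTES);
}

/* YMM view: the low 256 bits of the same register. */
void read_ymm(const vreg_model *r, uint8_t out[YMM_BYTES])
{
    memcpy(out, r->bytes, YMM_BYTES);
}

/* XMM view: the low 128 bits of the same register. */
void read_xmm(const vreg_model *r, uint8_t out[XMM_BYTES])
{
    memcpy(out, r->bytes, XMM_BYTES);
}
```

In this model, a value written through the 512-bit view is immediately visible through the narrower views, which is the sense in which YMM registers map into the ZMM registers.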

More information about Intel AVX-512 instructions can be found in the blog "AVX-512 Instructions". The instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference (PDF) (see the "Get Started" tab on this page).

Putting Your Data and Code in Order: Data and layout - Part 2 This pair of articles on performance and memory covers basic concepts to provide guidance to developers seeking to improve software performance. This paper expands on concepts discussed in Part 1, to consider parallelism, both vectorization (single instruction multiple data SIMD) as well as...
Putting Your Data and Code in Order: Optimization and Memory – Part 1 This series of two articles discusses how data and memory layout affect performance and suggests specific steps to improve software performance. The basic steps shown in these two articles can yield significant performance gains. These two articles are designed at an intermediate level. It is...
Software Occlusion Culling This article details an algorithm and associated sample code for software occlusion culling which is available for download. The technique divides scene objects into occluders and occludees and culls occludees based on a depth comparison with the occluders that are software rasterized to the depth...
Fast Computation of Fletcher Checksums Checksums are widely used for checking the integrity of data in applications such as storage and networking. We present fast methods of computing checksums on Intel® processors. Instead of computing the checksum of the input with a traditional linear method, we describe a faster method to split the...
Compiling for the Intel® Xeon Phi™ processor x200 and the Intel® AVX-512 ISA This document briefly gives an overview of the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and shows different ways to build an application for the Intel® Xeon Phi™ processor x200 using the Intel® compiler.
Case Study: Optimized Code for Neural Cell Simulations Intel held the Intel® Modern Code Developer Challenge that had about 2,000 students from 130 universities in 19 countries registered to participate in the Challenge. They were provided access to Intel® Xeon Phi™ coprocessors to optimize code used in a CERN openlab brain simulation research project...
Three Pieces of Advice for Code Modernization Success What three code modernization techniques would I suggest to help a programmer improve the execution performance of her code? With too many specific things to choose from, these are three recommendations for any programmer anywhere and anytime.
Reference Implementations for Intel® Architecture Approximation Instructions VRCP14, VRSQRT14, VRCP28, VRSQRT28, and VEXP2 We are providing source files containing reference implementations for the scalar versions of 10 approximation instructions introduced in the "Intel® Architecture Instruction Set Extensions Programming Reference" document
High-Performance, Modern Code Optimizations for Computational Fluid Dynamics Modern server farms consist of a large number of heterogeneous, energy-efficient, and very high-performance computing nodes connected with each other through a high-bandwidth network interconnect.  Such systems pose one of the biggest challenges for engineers and scientists today:  how to solve...
Get a Helping Hand from the Vectorization Advisor Vectorization Advisor is like having a trusted friend look over your code and give you advice based on what he sees. As you’ll see in this article, user feedback on the tool has included, “there are significant speedups produced by following advisor output, I'm already sold on this tool!”