
ISA Extensions

Intel’s Instruction Set Architecture (ISA) continues to evolve and expand in functionality, enriching the user experience and creating synergy across industries.


Intel® Advanced Vector Extensions (Intel® AVX)

The need for greater computing performance continues to grow across industry segments. To support rising demand and evolving usage models, we continue our history of innovation with the Intel® Advanced Vector Extensions (Intel® AVX) in products today.

Intel® AVX is a 256-bit instruction set extension to Intel® SSE, designed for floating-point (FP) intensive applications. It was released in early 2011 as part of the second generation Intel® Core™ processor family and is present in platforms ranging from notebooks to servers. Intel AVX improves performance through wider vectors (256 bits versus 128 bits in Intel SSE), a new extensible syntax, and rich functionality. Intel AVX2, released in 2013 with the fourth generation Intel® Core™ processor family, further extends vector processing across both floating-point and integer data. The result is higher performance and more efficient data management across a wide range of applications, such as image and audio/video processing, scientific simulations, financial analytics, and 3D modeling and analysis.
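
The following is a minimal sketch of what "wider vectors" means in practice: it adds four doubles with a single 256-bit Intel AVX instruction. It assumes GCC or Clang on x86, because it relies on the target function attribute and __builtin_cpu_supports. The names add4, add4_avx, and add4_scalar are hypothetical. A scalar path keeps the code correct on machines without AVX.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Compiled for AVX through the GCC/Clang target attribute, so the rest of
   the file needs no -mavx flag. Call it only after a runtime check. */
__attribute__((target("avx")))
static void add4_avx(const double *a, const double *b, double *out)
{
    __m256d va = _mm256_loadu_pd(a);   /* four doubles fill one 256-bit YMM register */
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(out, _mm256_add_pd(va, vb));  /* one instruction, four additions */
}
#endif

/* Scalar fallback for CPUs or architectures without Intel AVX. */
static void add4_scalar(const double *a, const double *b, double *out)
{
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: uses the AVX path only if the CPU reports AVX. */
void add4(const double *a, const double *b, double *out)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        add4_avx(a, b, out);
        return;
    }
#endif
    add4_scalar(a, b, out);
}
```

Calling add4 with {1, 2, 3, 4} and {10, 20, 30, 40} produces {11, 22, 33, 44} on either path.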


Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

In the future, some new products will feature a significant leap to 512-bit SIMD support. A 512-bit vector can hold eight double-precision or sixteen single-precision floating-point numbers, or eight 64-bit or sixteen 32-bit integers. A single instruction can therefore process twice as many data elements as with Intel AVX/AVX2, and four times as many as with Intel SSE.
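
These element counts follow from dividing the register width by the element size. The small C helper below reproduces the arithmetic. The name lanes and the width constants are hypothetical. For example, 512 / 64 = 8 doubles and 512 / 32 = 16 floats, compared with 256 / 64 = 4 doubles for Intel AVX and 128 / 32 = 4 floats for Intel SSE.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* int32_t, int64_t */

/* Register widths in bits for each SIMD generation. */
enum { SSE_BITS = 128, AVX_BITS = 256, AVX512_BITS = 512 };

/* Number of elements of elem_bytes bytes that fit in one vector register. */
static size_t lanes(size_t vector_bits, size_t elem_bytes)
{
    return vector_bits / (elem_bytes * CHAR_BIT);
}
```

For example, lanes(AVX512_BITS, sizeof(int32_t)) gives 16, and lanes(AVX_BITS, sizeof(double)) gives 4.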

Intel AVX-512 instructions are important because they enable higher performance for the most demanding computational tasks. They are designed for the highest degree of compiler support, with an unprecedented richness of instruction capabilities.

Intel AVX-512 features include 32 vector registers, each 512 bits wide, and eight dedicated mask registers. Intel AVX-512 is a flexible instruction set that includes support for broadcast, embedded masking to enable predication, embedded floating-point rounding control, embedded floating-point fault suppression, scatter instructions, high-speed math instructions, and compact representation of large displacement values.
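
Embedded masking can be illustrated with merge masking. With merge masking, lanes whose mask bit is clear keep a value from a source vector instead of receiving the result. The sketch below uses the AVX-512F intrinsic _mm512_mask_add_pd, with the mask passed as an __mmask8. It assumes GCC or Clang on x86, and the helper names are hypothetical. A scalar loop models the same semantics when AVX-512 is not available.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Merge masking: lanes with mask bit 1 get a[i] + b[i]; lanes with
   mask bit 0 keep src[i]. Requires AVX-512F at run time. */
__attribute__((target("avx512f")))
static void masked_add8_avx512(const double *src, uint8_t mask,
                               const double *a, const double *b, double *out)
{
    __m512d vsrc = _mm512_loadu_pd(src);
    __m512d va   = _mm512_loadu_pd(a);
    __m512d vb   = _mm512_loadu_pd(b);
    _mm512_storeu_pd(out, _mm512_mask_add_pd(vsrc, (__mmask8)mask, va, vb));
}
#endif

/* Scalar model of the same predicated (masked) operation. */
static void masked_add8_scalar(const double *src, uint8_t mask,
                               const double *a, const double *b, double *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] + b[i] : src[i];
}

/* Hypothetical dispatcher: uses AVX-512F only if the CPU reports it. */
void masked_add8(const double *src, uint8_t mask,
                 const double *a, const double *b, double *out)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        masked_add8_avx512(src, mask, a, b, out);
        return;
    }
#endif
    masked_add8_scalar(src, mask, a, b, out);
}
```

With mask 0x55, lanes 0, 2, 4, and 6 receive a[i] + b[i], and lanes 1, 3, 5, and 7 keep src[i].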

Intel AVX-512 offers a level of compatibility with Intel AVX that is stronger than in prior transitions to wider SIMD operations. Intel SSE and Intel AVX cannot be mixed without performance penalties, but Intel AVX and Intel AVX-512 instructions can be mixed without penalty. Intel AVX registers YMM0–YMM15 map into Intel AVX-512 registers ZMM0–ZMM15 (in x86-64 mode), much as Intel SSE registers map into Intel AVX registers. Therefore, on processors with Intel AVX-512 support, Intel AVX and Intel AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.
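
The register mapping can be pictured with a conceptual C model. A union overlays the XMM view (the low 128 bits) and the YMM view (the low 256 bits) on the full 512-bit ZMM register. This is only an illustration and does not show how real registers are accessed. The names vreg_model and vex256_write are hypothetical. The model also captures one architectural detail: on processors with Intel AVX-512, a VEX-encoded (Intel AVX/AVX2) instruction that writes a 256-bit destination zeroes bits 511:256 of the corresponding ZMM register.

```c
/* Conceptual model of one vector register: all three views start at the
   same address, so the YMM and XMM views are the low parts of the ZMM view. */
typedef union {
    double zmm[8];   /* full 512 bits (Intel AVX-512) */
    double ymm[4];   /* low 256 bits (Intel AVX/AVX2) */
    double xmm[2];   /* low 128 bits (Intel SSE) */
} vreg_model;

/* Models a VEX-encoded 256-bit write: the value lands in the low 256 bits
   and bits 511:256 of the ZMM register are zeroed. */
static void vex256_write(vreg_model *r, const double v[4])
{
    for (int i = 0; i < 4; i++)
        r->ymm[i] = v[i];
    for (int i = 4; i < 8; i++)
        r->zmm[i] = 0.0;
}
```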

More information about Intel AVX-512 instructions can be found in the blog "AVX-512 Instructions". The instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference (PDF) (see the "Get Started" tab on this page).

Recipe: Building and Running MILC on Intel® Xeon® Processors and Intel® Xeon Phi™ Processors

MILC software represents a set of codes written by the MIMD Lattice Computation collaboration to study quantum chromodynamics, the theory of the strong interactions of subatomic physics. This article provides directions for accessing, building, and running the “ks_imp_rhmc”...

Intel® Xeon Phi™ Processor 7200 Family Memory Management Optimizations

This paper examines software performance optimization for an implementation of a non-library version of DGEMM executing on the Intel® Xeon Phi™ processor (code-named Knights Landing, or KNL) running the Linux* Operating System (OS).

Achieve More With Intel® Xeon® E5 Optimized Software

Bestselling author James C. Collins is well known for the oft-used axiom: “Good is the enemy of great.” In short, settling for what works is always going to keep you from doing your best. It’s a mentality that makes a lot of sense when considering software upgrades.

Implementing a masked SVML-like function explicitly in user defined way

The Intel Compiler provides SIMD intrinsic APIs for the Short Vector Math Library (SVML), and starting with the AVX-512 generation it also exposes masked versions of SVML functions to users. For example, zmmintrin.h declares:

extern __m512d __ICL_INTRINCC _mm512_mask_exp_pd(__m512d, __mmask8, __m512d);

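
A user-defined masked function can be modeled with the same argument order as _mm512_mask_exp_pd (source, mask, input). Active lanes receive f(a[i]), and inactive lanes keep src[i]. The code below is a scalar sketch, not the article's implementation. The names mask_apply8_pd and square are hypothetical, and square only stands in for an SVML-style routine such as exp so that the example needs no math library. Because f runs only on active lanes, inactive lanes cannot trigger floating-point exceptions.

```c
#include <stdint.h>

/* Any double -> double routine, e.g. an exp-like function. */
typedef double (*lane_fn)(double);

/* Hypothetical masked "SVML-like" helper using merge masking over 8 lanes:
   out[i] = f(a[i]) where bit i of mask is set, otherwise out[i] = src[i]. */
static void mask_apply8_pd(double out[8], const double src[8], uint8_t mask,
                           const double a[8], lane_fn f)
{
    for (int i = 0; i < 8; i++)
        out[i] = ((mask >> i) & 1u) ? f(a[i]) : src[i];
}

/* Stand-in for a real math routine such as exp. */
static double square(double x)
{
    return x * x;
}
```

A vectorized alternative is to compute the function for all eight lanes and then merge the results with _mm512_mask_blend_pd(mask, src, result). That approach is simpler, but it also evaluates the function on inactive lanes.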
Exploring MPI for Python* on Intel® Xeon Phi™ Processor

Learn how to write an MPI program in Python*, and take advantage of Intel® multicore architectures using OpenMP threads and Intel® AVX-512 instructions.

Quick Analysis of Vectorization Using the Intel® Advisor 2017 Tool

In this article we continue our exploration of vectorization on an Intel® Xeon Phi™ processor. We will discuss how to use the command-line interface in Intel® Advisor 2017 for a quick, initial analysis of loop performance that gives an overview of the hotspots in the code.

Thread Parallelism in Cython*

Cython* is a superset of Python* that additionally supports calling C functions and declaring C types on variables and class attributes. Cython generates C extension modules, which the main Python program can load using the import statement.

How AsiaInfo ADB* Improves Performance with Intel® Xeon® Processor-Based Systems

This article describes how AsiaInfo ADB was able to take advantage of features like Intel® Advanced Vector Extensions 2 and Intel® Transactional Synchronization Extensions, as well as faster Intel® Solid State Drives, to improve its performance when running on systems equipped with the...

Vectorization: The “Other” Parallelism You Need

We will describe, with C and Fortran examples, new opportunities for performance-enhancing vectorization provided by the Intel® AVX-512 instruction set on the processor code named Knights Landing. After an introduction, this will include vectorization of loops that compress or expand arrays;...
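
The compress and expand loops mentioned above have a simple scalar form, and the sketch below shows both patterns. The function names and the "> 0.0" condition are illustrative assumptions. AVX-512F provides the vcompresspd and vexpandpd instructions for these patterns. A compiler that targets AVX-512 may use them to vectorize loops like these, but whether it does depends on the compiler and its options.

```c
#include <stddef.h>

/* Compress: copy only the elements that satisfy the condition into a
   dense output array. Returns the number of elements written. */
size_t compress_positive(const double *in, size_t n, double *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (in[i] > 0.0)
            out[j++] = in[i];
    return j;
}

/* Expand: place consecutive packed values into the positions where
   cond[i] > 0.0; other positions of out are left unchanged.
   Returns the number of packed values consumed. */
size_t expand_where_positive(const double *packed, const double *cond,
                             double *out, size_t n)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (cond[i] > 0.0)
            out[i] = packed[j++];
    return j;
}
```

For the input {3, -1, 0, 5, -2, 7}, compress_positive writes {3, 5, 7} and returns 3. Expanding those values back with the same input as the condition restores them to positions 0, 3, and 5.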

Intel® Xeon Phi™ Product Family x200 (KNL) User mode (ring 3) MONITOR and MWAIT

The Intel® Xeon Phi™ Product Family x200 series processors (formerly known as “Knights Landing”) contain a model-specific feature that allows the MONITOR and MWAIT[1] instructions to be executed in rings other than ring 0, whereas architecturally these instructions are restricted to ring 0 (...