ISA Extensions

Intel’s Instruction Set Architecture (ISA) continues to evolve and expand in functionality, enrich user experience, and create synergy across industries.

INTEL® AVX

Intel® Advanced Vector Extensions (Intel® AVX)

The need for greater computing performance continues to grow across industry segments. To support rising demand and evolving usage models, we continue our history of innovation with the Intel® Advanced Vector Extensions (Intel® AVX) in products today.

Intel® AVX is a 256-bit instruction set extension to Intel® SSE, designed for applications that are floating-point (FP) intensive. It was released in early 2011 as part of the second generation Intel® Core™ processor family and is present in platforms ranging from notebooks to servers. Intel AVX improves performance through wider vectors, a new extensible syntax, and rich functionality. Intel AVX2 was released in 2013 with the fourth generation Intel® Core™ processor family and further extends the breadth of vector processing capability across floating-point and integer data domains. This results in higher performance and more efficient data management across a wide range of applications such as image and audio/video processing, scientific simulations, financial analytics, and 3D modeling and analysis.

 

Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

In the future, some new products will feature a significant leap to 512-bit SIMD support. Programs can pack eight double-precision or sixteen single-precision floating-point numbers within a 512-bit vector, as well as eight 64-bit or sixteen 32-bit integers. A single instruction can therefore process twice as many data elements as Intel AVX/AVX2 and four times as many as Intel SSE.
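The element counts above follow directly from dividing the register width by the element width. The short Python sketch below only reproduces that arithmetic; the `lanes` helper and the `REGISTER_BITS` table are hypothetical names introduced for illustration and are not part of any Intel API.

```python
# Illustrative arithmetic only: how many elements fit in one SIMD register.
# REGISTER_BITS and lanes() are hypothetical names used for this sketch.
REGISTER_BITS = {"Intel SSE": 128, "Intel AVX/AVX2": 256, "Intel AVX-512": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Return the number of elements of a given width that fit in one register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


if __name__ == "__main__":
    for name, bits in REGISTER_BITS.items():
        # 64-bit covers double precision and 64-bit integers;
        # 32-bit covers single precision and 32-bit integers.
        print(f"{name}: {lanes(bits, 64)} x 64-bit, {lanes(bits, 32)} x 32-bit")
```

For a 512-bit register this gives eight 64-bit or sixteen 32-bit elements, which is twice the 256-bit count and four times the 128-bit count.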

Intel AVX-512 instructions are important because they open up higher performance capabilities for the most demanding computational tasks. Intel AVX-512 instructions offer the highest degree of compiler support by including an unprecedented level of richness in the design of the instruction capabilities.

Intel AVX-512 features include 32 vector registers, each 512 bits wide, and eight dedicated mask registers. Intel AVX-512 is a flexible instruction set that includes support for broadcast, embedded masking to enable predication, embedded floating-point rounding control, embedded floating-point fault suppression, scatter instructions, high-speed math instructions, and compact representation of large displacement values.
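To make the idea of embedded masking more concrete, the following Python sketch emulates a per-lane masked add in scalar code. It is a conceptual model, not real vector code: the `masked_add` function is a hypothetical helper, lists stand in for vector registers, and an integer stands in for a mask register. It models the two masking behaviors, merge-masking (inactive lanes keep the destination's old value) and zero-masking (inactive lanes are set to zero).

```python
# Conceptual model of per-lane predication; not actual SIMD code.
# masked_add() is a hypothetical helper introduced for this sketch.
def masked_add(a, b, mask, dst=None):
    """Add a and b lane by lane where the corresponding mask bit is set.

    Bit i of `mask` controls lane i. If `dst` is given, inactive lanes keep
    the value from `dst` (merge-masking); otherwise they become 0
    (zero-masking).
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of lanes")
    if dst is not None and len(dst) != len(a):
        raise ValueError("dst must have the same number of lanes as a and b")
    out = []
    for i, (x, y) in enumerate(zip(a, b)):
        if (mask >> i) & 1:
            out.append(x + y)        # active lane: compute the result
        elif dst is not None:
            out.append(dst[i])       # inactive lane, merge-masking
        else:
            out.append(0)            # inactive lane, zero-masking
    return out


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [10, 20, 30, 40]
    print(masked_add(a, b, 0b0101))                    # zero-masking
    print(masked_add(a, b, 0b0101, dst=[9, 9, 9, 9]))  # merge-masking
```

In this model only lanes 0 and 2 are computed; the remaining lanes are either zeroed or left as they were, which is what lets a compiler vectorize loops that contain conditionals.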

Intel AVX-512 offers a level of compatibility with Intel AVX that is stronger than in prior transitions to new SIMD widths. Unlike Intel SSE and Intel AVX, which cannot be mixed without performance penalties, Intel AVX and Intel AVX-512 instructions can be mixed without penalty. Intel AVX registers YMM0–YMM15 map into Intel AVX-512 registers ZMM0–ZMM15 (in x86-64 mode), very much like Intel SSE registers map into Intel AVX registers. Therefore, in processors with Intel AVX-512 support, Intel AVX and Intel AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.
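The register mapping described above can be pictured as narrower views onto the low bits of a wider register. The sketch below models a 512-bit register as a Python integer and derives the 256-bit and 128-bit views by masking. The names `low_bits`, `zmm0`, `ymm0`, and `xmm0` are illustrative only, and the model covers reads only; it does not attempt to describe what happens to the upper bits on writes.

```python
# Conceptual model of register aliasing: a YMM register is the low 256 bits
# of the matching ZMM register, and an XMM register is the low 128 bits.
# All names here are hypothetical and used only for illustration.
ZMM_BITS, YMM_BITS, XMM_BITS = 512, 256, 128


def low_bits(value: int, width: int) -> int:
    """Return the lowest `width` bits of `value`."""
    return value & ((1 << width) - 1)


# Fill a 512-bit "register" so that byte i holds the value i (little-endian).
zmm0 = int.from_bytes(bytes(range(ZMM_BITS // 8)), "little")
ymm0 = low_bits(zmm0, YMM_BITS)  # what a 256-bit instruction would read
xmm0 = low_bits(zmm0, XMM_BITS)  # what a 128-bit instruction would read

if __name__ == "__main__":
    print(ymm0.to_bytes(YMM_BITS // 8, "little").hex())
    print(xmm0.to_bytes(XMM_BITS // 8, "little").hex())
```

The 256-bit view contains bytes 0 through 31 of the 512-bit value and the 128-bit view contains bytes 0 through 15, mirroring how the narrower instructions see only the lower part of the first 16 ZMM registers.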

More information about Intel AVX-512 instructions can be found in the blog "AVX-512 Instructions". The instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference (PDF) (see the "Get Started" tab on this page).

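Boost Deep Learning with Intel® Advanced Vector Extensions 512
04/08/19

Learn how Intel® Advanced Vector Extensions 512 can accelerate deep learning within Intel® Xeon® Scalable processors.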

Intel and Facebook* collaborate to boost PyTorch* CPU performance Intel's software optimization and 2nd generation Intel® Xeon® Scalable Processors with Intel® DL Boost® accelerate PyTorch's CPU performance
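What Can We Learn from the Intel® SPMD Program Compiler? We have added a new simple SGEMM example to the Intel® SPMD Program Compiler GitHub* repo. The Intel® SPMD Program Compiler is colloquially referred to as “ISPC”, as in the compiler’s executable name “ispc.exe”. The new SGEMM example helps show several ways to optimize computation in ISPC. In general, single-precision general matrix multiply (SGEMM) is a good, concise example that many programmers know well. Having an ISPC version of SGEMM makes it easier to compare with other programming languages and their approaches to optimizing SGEMM code.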
Boost Deep Learning with Intel® Advanced Vector Extensions 512
03/29/19

Learn how Intel® Advanced Vector Extensions 512 can accelerate deep learning within Intel® Xeon® Scalable processors.

Getting Started with Intel® Optimization for PyTorch* on Second Generation Intel® Xeon® Scalable Processors Accelerate deep learning PyTorch* code on second generation Intel® Xeon® Scalable processor with Intel® Deep Learning Boost.
Extending Deep Learning Reference Stack Capabilities Intel understands the challenges that come with creating and deploying applications for deep learning-based workloads.
Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads This article will describe performance considerations for CPU inference using Intel® Optimization for TensorFlow*
What Can We Learn from the Intel SPMD Program Compiler? We have added a new simple SGEMM example to the Intel® SPMD Program Compiler GitHub* repo. The Intel® SPMD Program Compiler is colloquially referred to as “ISPC”, as in the compiler’s executable name “ispc.exe”. The new SGEMM sample is instructive for showing several variants of how to approach...
Intel® Math Kernel Library Improved Small Matrix Performance Using Just-in-Time (JIT) Code Generation for Matrix Multiplication (GEMM) The most commonly used and performance-critical Intel® Math Kernel Library (Intel® MKL) functions are the general matrix multiply (GEMM) functions.
Intel® System Studio Release Notes, System Requirements, and What's New This page provides system requirements and release notes for Intel® System Studio. They are categorized by year, from the newest to oldest, with individual releases listed within each year. For component-specific system requirements and release notes, please see the Release notes for individual...