Intel® Developer Zone:
Intel Instruction Set Architecture Extensions

Intel’s Instruction Set Architecture (ISA) continues to evolve to improve functionality, performance, and the user experience. Featured below are ISA extensions that are new, along with enhancements planned for future generations of processors. By publishing these extensions early, Intel helps ensure that the software ecosystem has time to innovate and bring new and enhanced products to market when the processors launch.


Tools & Downloads

  • Intel® C++ Compiler

    The Intel® C++ Compiler is available for download from the Intel® Registration Center for all licensed customers. Evaluation versions of Intel® Software Development Products are also available for free download.

  • Intel Intrinsics Guide

    The Intel Intrinsics Guide is an interactive reference tool for Intel intrinsics, which are C-style functions that provide access to many Intel instructions – including Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Vector Extensions (Intel® AVX), and more – without the need to write assembly code.
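To make the idea of intrinsics concrete, here is a minimal sketch in C. The helper name `add_floats` is hypothetical (it does not come from the guide); the sketch uses the SSE intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` when the compiler targets SSE, and falls back to plain scalar code otherwise.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Each _mm_add_ps adds four packed single-precision values at once. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned 128-bit load */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail; also the whole loop when SSE is not enabled. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Each intrinsic call reads like an ordinary C function call, and the compiler is expected to map it to the corresponding instruction, so no assembly source is needed.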

Intel® Advanced Vector Extensions (Intel® AVX)

The need for greater computing performance continues to grow across industry segments. To support rising demand and evolving usage models, we continue our history of innovation with the Intel® Advanced Vector Extensions (Intel® AVX) in products today.

Intel® AVX is a 256-bit instruction set extension to Intel® SSE, designed for applications that are floating-point (FP) intensive. It was released in early 2011 as part of the Intel® microarchitecture code name Sandy Bridge processor family and is present in platforms ranging from notebooks to servers. Intel AVX improves performance through wider vectors, a new extensible syntax, and rich functionality. This results in better data management and benefits general-purpose applications such as image and audio/video processing, scientific simulations, financial analytics, and 3D modeling and analysis.
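As an illustration of the wider vectors, the following sketch computes y = alpha * x + y eight floats at a time. The function name `saxpy` and its parameters are assumptions made for this example. The AVX path is only compiled when the compiler defines `__AVX__` (for example, GCC with `-mavx`); otherwise the scalar loop does all the work.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>  /* AVX intrinsics: __m256, _mm256_add_ps, ... */
#endif

/* Hypothetical helper: y[i] = alpha * x[i] + y[i] for n floats. */
void saxpy(float alpha, const float *x, float *y, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    /* Broadcast alpha into all eight lanes of a 256-bit register. */
    __m256 valpha = _mm256_set1_ps(alpha);
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(_mm256_mul_ps(valpha, vx), vy);
        _mm256_storeu_ps(y + i, vy);
    }
#endif
    /* Scalar tail; also the whole loop when AVX is not enabled. */
    for (; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

The unaligned load and store variants are used so that the sketch places no alignment requirement on the caller's arrays.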

Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

In the future, some new products will feature a significant leap to 512-bit SIMD support. Programs can pack eight double-precision or sixteen single-precision floating-point numbers within the 512-bit vectors, as well as eight 64-bit or sixteen 32-bit integers. This enables processing of twice the number of data elements that Intel AVX/AVX2 can process with a single instruction and four times as many as Intel SSE.
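The element counts quoted above follow from dividing the vector width by the element width. A small sketch of that arithmetic, with a hypothetical helper name `lanes`:

```c
#include <stddef.h>

/* Hypothetical helper: how many elements of elem_bytes bytes fit in a
 * vector register that is vector_bits bits wide. */
size_t lanes(size_t vector_bits, size_t elem_bytes)
{
    return vector_bits / (8 * elem_bytes);
}
```

For example, `lanes(512, sizeof(double))` gives 8 and `lanes(512, sizeof(float))` gives 16 on platforms with 8-byte doubles and 4-byte floats, compared with 8 floats for a 256-bit Intel AVX register and 4 floats for a 128-bit Intel SSE register.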

Intel AVX-512 instructions are important because they open up higher performance capabilities for the most demanding computational tasks. Intel AVX-512 instructions offer the highest degree of compiler support by including an unprecedented level of richness in the design of the instruction capabilities.

Intel AVX-512 features include 32 vector registers, each 512 bits wide, and eight dedicated mask registers. Intel AVX-512 is a flexible instruction set that includes support for broadcast, embedded masking to enable predication, embedded floating-point rounding control, embedded floating-point fault suppression, scatter instructions, high-speed math instructions, and compact representation of large displacement values.
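Embedded masking can be pictured as a per-element "write or keep" decision. The sketch below models that behavior for a 16-float addition; the function name `masked_add16` is hypothetical. When the compiler defines `__AVX512F__`, it uses the `_mm512_mask_add_ps` intrinsic; otherwise it runs a scalar loop that is intended to express the same per-element rule.

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>  /* AVX-512 intrinsics: __m512, __mmask16, ... */
#endif

/* Hypothetical helper: for each of 16 elements, write a[i] + b[i] where
 * bit i of mask is set, and keep src[i] where it is clear. */
void masked_add16(float dst[16], const float src[16], uint16_t mask,
                  const float a[16], const float b[16])
{
#if defined(__AVX512F__)
    __m512 vsrc = _mm512_loadu_ps(src);
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    /* One mask bit per element selects between the sum and src. */
    _mm512_storeu_ps(dst, _mm512_mask_add_ps(vsrc, (__mmask16)mask, va, vb));
#else
    /* Scalar model of the same rule, used when AVX-512 is not enabled. */
    for (int i = 0; i < 16; ++i)
        dst[i] = ((mask >> i) & 1u) ? a[i] + b[i] : src[i];
#endif
}
```

Predication of this kind lets a loop with a conditional body be vectorized without branching on each element.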

Intel AVX-512 offers a level of compatibility with Intel AVX that is stronger than in prior transitions to new widths for SIMD operations. Unlike Intel SSE and Intel AVX, which cannot be mixed without performance penalties, Intel AVX and Intel AVX-512 instructions can be mixed without penalty. Intel AVX registers YMM0–YMM15 map into Intel AVX-512 registers ZMM0–ZMM15 (in x86-64 mode), very much like Intel SSE registers map into Intel AVX registers. Therefore, in processors with Intel AVX-512 support, Intel AVX and Intel AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.

More details about Intel AVX-512 instructions can be found in the blog "AVX-512 Instructions". The instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference (see the "Overview" tab on this page).

How to Compile for Intel® AVX
By Martyn Corden (Intel), posted 08/02/2012
Use the Intel Compiler 11.1 or 12.0 with the switch /QxAVX (Windows*) or -xavx (Linux*) to compile applications for Intel® Advanced Vector Extensions (Intel® AVX).

Intel® AVX2 optimization in Intel® MKL
By Naveen Gv (Intel), posted 06/28/2012
Specific optimizations and general support for the latest Intel® AVX2 instructions have been added in Intel MKL v11.0. This article lists the specific functions that are optimized for Intel AVX2.

Intel IPP support for Intel® AVX2
By admin, posted 06/20/2012
List of Intel IPP functions optimized for processor code name Haswell

Intel® Software Development Emulator Release Notes
By Ady Tal (Intel), posted 06/15/2012
Release notes for the Intel® Software Development Emulator

    Intel® Software Guard Extensions (Intel® SGX)

    Intel Vision Statement

    Computing workloads today are increasing in complexity, with hundreds of software modules delivered by different teams scattered across the world. Isolation of workloads on open platforms has been an ongoing effort, beginning with the protected mode architecture, which created a privilege separation between operating systems and applications. Recent malware attacks, however, have demonstrated the ability to penetrate into highly privileged modes and gain control over all the software on a platform.

    Intel® Software Guard Extensions (Intel® SGX) is a name for Intel Architecture extensions designed to increase the security of software through an “inverse sandbox” mechanism. In this approach, rather than attempting to identify and isolate all the malware on the platform, legitimate software can be sealed inside an enclave and protected from attack by the malware, irrespective of the privilege level of the latter. This would complement the ongoing efforts in securing the platform from malware intrusion, similar to how we install safes in our homes to protect valuables even while introducing more sophisticated locking and alarm systems to prevent and catch intruders.


    Intel® Memory Protection Extensions (Intel® MPX)

    Computer systems face malicious attacks of increasing sophistication, and one of the more commonly observed forms is to cause or exploit buffer overruns (or overflows) in software applications.

    Intel® Memory Protection Extensions (Intel® MPX) is a name for Intel Architecture extensions designed to increase robustness of software. Intel MPX will provide hardware features that can be used in conjunction with compiler changes to check that memory references intended at compile time do not become unsafe at runtime. Two of the most important goals of Intel MPX are to provide this capability at low overhead for newly compiled code, and to provide compatibility mechanisms with legacy software components. Intel MPX will be available in a future Intel® processor.
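The kind of check described above can be pictured in plain C. The following is a software analogy only, not the Intel MPX instructions or any compiler's actual instrumentation: the type `bounds_t` and the functions `make_bounds` and `access_in_bounds` are hypothetical names invented for this sketch. It records the valid range of a buffer and tests whether a memory access stays inside it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a bounds pair for one buffer. */
typedef struct {
    uintptr_t lower;  /* address of the first valid byte */
    uintptr_t upper;  /* address of the last valid byte */
} bounds_t;

/* Record the bounds of a buffer of size bytes (size must be > 0). */
bounds_t make_bounds(const void *base, size_t size)
{
    bounds_t b;
    b.lower = (uintptr_t)base;
    b.upper = (uintptr_t)base + size - 1;
    return b;
}

/* True if an access of access_size bytes (> 0) at p stays inside b. */
bool access_in_bounds(bounds_t b, const void *p, size_t access_size)
{
    uintptr_t first = (uintptr_t)p;
    uintptr_t last = first + access_size - 1;
    /* last >= first rejects address wrap-around. */
    return last >= first && first >= b.lower && last <= b.upper;
}
```

In this analogy a compiler would insert a call like `access_in_bounds` before each pointer dereference; hardware support is what is meant to keep the cost of such checks low.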


      Intel® Secure Hash Algorithm Extensions (Intel® SHA Extensions)

      The Secure Hash Algorithm (SHA) is one of the most commonly employed cryptographic algorithms. Primary usages of SHA include data integrity, message authentication, digital signatures, and data de-duplication. As the pervasive use of security solutions continues to grow, SHA can be seen in more applications now than ever. The Intel® SHA Extensions are designed to improve the performance of these compute-intensive algorithms on Intel® architecture-based processors.

      The Intel® SHA Extensions are a family of seven Intel® Streaming SIMD Extensions (Intel® SSE)-based instructions that are used together to accelerate the processing of SHA-1 and SHA-256 on Intel architecture-based processors. Given the growing importance of SHA in our everyday computing devices, the new instructions are designed to provide a needed performance boost when hashing a single buffer of data. The performance benefits will not only help improve responsiveness and lower power consumption for a given application, they may also enable developers to adopt SHA in new applications to protect data while meeting their user experience goals. The instructions are defined in a way that simplifies their mapping into the algorithm processing flow of most software libraries, thus enabling easier development.
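Software that wants to use these instructions typically checks for them at run time first. The sketch below is one way to do that with the GCC/Clang `<cpuid.h>` helpers, assuming the documented CPUID feature flag for the SHA extensions (leaf 7, sub-leaf 0, EBX bit 29). The function name `cpu_has_sha` is hypothetical, and on non-x86 targets or other compilers the sketch simply reports 0.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helpers: __get_cpuid_max, __cpuid_count */
#endif

/* Hypothetical helper: returns 1 if CPUID reports the SHA extensions,
 * 0 otherwise (including on targets this sketch cannot query). */
int cpu_has_sha(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 must be supported before it can be queried. */
    if (__get_cpuid_max(0, 0) < 7)
        return 0;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;  /* only EBX is needed here */
    return (int)((ebx >> 29) & 1u);   /* EBX bit 29: SHA extensions */
#else
    return 0;
#endif
}
```

A library would usually call such a check once at start-up and select either the accelerated or the generic hashing routine accordingly.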

        Intel - Version 19 of ISA Extensions manual available
        By Russell Van Zandt
        Intel, an old version of the ISA Extensions manual is on sticky status here. A newer version, 19, is available now:
        PUSH and POP of XMM/YMM registers
        By srinivasu
        Hi, I have written a function in which AVX2 instructions use XMM/YMM registers. The use of some of these registers in this function causes another part of the application to crash. The strange behavior I have observed is that the crash goes away if these registers are pushed and popped, just as non-volatile general-purpose registers are. Please tell me whether we need to push and pop the SIMD registers as well. If so, do all XMM/YMM registers need to be saved, and how? Until now I had not read anything about saving the XMM/YMM registers, but my application works only after these changes.
        Disable SSE* instructions
        By Hsunwei H.
        Hello, I am trying to prevent GCC from generating SSE* related instructions. However, SSE uops are still observed using Oprofile. I used the following GCC flags to do so:
            -march=i386 -mno-mmx -mno-sse -mno-sse2 -mno-sse3 -mno-ssse3 -mno-sse4.1 -mno-sse4.2 -mfpmath=387
        Oprofile outputs:
            Event                   Count                % time counted
            FP_COMP_OPS_EXE:0x1     3,554,989,165,876    26.67
            FP_COMP_OPS_EXE:0x2     5,571                26.67
            FP_COMP_OPS_EXE:0x4     0                    26.67
            FP_COMP_OPS_EXE:0x8     18,729,332           26.67
            FP_COMP_OPS_EXE:0x10    0                    26.67
            FP_COMP_OPS_EXE:0x20    0                    26.67
            FP_COMP_OPS_EXE:0x40    0                    26.67
            FP_COMP_OPS_EXE:0x80    0                    ...
        instructional change __m128i
        By lex1
        Hi, good afternoon. I am using a __m128i to store 16 elements of 8 bits each:
            __m128i s0 = _mm_set_epi8(pixelsTemp[95], pixelsTemp[94], pixelsTemp[93], pixelsTemp[92],
                                      pixelsTemp[91], pixelsTemp[90], pixelsTemp[89], pixelsTemp[88],
                                      pixelsTemp[87], pixelsTemp[86], pixelsTemp[85], pixelsTemp[84],
                                      pixelsTemp[83], pixelsTemp[82], pixelsTemp[81], pixelsTemp[224]);
            __m128i s1 = _mm_set_epi8(pixelsTemp[239], pixelsTemp[238], pixelsTemp[237], pixelsTemp[236],
                                      pixelsTemp[235], pixelsTemp[234], pixelsTemp[233], pixelsTemp[232],
                                      pixelsTemp[231], pixelsTemp[230], pixelsTemp[229], pixelsTemp[228],
                                      pixelsTemp[227], pixelsTemp[226], pixelsTemp[80], (char)(175));
        After that, I add both variables (s0 and s1):
            __m128i sum = _mm_add_epi8 (s0, s1);
        The problem is that when a sum is greater than 255, the value stored back is zero. I know that 8 bits can store up to 255 at most. But the question is whether there is any instruction to store the results of the sums in 16 bits rather than 8 bits. Or if there is a better ...
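One possible approach to the question above, offered as a sketch rather than as the thread's answer: zero-extend the bytes to 16-bit lanes before adding, so that sums above 255 are preserved. The function name `add_u8_widen16` is hypothetical, and the sketch assumes the pixel values are meant to be unsigned. It uses the SSE2 intrinsics `_mm_unpacklo_epi8`, `_mm_unpackhi_epi8`, and `_mm_add_epi16`, with a scalar fallback when SSE2 is not enabled.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi16, ... */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] computed in 16 bits,
 * for 16 unsigned 8-bit inputs. */
void add_u8_widen16(const uint8_t a[16], const uint8_t b[16], uint16_t out[16])
{
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* Interleaving with zero bytes zero-extends each byte to 16 bits. */
    __m128i lo = _mm_add_epi16(_mm_unpacklo_epi8(va, zero),
                               _mm_unpacklo_epi8(vb, zero));
    __m128i hi = _mm_add_epi16(_mm_unpackhi_epi8(va, zero),
                               _mm_unpackhi_epi8(vb, zero));
    _mm_storeu_si128((__m128i *)out, lo);        /* elements 0..7  */
    _mm_storeu_si128((__m128i *)(out + 8), hi);  /* elements 8..15 */
#else
    /* Scalar fallback when SSE2 is not enabled. */
    for (int i = 0; i < 16; ++i)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

If clamping at 255 is acceptable instead of widening, the saturating add `_mm_adds_epu8` keeps the result in 8 bits without wrapping.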
        Need help: Why my avx code is slower than SSE code?
        By Chen S.
        The code is compiled using MSVC2010 SP1, with /arch:AVX, and the AVX version is slightly (5~10%) slower than the SSE version. I am using an E-1230 V2 processor with 16GB dual-channel DDR3-1600 memory. Both functions read 416 bytes of data from memory (9 floating-point vectors of length 8, plus another 4 float vectors of length 8) and return a float value; no memory store is involved. The compiled SSE version has 111 instructions, and the AVX version has 67 instructions. All memory accesses are aligned (16-byte for SSE, 32-byte for AVX). The only difference between the two versions is that the SSE version processes 4 floating-point values per instruction, so it needs two instructions for a length-8 vector, while the AVX version processes 8 floating-point values per instruction. The AVX version should be at least as fast as the SSE version even if the program is memory-bound, but it turns out the AVX version is slower. The code is the core of an image processing program, the SSE version processes th...
        SDE emulation issue
        By srinivasu
        I am using the SDE emulator with the AVX2 instruction set. I have written a simple program, but it crashes in RELEASE mode with the SDE emulator. Please let me know whether SDE emulates stack-related operations or not. The assembly program, in YASM syntax:
            section .txt
            global dummy_asm
            dummy_asm:
                push rbp
                mov  rbp, rsp
                sub rsp, 1024
                push rbx            ; no need to push in this program, but in actual program using this register
                vmovdqu [rsp], xmm0 ; xmm0 is dummy value
                pop rbx
                mov rsp, rbp        ; restore rsp
                pop rbp             ; restore previous rbp
                ret
        Broken links for MPX GCC version on the Intel server?
        By c_433
        I'm not sure where else to post issues with downloads so I'm just posting it here. At the following URL I'm trying to download the patched version of GCC. The download, however, stops before it is finished. I have tried to download these files from different physical systems, using e.g. Firefox and wget; just to exclude issues on my side. Is there any more recent static version of GCC with MPX available? I tried to download the source from a different URL (, however this source failed to compile on my particular system.   The files that I have trouble downloading are:   gcc-avx512-mpx-sha-src.tar.bz2 gcc-x86-64-static-avx512-mpx-sha.tar.bz2
        AVX _mm256_store_ps
        By lex11
        Hi, I want to run the following code using the AVX instruction set. It compiles without any problem but generates an error when I run it:
            ./vec_avx.x
            "Segmentation fault (core dumped)"
        Reviewing the code, the problem is in the instruction:
            _mm256_store_ps(&total,acc); //Error
        Could someone point me to what the problem might be? Thank you.
        P.S.: I compile with the following command:
            gcc -O3 vec_avx.c -mavx -o vec_avx.x
        And the main code is as follows:
            DATATYPE inner_prod_vec(DATATYPE* a, DATATYPE* b) {
              int i;
              float *total;
              __m256 v1, v2, v3, acc;
              acc = _mm256_setzero_ps();  // acc = |0|0|0|0|0|0|0|0|
              for (i=0; i<(ARRAY_SZ-8); i+=8){
                v1  = _mm256_loadu_ps(a+i);
                v2  = _mm256_loadu_ps(b+i);
                v3  = _mm256_mul_ps(v1, v2);
                acc = _mm256_add_ps(acc, v3);
              }
              acc = _mm256_hadd_ps(acc,acc);
              acc = _mm256_hadd_ps(acc,acc);
              _mm256_store_ps(&total,acc); /////////////ERROR///////////////////////
              for (; i<ARRAY_SZ; i++)
                total += a[i] * b[i];
              ...
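Judging only from the snippet above, a likely cause is that `total` is declared as an uninitialized `float *`, so `&total` points at an 8-byte pointer variable rather than at 32 bytes of 32-byte-aligned float storage, which is what `_mm256_store_ps` expects. The sketch below shows one way the routine could be written. It is not the poster's final code: the name `inner_prod` and the explicit length parameter `n` are assumptions, and it stores the accumulator to a real 8-float buffer with the unaligned `_mm256_storeu_ps` before summing in scalar code.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>  /* AVX intrinsics: __m256, _mm256_mul_ps, ... */
#endif

/* Hypothetical rewrite: dot product of two float arrays of length n. */
float inner_prod(const float *a, const float *b, size_t n)
{
    float total = 0.0f;  /* a float value, not a pointer */
    size_t i = 0;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 v1 = _mm256_loadu_ps(a + i);
        __m256 v2 = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(v1, v2));
    }
    /* Spill the eight partial sums to an actual 8-float buffer;
     * storeu has no alignment requirement. */
    float partial[8];
    _mm256_storeu_ps(partial, acc);
    for (int k = 0; k < 8; ++k)
        total += partial[k];
#endif
    /* Scalar tail; also the whole loop when AVX is not enabled. */
    for (; i < n; ++i)
        total += a[i] * b[i];
    return total;
}
```

Summing the spilled lanes in scalar code also sidesteps a subtlety of `_mm256_hadd_ps`, which adds within each 128-bit half separately, so two calls alone do not combine the lower and upper halves.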