Intel® Developer Zone:
Intel Instruction Set Architecture Extensions

Intel’s Instruction Set Architecture (ISA) continues to evolve to improve functionality, performance, and the user experience. Featured below are extensions to the ISA, both newly released and planned for future generations of processors. By publishing these extensions early, Intel helps ensure that the software ecosystem has time to innovate and come to market with new and enhanced products when the processors launch.

Overview

Tools & Downloads

  • Intel® C++ Compiler

    The Intel® C++ Compiler is available for download from the Intel® Registration Center for all licensed customers. Evaluation versions of Intel® Software Development Products are also available for free download.

  • Intel Intrinsics Guide

    The Intel Intrinsics Guide is an interactive reference tool for Intel intrinsic instructions, which are C style functions that provide access to many Intel instructions – including Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Vector Extensions (Intel® AVX), and more – without the need to write assembly code.

Intel® Advanced Vector Extensions (Intel® AVX)

The need for greater computing performance continues to grow across industry segments. To support rising demand and evolving usage models, we continue our history of innovation with the Intel® Advanced Vector Extensions (Intel® AVX) in products today.

Intel® AVX is a 256-bit instruction set extension to Intel® SSE designed for applications that are floating-point (FP) intensive. It was released in early 2011 as part of the Intel® microarchitecture code name Sandy Bridge processor family and is present in platforms ranging from notebooks to servers. Intel AVX improves performance through wider vectors, a new extensible syntax, and rich functionality, resulting in better data management and performance in general-purpose applications such as image and audio/video processing, scientific simulations, financial analytics, and 3D modeling and analysis.

Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

In the future, some new products will feature a significant leap to 512-bit SIMD support. Programs can pack eight double-precision or sixteen single-precision floating-point numbers within the 512-bit vectors, as well as eight 64-bit or sixteen 32-bit integers. This enables processing of twice the number of data elements that Intel AVX/AVX2 can process with a single instruction, and four times that of Intel SSE.

Intel AVX-512 instructions are important because they open up higher performance capabilities for the most demanding computational tasks. They are also designed with an unprecedented level of richness in their instruction capabilities, which enables a high degree of compiler support.

Intel AVX-512 features include 32 vector registers, each 512 bits wide, and eight dedicated mask registers. Intel AVX-512 is a flexible instruction set that includes support for broadcast, embedded masking to enable predication, embedded floating-point rounding control, embedded floating-point fault suppression, scatter instructions, high-speed math instructions, and compact representation of large displacement values.

Intel AVX-512 offers a level of compatibility with Intel AVX that is stronger than in prior transitions to new SIMD widths. Unlike Intel SSE and Intel AVX, which cannot be mixed without performance penalties, Intel AVX and Intel AVX-512 instructions can be mixed without penalty. Intel AVX registers YMM0–YMM15 map into the Intel AVX-512 registers ZMM0–ZMM15 (in x86-64 mode), much like Intel SSE registers map into Intel AVX registers. Therefore, in processors with Intel AVX-512 support, Intel AVX and Intel AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.

More details about the Intel AVX-512 instructions can be found in the blog "AVX-512 Instructions". The instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference (see the "Overview" tab on this page).

Samples for Intel® C++ Composer XE
By Jennifer J. (Intel) | Posted 02/25/2013 | 0 comments
Intel® C++ compiler is an industry-leading C/C++ compiler, including optimization features like auto-vectorization and auto-parallelization, OpenMP*, and Intel® Cilk™ Plus multithreading capabilities; plus the highly optimized performance libraries. We have created a list of articles with samples...
VecAnalysis Python* Script for Annotating Intel C++ & Fortran Compilers Vectorization Reports
By mark-sabahi (Intel) | Posted 01/16/2013 | 0 comments
  This is the Python* script used to annotate Intel® C++ and Fortran compiler 13.1 (Intel® C++/Fortran/Visual Fortran Composer XE 2013 Update 2 and later) vectorization reports produced at -vec-report7.  The attached zip file contains: vecanalysis.py  vecmessages.py README-vecanalysis.txt NOTE: Y...
Webinar: Get Ready for Intel® Math Kernel Library on Intel® Xeon Phi™ Coprocessors
By Zhang Z (Intel) | Posted 12/05/2012 | 1 comment
Intel recently unveiled the new Intel® Xeon Phi™ product – a coprocessor based on the Intel® Many Integrated Core architecture. Intel® Math Kernel Library (Intel® MKL) 11.0 introduces high-performance and comprehensive math functionality support for the Intel® Xeon Phi™ coprocessor. You can downl...
Programming for Multicore and Many-core Products including Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors
By James Reinders (Intel) | Posted 11/12/2012 | 0 comments
Programming for Multicore and Many-core Products including Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors (including language extensions for offloading to Intel® Xeon Phi™ coprocessors) Abstract The programming models in use today, used for multicore processors every day, are available...

Intel® Half-Precision Floating-Point Format Conversion Instructions
By Khang Nguyen (Intel) | Posted 09/30/2013 | 0 comments
Introduction In today’s world, many applications, in one way or another, involve graphics.  High resolution graphical and game applications may require a huge amount of disk space and memory to store graphics data.  Half precision floating format can specifically reduce the amount of graphics dat...
AVX-512 instructions
By James Reinders (Intel) | Posted 07/23/2013 | 14 comments
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) The latest Intel® Architecture Instruction Set Extensions Programming Reference includes the definition of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions. These instructions represent a significant leap to 512-bit SIMD su...
Free Intel C++ Compilers for Students, and related parallel programming tools.
By James Reinders (Intel) | Posted 01/03/2013 | 0 comments
I came across this offer - and thought it worth passing along... Students at degree-granting institutions are eligible for free Intel C++ tools (and discounts on Fortran tools.) Linux, Windows and Mac OS versions available. These are serious tools to achieving high performance results with C++ pr...
Parallel Universe Magazine #12: Advanced Vectorization
By Georg Zitzlsberger (Intel) | Posted 12/09/2012 | 0 comments
This blog contains additional content for the article "Advanced Vectorization" from Parallel Universe #12: You can download the entire magazine here: http://software.intel.com/en-us/intel-parallel-universe-magazine The example used throughout the article can be downloaded as tar-ball in the...


    Intel® Software Guard Extensions (Intel® SGX)

    Intel Vision Statement

    Computing workloads today are increasing in complexity, with hundreds of software modules delivered by different teams scattered across the world. Isolation of workloads on open platforms has been an ongoing effort, beginning with protected mode architecture to create a privilege separation between operating systems and applications. Recent malware attacks however have demonstrated the ability to penetrate into highly privileged modes and gain control over all the software on a platform.

    Intel® Software Guard Extensions (Intel® SGX) is a name for Intel Architecture extensions designed to increase the security of software through an “inverse sandbox” mechanism. In this approach, rather than attempting to identify and isolate all the malware on the platform, legitimate software can be sealed inside an enclave and protected from attack by the malware, irrespective of the privilege level of the latter. This would complement the ongoing efforts in securing the platform from malware intrusion, similar to how we install safes in our homes to protect valuables even while introducing more sophisticated locking and alarm systems to prevent and catch intruders.


    Intel® Memory Protection Extensions (Intel® MPX)

    Computer systems face malicious attacks of increasing sophistication, and one of the more commonly observed forms is to cause or exploit buffer overruns (or overflows) in software applications.

    Intel® Memory Protection Extensions (Intel® MPX) is the name for Intel Architecture extensions designed to increase the robustness of software. Intel MPX will provide hardware features that can be used in conjunction with compiler changes to check at runtime that memory references stay within the bounds intended at compile time. Two of the most important goals of Intel MPX are to provide this capability at low overhead for newly compiled code and to provide compatibility mechanisms for legacy software components. Intel MPX will be available in a future Intel® processor.


      Intel® Secure Hash Algorithm Extensions (Intel® SHA Extensions)

      The Secure Hash Algorithm (SHA) is one of the most commonly employed cryptographic algorithms.  Primary usages of SHA include data integrity, message authentication, digital signatures, and data de-duplication.  As the pervasive use of security solutions continues to grow, SHA can be seen in more applications now than ever. The Intel® SHA Extensions are designed to improve the performance of these compute intensive algorithms on Intel® architecture-based processors.

      The Intel® SHA Extensions are a family of seven Intel® Streaming SIMD Extensions (Intel® SSE)-based instructions that are used together to accelerate the processing of SHA-1 and SHA-256 on Intel architecture-based processors. Given the growing importance of SHA in our everyday computing devices, the new instructions are designed to provide a needed performance boost when hashing a single buffer of data. The performance benefits will not only improve responsiveness and lower power consumption for a given application; they may enable developers to adopt SHA in new applications to protect data while still meeting their user-experience goals. The instructions are defined in a way that simplifies their mapping into the algorithm processing flow of most software libraries, thus enabling easier development.

      Intel® Software Development Emulator Download
      By Ady Tal (Intel) | Posted 12/16/2011 | 4 comments
      Intel® Software Development Emulator (released March 06, 2014) DOWNLOAD Intel® SDE for WINDOWS*  (sde-external-6.22.0-2014-03-06-win.tar.bz2) Note:  If you use Cygwin's tar command to unpack the Windows* kit, you must execute a "chmod -R +x" on the unpacked directory. DOWNLOAD Intel® SDE d...

        FMA manipulation of register’s content for XMM, YMM and ZMM register sets
        By Mile M. | 1 reply
        hello, there wasn’t a typical introduction thread so since it’s my first post i thought to introduce myself. my name is mile (yes, like the measuring unit) and i’m a student. i’m a noob in this area. i’m writing a paper for school and before posting my question(s) here i’ve thoroughly researched for an answer online to the best of my abilities, but i didn’t manage to find one. after browsing the forum i’ve decided to post a new topic instead of going off topic in another one. during my research some things cleared up for me but i still couldn’t find clearly defined answers. i’ll probably have some trivial questions and silly assumptions so please correct me if i’m wrong or if i’m missing something. i’ve spent a lot of time drawing the diagrams and writing so if someone can help i would appreciate it greatly! i’m in a mentored course of study where we don’t have lectures but instead the professor gives us a topic which we study on our own. unfortunately i can’t bother my professor with this type ...
        gather instructions and the size of indexs for a given base gpr size
        By perfwise | 5 replies
        Hi,     I have a simple question.  When performing address computations, the size of the BASE and the INDEX are required to be the same.  I presumed this was the case in the GATHER instructions.. but I don't believe it is so now.  Can someone confirm?  Namely.. I'm asking if you can use a 64-bit gpr BASE register, and use 32-bit indexes in an instruction like VGATHERDPS or VPGATHERDD.  In these 2 instructions the indexes are 32-bit values, which I presume are sign extended to 64-bits when you have a 64-bit gpr BASE specified.  I didn't find it clearly stated this was possible nor did I find it was prohibited.. so just wanted to clarify.   Thank you for any helpful and concise feedback perfwise
        ICPC 13.0.2 generates scalar load instead of packed load
        By Paul S. | 0 replies
        Hi all, I'm a little puzzled about the generated assembly code for this little piece of Cilk code: void gemv(const float* restrict A[4], const float *restrict x, float * restrict y){     __assume_aligned(y, 32);     __assume_aligned(x, 32);     __assume_aligned(A, 32);     y[0:4]  = A[0:4][0] * x[0];     y[0:4] += A[0:4][1] * x[1];     y[0:4] += A[0:4][2] * x[2];     y[0:4] += A[0:4][3] * x[3]; } Looking at the generated assembly code: - The compiler changes the algorithm such that it uses the vdpps instruction (most likely due to the bad access pattern of A).  | - Loads for A are okay (only four packed loads). However, the loads and stores for x and y are quite bad. The compiler issues four scalar loads/ stores for both x and y. More precisely, here is a sequence of the generated scalar loads for x: vmovss    xmm0, DWORD PTR [rsi]                          vmovss    xmm1, DWORD PTR [4+rsi]                        vmovss    xmm2, DWORD PTR [8+rsi]                        vm...
        Will AVX-512 replace the need for dedicated GPU's?
        By Christopher H. | 13 replies
        I do not expect it to replace high-end graphics cards, and it will likely be less power-efficient than a dedicated GPU (integrated or discrete). As far as I can tell, performance-wise it will easily make a CPU on par with a mid-range GPU, which is far and above what the majority of people need. A 3 GHz 4-core Skylake will have 768 GFlops (3 GHz * 4 cores * 2 x 16 FMA). The GPU takes up enough die space to allow for 8-core chips, which would double the max flops. Intel already has the OpenGL and DirectX software renderers from Larrabee. The only thing really lacking is memory bandwidth, although DDR4 and Crystalwell should help with this.
        unaligned loads avx-128 vs. -256
        By Tim Prince | 8 replies
        I just saw that my cases using _mm256_loadu_ps show better performance than _mm_loadu_ps on corei7-4, where the latter was faster on earlier AVX platforms (in part due to the ability of ICL/icc to compile the SSE intrinsic to AVX-128). Does this mean that advice to consider AVX-128 will soon be of only historical value?  I'm ready to designate my Westmere and corei7 linux boxes as historic vehicles. icc/ICL 14.0.1 apparently corrected the behavior (beginning with introduction of CEAN) where run-time versioning based on vector alignment never took the (AVX-256) vector branch in certain cases where CEAN notation produced effective AVX-128 code.  It seems now that C code can match performance of CEAN, if equivalent pragmas are applied. A key to getting an advantage for AVX-256 on corei7-4 appears to be to try reduced unroll.  In my observation, ICL/icc don't apply automatic unrolling to loops with intrinsics, while gcc does.  When not using intrinsics with ICL, I found the option 'ICL ...
        Latest ASM compiler other than Intel C and C++ Compilers
        By Uday Krishna G. | 6 replies
        Hi, I am trying to code my application in assembly to run on x86. Please suggest a suitable compiler that supports all SSE4.2 assembly instructions (other than the Intel Compiler). Any links that help with execution and procedure would be appreciated.
        Is there some books about SIMD(sse, avx and so on) optimization?
        By zhang h. | 2 replies
        Can someone please recommend a few books on program optimization? I use multithreading and SIMD to improve the performance of my programs. I have always learned SIMD through websites, asking questions online. Now I want to buy some books to learn from. Are there any books on SIMD? Thanks
        Instruction set extensions programming reference, revision 17,
        By Mark Charney (Intel) | 10 replies
        An updated instruction set extensions programming reference, revision 17, has been posted here.  It includes information about: Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions Intel® Secure Hash Algorithm (Intel® SHA) extensions  Intel® Memory Protection Extensions (Intel® MPX)  For more information about the technologies: http://www.intel.com/software/isa  
