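Intel® Developer Zone:
Intel Instruction Set Architecture Extensions

Intel's Instruction Set Architecture (ISA) continues to evolve to improve functionality, performance, and the user experience. The sections below describe planned ISA extensions that are new, as well as enhancements planned for future processor generations. By publishing these extensions early, Intel helps ensure that the software ecosystem has time to innovate and bring new and enhanced products to market when the processors launch.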

Overview

Tools and Downloads

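  • Intel® C++ Compiler

    All licensed customers can download the Intel® C++ Compiler from the Intel® Registration Center. Evaluation versions of Intel® Software Development Products are also available for free download.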

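  • Intel Intrinsics Guide

    The Intel Intrinsics Guide is an interactive reference tool for Intel intrinsic instructions, which are C-style functions that provide access to many Intel instructions, including Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Vector Extensions (Intel® AVX), and more, without the need to write assembly code.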

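  • GCC compiler
    A gcc compiler and glibc library that support Intel® AVX, Intel® AVX2, Intel® AVX-512, and Intel® Memory Protection Extensions instructions are available for download under the GPL from the Intel Software Development Emulator web page.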
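
The Intrinsics Guide entry above describes intrinsics as C-style functions that map to instructions. As a minimal sketch of what that looks like in practice, the block below adds two float arrays with the SSE intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` from `<xmmintrin.h>`. The function name `add_arrays` and the `__SSE__` guard with a scalar fallback are choices made for this illustration, not something taken from the page.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Adds two float arrays element by element: out[i] = a[i] + b[i].
   add_arrays is a hypothetical helper name used only for this sketch. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per iteration with one 128-bit SSE add. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (and the whole loop when SSE is not available). */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used so that callers need not align their buffers; whether aligned accesses would be faster depends on the target processor.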

Intel® Advanced Vector Extensions (Intel® AVX)

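Demand for greater computing performance continues to grow across industry segments. To keep pace with rising demand and evolving usage models, we continue our history of innovation with Intel® Advanced Vector Extensions (Intel® AVX).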

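Intel® AVX is a new 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE), designed for floating-point-intensive applications. Intel AVX was released in early 2011 as part of the Intel® microarchitecture code name Sandy Bridge processor family and is present in platforms ranging from notebooks to servers. Intel AVX improves performance through wider vectors, a new extensible syntax, and rich functionality. The result is better management of data and general-purpose applications such as image and audio/video processing, scientific simulation, financial analytics, and 3D modeling and analysis.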

Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

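In the future, some new products will make a significant leap to 512-bit SIMD support. Programs can pack eight double-precision or sixteen single-precision floating-point numbers, or eight 64-bit or sixteen 32-bit integers, into a 512-bit vector. A single instruction can therefore process twice as many data elements as Intel AVX/AVX2 and four times as many as Intel SSE.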
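
The element counts quoted above follow directly from dividing the register width by the element width. A small sketch of that arithmetic, where the helper name `lanes` and the enum constants are hypothetical names introduced only for this illustration:

```c
#include <assert.h>

/* Vector register widths in bits for the three instruction set generations. */
enum { SSE_BITS = 128, AVX_BITS = 256, AVX512_BITS = 512 };

/* Number of elements of element_bits each that fit in a register of register_bits. */
int lanes(int register_bits, int element_bits)
{
    assert(element_bits > 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

For example, `lanes(AVX512_BITS, 64)` gives the eight double-precision lanes and `lanes(AVX512_BITS, 32)` the sixteen single-precision lanes mentioned in the text.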

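Intel AVX-512 instructions are important because they offer higher performance for the most demanding computational tasks. They also offer the highest degree of compiler support by including an unprecedented level of richness in the design of the instructions.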

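Intel AVX-512 features include 32 vector registers, each 512 bits wide, and eight dedicated mask registers. Intel AVX-512 is a flexible instruction set that includes support for broadcast, embedded masking to enable predication, embedded floating-point rounding control, embedded floating-point fault suppression, scatter instructions, high-speed math instructions, and compact representation of large displacement values.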
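
To make "embedded masking to enable predication" concrete, the following is a scalar model, not AVX-512 code: each bit of a 16-bit mask decides whether the corresponding 32-bit lane is written. The function names and the `LANES` constant are hypothetical. The two variants are meant to mirror the merge-masking and zero-masking forms that AVX-512 intrinsics such as `_mm512_mask_add_ps` and `_mm512_maskz_add_ps` expose, under the assumption that a plain loop is an adequate description of the per-lane semantics.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 16 };  /* sixteen 32-bit lanes in one 512-bit vector */

/* Scalar model of a merge-masked add: lane i receives a[i] + b[i] when bit i
   of mask is set, and keeps its previous value in dst otherwise. */
void masked_add_merge(float dst[LANES], uint16_t mask,
                      const float a[LANES], const float b[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        if ((mask >> i) & 1u) {
            dst[i] = a[i] + b[i];
        }
    }
}

/* Scalar model of a zero-masked add: unselected lanes are set to zero. */
void masked_add_zero(float dst[LANES], uint16_t mask,
                     const float a[LANES], const float b[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        dst[i] = ((mask >> i) & 1u) ? a[i] + b[i] : 0.0f;
    }
}
```

Predication of this kind lets a compiler vectorize a loop whose body contains a condition, because lanes where the condition is false are simply left untouched or zeroed.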

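Intel AVX-512 offers a level of compatibility with Intel AVX that is stronger than in prior transitions to new widths for SIMD operations. Unlike Intel SSE and Intel AVX, which cannot be mixed without a performance penalty, Intel AVX and Intel AVX-512 instructions can be mixed without penalty. Intel AVX registers YMM0–YMM15 map onto the Intel AVX-512 registers ZMM0–ZMM15 (in x86-64 mode), much as Intel SSE registers map onto Intel AVX registers. Therefore, on processors with Intel AVX-512 support, Intel AVX and Intel AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.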

For more details about Intel AVX-512 instructions, see the blog post "AVX-512 Instructions". The instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference (see the "Overview" tab on this page).

Using Intel® SDE's chip-check feature
By Mark Charney (Intel), posted 10/03/2013 (0 comments)
Intel® SDE includes a software validation mechanism to restrict executed instructions to a particular microprocessor. This is intended to be a helpful diagnostic tool for use when deploying new software. Use chip check when you want to make sure that your program is not using instruction features...
Webinar: "Intel® System Studio: Embedded application development and debugging tools"
By Naveen Gv (Intel), posted 09/25/2013 (0 comments)
Abstract Presenter Information The Intel® System Studio is a flexible complete software development studio which allows you to optimize Intel® Architecture based intelligent embedded systems and devices. It combines Eclipse* CDT integrated optimizing compiler solutions and signal and me...
How to detect New Instruction support in the 4th generation Intel® Core™ processor family
By Max Locktyukhin (Intel), posted 08/05/2013 (0 comments)
Downloads How to detect New Instruction support in the 4th generation Intel® Core™ processor family [PDF 342.3KB] The 4th generation Intel® Core™ processor family (codenamed Haswell) introduces support for many new instructions that are specifically designed to provide better performance to a bro...
Linux* ABI
By Milind Girkar (Intel), posted 07/18/2013 (0 comments)
by Milind Girkar, Hongjiu Lu, David Kreitzer, and Vyacheslav Zakharin (Intel) Description of the Intel® AVX, Intel® AVX2, Intel® AVX-512 and Intel® MPX extensions required for the Intel® 64 architecture application binary interface.

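Using Intel Compiler 11.1 and the SDE emulator for AVX development
By xiaochang-wu (Intel), posted 07/06/2009 (0 comments)
Intel® Advanced Vector Extensions (Intel® AVX) extends the capabilities of Intel® SSE for floating-point computation. Intel® AVX uses 256-bit registers (versus 128-bit for SSE) and extends the instruction set. Each instruction can process 8 float or 4 double values at once. Floating-point-intensive applications are well suited to AVX optimization. Intel's AVX-capable Sandy Bridge platform has not yet been released, so how can we do AVX development before it ships? The answer is to use the Intel® Compiler (ICL) and the Intel® Software Development Emulator (SDE) to compile and run, under emulation, code containing...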


    Intel® Software Guard Extensions (Intel® SGX)

    Intel Vision Statement

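    Computing workloads today are increasingly complex, with hundreds of software modules delivered by different teams scattered across the world. The industry has continually worked to partition workloads on open platforms, starting with the protected-mode architecture that separated the operating system from applications at different privilege levels. In recent years, however, malware attacks have shown an ability to penetrate highly privileged modes and gain control of all the software on a platform.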

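    Intel® Software Guard Extensions (Intel® SGX) is the name of an Intel architecture extension designed to increase software security through an "inverse sandbox" mechanism. Rather than attempting to identify and isolate all the malware on the platform, this approach seals legitimate software inside an enclave and protects it from attack by malware, whatever privilege level the malware holds. This complements ongoing efforts to secure the platform against malware intrusion, much as we install a safe at home to protect valuables even when the house already has sophisticated locks and alarm systems to deter and catch intruders.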


    Intel® Memory Protection Extensions (Intel® MPX)

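    Computer systems face increasingly sophisticated malicious attacks, and one of the more common forms is to cause buffer overruns (that is, overflows) in application software.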

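    Intel® Memory Protection Extensions (Intel® MPX) is the name of an Intel architecture extension designed to increase the robustness of software. Intel MPX provides hardware features that can be used together with compiler changes to check that memory references intended at compile time do not become unsafe at runtime. Two of the most important goals of Intel MPX are to provide this capability at low overhead for newly compiled code, and to provide compatibility mechanisms with existing software components. Intel MPX will be available in future Intel® processors.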
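
The idea of checking a memory reference against bounds recorded for a pointer can be sketched in plain C. The block below is a software model only and does not use any Intel MPX instruction or compiler support; the type `bounds_t` and the functions `make_bounds` and `access_in_bounds` are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a bounds record: the lowest and highest
   byte addresses that a pointer is allowed to touch. */
typedef struct {
    uintptr_t lower;
    uintptr_t upper;
} bounds_t;

/* Builds the bounds for an object of size bytes starting at base. */
bounds_t make_bounds(const void *base, size_t size)
{
    assert(size > 0);
    bounds_t b = { (uintptr_t)base, (uintptr_t)base + size - 1 };
    return b;
}

/* Returns true when an access of len bytes at p stays inside the bounds. */
bool access_in_bounds(bounds_t b, const void *p, size_t len)
{
    uintptr_t first = (uintptr_t)p;
    if (len == 0 || first < b.lower || first > b.upper) {
        return false;
    }
    return len - 1 <= b.upper - first;
}
```

In this model the check runs as ordinary code before each access. The text above describes hardware features that, combined with compiler changes, aim to make an equivalent check cheap enough for newly compiled code.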

    Intel® Intrinsics Guide
    By admin, posted 10/30/2012 (18 comments)
    Overview The Intel Intrinsics Guide is an interactive reference tool for Intel intrinsic instructions, which are C style functions that provide access to many Intel instructions – including Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Vector Extensions (Intel® AVX), and more – w...
    Intel® Software Development Emulator Release Notes
    By Ady Tal (Intel), posted 06/15/2012 (0 comments)
    Release notes for the Intel® Software Development Emulator
    Intel® Software Development Emulator
    By Ady Tal (Intel), posted 06/15/2012 (16 comments)
    Product Overview: This emulator is called Intel® Software Development Emulator or Intel® SDE, for short. The current version is 6.22 and was released March 06, 2014. This version correspo...
    Intel® Software Development Emulator Download
    By Ady Tal (Intel), posted 12/16/2011 (4 comments)
    Intel® Software Development Emulator (released March 06, 2014) DOWNLOAD Intel® SDE for WINDOWS*  (sde-external-6.22.0-2014-03-06-win.tar.bz2) Note:  If you use Cygwin's tar command to unpack the Windows* kit, you must execute a "chmod -R +x" on the unpacked directory. DOWNLOAD Intel® SDE d...


      Intel® Secure Hash Algorithm Extensions (Intel® SHA Extensions)

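      The Secure Hash Algorithm (SHA) is one of the most commonly used cryptographic algorithms. Its primary uses include data integrity, message authentication, digital signatures, and data deduplication. As security solutions become more pervasive, SHA appears in more applications now than ever. The Intel® SHA Extensions are designed to improve the performance of these compute-intensive algorithms on Intel® architecture-based processors.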

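      The Intel® SHA Extensions are a family of seven Intel® Streaming SIMD Extensions (Intel® SSE) based instructions that are used together to accelerate SHA-1 and SHA-256 processing on Intel architecture-based processors. Given the growing importance of SHA in everyday computing devices, the new instructions are designed to boost the performance of hashing a single buffer of data. The performance benefits not only help improve responsiveness and lower power consumption for a given application; they may also enable developers to adopt SHA in new applications to protect data while still meeting their user experience goals. The instructions are defined in a way that simplifies their mapping onto the algorithm processing flow of most software libraries, which makes development easier.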
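
For orientation, here is a scalar sketch of one part of SHA-256, the message schedule expansion defined in FIPS 180-4. It is plain C and does not use the SHA Extensions; it only illustrates the kind of per-word arithmetic that dedicated instructions can take over. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* SHA-256 "small sigma" functions from the message schedule (FIPS 180-4). */
uint32_t sha256_sigma0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

uint32_t sha256_sigma1(uint32_t x)
{
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Expands the 16 message words in w[0..15] into the full 64-word schedule. */
void sha256_expand_schedule(uint32_t w[64])
{
    for (int t = 16; t < 64; ++t) {
        w[t] = sha256_sigma1(w[t - 2]) + w[t - 7]
             + sha256_sigma0(w[t - 15]) + w[t - 16];
    }
}
```

Each new schedule word depends on four earlier words, so a software library spends many rotate, shift, and XOR operations per block here; this is the processing flow that the text says the instructions are designed to map onto.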

      Intel® SHA Extensions
      By admin, posted 07/17/2013 (0 comments)
      Download PDF New Instructions Supporting the Secure Hash Algorithm on Intel® Architecture Processors July 2013 Executive Summary This paper provides an introduction to the family of new instructions that support performance acceleration of the Secure Hash Algorithm (SHA) on Intel® Architecture pr...
        FMA Support
        By rmendes.silva (3 replies)
        Hello guys, sorry for a basic question. I've been looking for architectures which supports FMA. I know Sandy Bridge doesn't support, and Haswel supports it. But, what about Ivy Bridge? Does Ivy Bridge supports FMA? Best regards.
        ippGetCpuFeatures for AVX2 support
        By bronxzv (9 replies)
        I'm relying at the moment on inline ASM to check for AVX2 support, but use the IPP function ippGetCpuFeatures to check for AVX and SSEx features. Using the IPP function is arguably a better solution (simple & clean) than inline ASM, so I have a comment in my code for the AVX2 checks along the line of "use the IPP stuff instead when available" I'm doing some cleanup these days and I remarked a series of new flags in ippcore.h, but it looks like several of these new flags aren't explained in the latest IPP documentation. It will be great to have full documentation for the ippCPUID_AVX2 flag, particularly to know if it implies FMA, BMI1, BMI2 etc., is this information available  somewhere ?  
        There are something wrong with using svml in inline ASM
        By zhang y. (4 replies)
             I try using __svml_sin2 in inline ASM like the way compiler does.  A code snippet as following, "vmovupd (%1), %%ymm0\n\t" "call __svml_sin4\n\t" "vmovupd %%ymm0, (%0)\n\t" "sub $1, %%rax\n\t" "jnz 3b\n\t"    The program can build. But, the running output values are wrong.     Then I use GDB to locate the problem. It seems that, the SVMLfunction __svml_sin4 uses the general registers rax,rbx,rcx,rdx and so on,without save the scene. So I want to save the registers modified by SVML myself. The problem is, I do not know exactly which registers are modified. Maybe different SVML function use different registers.     So, anybody knows how to use the svml in inline assembly correctly?      thanks in advance for any answer.
        Loops inside transactional regions in RTM (TSX)
        By jsg (2 replies)
        Hi everyone, I have a question about loops in TSX. Can I put loops inside a transactional region? Example xbegin(); ...    while(cond) i++; .. xend(); Thank you very much,
        AVX Power consumption (on i5)
        By magicfoot (1 reply)
        Dear all, Is there any data on how much more power is consumed when using the AVX, specifically on an i5 ? Where can I get some data on the i5 power consumption of power at peak floating point processing without the use of AVX, and the use of AVX.   I would expect it to look like something in the order of 55w without AVX, 60w with AVX. This is a total assumption only and I would appreciate anyone with some quantitative opinions to list here.  
        Different ways to turn an AoS into an SoA
        By Diego Caballero (6 replies)
        Hi, I'm trying to implement a permutation that turns an AoS (where the structure has 4 float) into a SoA, using SSE, AVX, AVX2 and KNC, and without using gather operations, to find out if it worth it. For example, using KNC, I would like to use 4 zmm registers: {A0, A1, ... A15} {B0, B1, ... B15} {C0, C1, ... C15} {D0, D1, ... D15} to end up having something like: {A0, A4, A8, A12, B0, B4, B8, B12, C0, C4, C8, C12, D0, D4, D8, D12} {A1, A5, A9, ...} {A2, A6, A10, ...} {A3, A7, A11, ...} Since the permutation instructions are significantly changing among architectures and I wouldn't like to reinvent the wheel, I would be glad if someone could point me where to find information about this, or share their knowledge.   Thank you in advance.
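
As a point of comparison for the question above, here is a plain scalar AoS-to-SoA conversion for structures of four floats. It is not the shuffle-based permutation the post asks for; it is only a reference against which a SIMD version could be checked. The function name `aos_to_soa4` and the flat `x y z w` layout are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar AoS-to-SoA reference. aos holds n structures of four floats laid out
   as x0 y0 z0 w0 x1 y1 z1 w1 ...; each output array receives one field. */
void aos_to_soa4(const float *aos, size_t n,
                 float *xs, float *ys, float *zs, float *ws)
{
    assert(aos != NULL && xs != NULL && ys != NULL && zs != NULL && ws != NULL);
    for (size_t i = 0; i < n; ++i) {
        xs[i] = aos[4 * i + 0];
        ys[i] = aos[4 * i + 1];
        zs[i] = aos[4 * i + 2];
        ws[i] = aos[4 * i + 3];
    }
}
```

With 16 structures stored in four consecutive 16-float registers A to D, `xs` ends up holding elements 0, 4, 8, 12 of A, then of B, and so on, which matches the first output register listed in the post.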
        How to clear the upper 128 bits of __m256 value?
        By Vladimir Sedach (8 replies)
        How can I clear the upper 128 bits of m2: __m256i    m2 = _mm256_set1_epi32(2); __m128i    m1 = _mm_set1_epi32(1); m2 = _mm256_castsi128_si256(_mm256_castsi256_si128(m2)); m2 = _mm256_castsi128_si256(m1); don't work -- Intel’s documentation for the _mm256_castsi128_si256 intrinsic says that “the upper bits of the resulting vector are undefined”. At the same time I can easily do it in assembly: VMOVDQA xmm2, xmm2 VMOVDQA xmm2, xmm1 Of cause I'd not like to use _mm256_insertf128_si256().  
        Get _mm_alignr_epi8 functionality on 256-bit vector registers (AVX2)
        By Diego Caballero (15 replies)
        Hello, I'm porting an application from SSE to AVX2 and KNC. I have some _mm_alignr_epi8 intrinsics. While I just had to replace this intrinsic by the _mm512_alignr_epi32 intrinsic for KNC (by the way, I missed this intrinsic in http://software.intel.com/sites/landingpage/IntrinsicsGuide/ for KNC), it seems that the 256-bit version, _mm256_alignr_epi8 does something unexpected. It is not an extension of the previous 128-bit instruction to 256 bits. It performs a 2x128-bit alignr on 256-bit vectors, which is not the expected behaviour if we look at its counterparts in AVX512 and KNC. Does someone know the most efficient way of implementing the extension of _mm_alignr_epi8 to 256-bit vectors using AVX2 intrinsics? I.e., being V1={7, 6, 5, 4, 3, 2, 1, 0} and V2={15, 14, 13, 12, 11, 10, 9, 8}, the output of this operation should be V3{8, 7, 6, 5, 4, 3, 2 ,1} and not V3{12, 7, 6, 5, 8, 3, 2 ,1}, which is what I get using _mm256_alignr_epi8. Thank you in advance    
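
The difference described in the post above can be reproduced with a scalar model. The sketch below works on eight 32-bit elements and shifts by whole elements; `alignr_full` models the expected full-width behaviour and `alignr_per_lane` models an operation applied separately to each 128-bit lane. Both function names are hypothetical, and the model is an assumption about the semantics as the post describes them, not a replacement for the intrinsic's documentation.

```c
#include <assert.h>
#include <stdint.h>

enum { ELEMS = 8, LANE_ELEMS = 4 };  /* eight 32-bit elements, four per 128-bit lane */

/* Full-width model: treat hi:lo as one 16-element vector, shift right by
   shift elements, and keep the low eight elements. out must not alias inputs. */
void alignr_full(uint32_t out[ELEMS], const uint32_t hi[ELEMS],
                 const uint32_t lo[ELEMS], int shift)
{
    assert(shift >= 0 && shift <= ELEMS);
    for (int i = 0; i < ELEMS; ++i) {
        int src = i + shift;
        out[i] = (src < ELEMS) ? lo[src] : hi[src - ELEMS];
    }
}

/* Per-lane model: the same operation applied independently to each
   128-bit lane, which is the behaviour described in the post above. */
void alignr_per_lane(uint32_t out[ELEMS], const uint32_t hi[ELEMS],
                     const uint32_t lo[ELEMS], int shift)
{
    assert(shift >= 0 && shift <= LANE_ELEMS);
    for (int lane = 0; lane < ELEMS; lane += LANE_ELEMS) {
        for (int i = 0; i < LANE_ELEMS; ++i) {
            int src = i + shift;
            out[lane + i] = (src < LANE_ELEMS) ? lo[lane + src]
                                               : hi[lane + src - LANE_ELEMS];
        }
    }
}
```

With V1 as `lo`, V2 as `hi`, and a shift of one element, `alignr_full` yields elements 1 to 8 and `alignr_per_lane` yields 1, 2, 3, 8, 5, 6, 7, 12 (listed from element 0 upward), matching the two results quoted in the post. A commonly cited way to get the full-width result with AVX2 for shifts below 16 bytes is to first build an intermediate vector with `_mm256_permute2x128_si256(lo, hi, 0x21)` and then call `_mm256_alignr_epi8` on that intermediate and `lo`; that suggestion is not verified here.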
