Intel® Developer Zone:
Intel Instruction Set Architecture Extensions

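Intel's Instruction Set Architecture (ISA) continues to evolve to improve functionality, performance, and the user experience. This page describes planned extensions to the ISA that are new, as well as enhancements planned for future generations of processors. By publishing these extensions early, Intel helps ensure that the software ecosystem has time to innovate and come to market with new and enhanced products when the processors are launched.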

Overview

Tools and Downloads

  • Intel® C++ Compiler

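    The Intel® C++ Compiler is available for download from the Intel® Registration Center for all licensed customers. Evaluation versions of Intel® Software Development Products are also available for free download.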

  • Intel Intrinsics Guide

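    The Intel Intrinsics Guide is an interactive reference tool for Intel intrinsic instructions, which are C-style functions that provide access to many Intel instructions, including Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Advanced Vector Extensions (Intel® AVX), and more, without the need to write assembly code.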

  • GCC Compiler

    A gcc compiler and glibc library that support Intel® AVX, Intel® AVX2, Intel® AVX-512, and Intel® Memory Protection Extensions instructions are available under the GPL from the Intel Software Development Emulator web page.

Intel® Advanced Vector Extensions (Intel® AVX)

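The need for greater computing performance continues to grow across industry segments. To keep pace with rising demand and evolving usage models, we continue our history of innovation with the Intel® Advanced Vector Extensions (Intel® AVX).

Intel® AVX is a new 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE), designed for floating-point-intensive applications. Intel AVX was released in early 2011 as part of the processor family based on the Intel® microarchitecture code-named Sandy Bridge, and is now present in platforms ranging from notebooks to servers. Intel AVX improves performance through wider vectors, a new extensible syntax, and rich functionality. This results in better handling of data and of general-purpose applications such as image, audio, and video processing, scientific simulation, financial analytics, and 3D modeling and analysis.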

Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

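In the future, some new products will make the leap to 512-bit SIMD support. Programs can pack eight double-precision or sixteen single-precision floating-point numbers, or eight 64-bit or sixteen 32-bit integers, into a 512-bit vector. A single instruction can therefore process twice as many data elements as Intel AVX/AVX2 and four times as many as Intel SSE.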
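The element counts above follow directly from dividing the register width by the element width. A minimal sketch of that arithmetic in portable C; the helper name `lanes_per_vector` is hypothetical and is not part of any Intel API:

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of a given width that fit in one SIMD register.
   Both arguments are in bits. Hypothetical helper for illustration only. */
size_t lanes_per_vector(size_t vector_bits, size_t element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}
```

For example, `lanes_per_vector(512, 64)` gives the eight double-precision lanes mentioned above, while the same call with 256 and 128 bits gives the four and two lanes of Intel AVX and Intel SSE.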

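Intel AVX-512 instructions matter because they open up higher performance for the most demanding computational tasks. They are designed to offer the highest degree of compiler support by including an unprecedented level of richness in the design of the instruction capabilities.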

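Intel AVX-512 features 32 vector registers, each 512 bits wide, and eight dedicated mask registers. It is a flexible instruction set that includes support for broadcast, embedded masking to enable predication, embedded floating-point rounding control, embedded floating-point fault suppression, scatter instructions, high-speed math instructions, and compact representation of large displacement values.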
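To make "embedded masking to enable predication" concrete, here is a scalar model in portable C of a merge-masked addition over sixteen 32-bit lanes: each mask bit decides whether a lane receives the new result or keeps its previous value. The function name `masked_add_model` is hypothetical; this is a sketch of the concept, not the hardware implementation, and it does not use the actual intrinsics.

```c
#include <stdint.h>

#define LANES 16  /* sixteen 32-bit floats fill one 512-bit vector */

/* Scalar model of merge-masking (hypothetical helper): lane i receives
   a[i] + b[i] when bit i of mask is set; otherwise it keeps src[i]. */
void masked_add_model(float dst[LANES], const float src[LANES], uint16_t mask,
                      const float a[LANES], const float b[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        dst[i] = ((mask >> i) & 1u) ? a[i] + b[i] : src[i];
    }
}
```

With a mask of `0x0005`, only lanes 0 and 2 are updated and the other fourteen lanes pass through unchanged, which is how a vectorized loop can express an `if` inside its body without branching.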

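Intel AVX-512 offers a level of compatibility with Intel AVX that is stronger than in prior transitions to new widths for SIMD operations. Unlike Intel SSE and Intel AVX, which cannot be mixed without a performance penalty, Intel AVX and Intel AVX-512 instructions can be mixed without penalty. Intel AVX registers YMM0–YMM15 map onto Intel AVX-512 registers ZMM0–ZMM15 (in x86-64 mode), much as Intel SSE registers map onto Intel AVX registers. Therefore, on processors with Intel AVX-512 support, Intel AVX and Intel AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.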

For more details on Intel AVX-512 instructions, see the blog post "AVX-512 Instructions". The instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference (see the "Overview" tab on this page).

An Embree-Based Viewport Plugin for Autodesk Maya* 2014 with Support for the Intel® Xeon Phi™ Coprocessor
Author: Charles Congdon (Intel) | Posted: 02/02/2015
Purpose: This code recipe describes how to obtain, build, and use the Embree-based Viewport Plugin for Autodesk Maya* 2014 on either Microsoft Windows* or Linux*. This plugin (actually a suite of plugins) runs under Autodesk Maya 2014 on the Intel® Xeon® processor (referred to as ‘h...
Quick Linking Intel® MKL BLAS, LAPACK to R
Author: Ying H (Intel) | Posted: 12/17/2014
Overview: R is a popular programming language for statistical computing and machine learning. We have already published one article, Using Intel® Math Kernel Library (Intel MKL) with R, to show how to integrate Intel MKL BLAS and LAPACK libraries within R to improve the math computing performa...
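Diagnostic 15532: Loop was not vectorized: compile time constraints prevent loop optimization
Author: tianhui s. | Posted: 12/04/2014
Product Version: Intel(R) Visual Fortran Compiler XE 15.0.0.070 Cause: The vectorization report generated when using Visual Fortran Compiler's optimization options ( -O2  -Qopt-report:2 ) states that compile time constraints prevented optimization. Example: The example below will generate the following remark in the optimization report: subroutine foo(a, n) implicit none integer, intent(in) :: n double precision, intent(inout) :: a(n) inte...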
Diagnostic 15542: Loop was not vectorized: inner loop was already vectorized.
Author: Devorah H. (Intel) | Posted: 10/30/2014
Product Version: Intel(R) Visual Fortran Compiler XE 15.0.0.070 Cause: The vectorization report generated when using Visual Fortran Compiler's optimization options ( -O2  -Qopt-report:2 ) states that loop was not vectorized since the inner loop was vectorized. Example: An example below will g...
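Intel Perceptual Computing (1): Introduction
Author: yanqing-wang (Intel) | Posted: 2012/12/09
Intel Perceptual Computing enables human-computer interaction by having devices sense and understand user behavior; it is a more natural, immersive, and intuitive way to interact. The beta version of the Intel Perceptual Computing SDK is now available, and readers can visit http://software.intel.com/en-us/vcsource/tools/perceptual-computing-sdk to download the installer, as shown in Figure 1. First select Perceptual Computing in the drop-down box on the right side of Figure 1, then click the Download button. Intel Perceptual Computing supports several usage modes, for example: speech recognition (Figure 2) ...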
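The 32 Floating-Point Comparison Predicates in the AVX Instruction Set, Explained
Author: zyl910 | Posted: 2012/05/09
Conventionally, numbers are thought to have only six comparison relations. In the AVX instruction set, however, Intel provides 32 floating-point comparison predicates (Intel manual: Table 3-9. Comparison Predicate for VCMPPD and VCMPPS Instructions). Why are there so many comparison predicates? This puzzled me for a long time, and only after reading through a good deal of material recently did I finally understand them. 1. Floating-point data types: Intel uses the floating-point data types of the IEEE 754 standard. Besides numbers and infinities, a floating-point value can also store NaN (not a number). NaN falls into two broad categories ...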
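Part of the reason the six familiar relations multiply is that every comparison must also decide what to return when an operand is NaN. The sketch below models that "ordered" versus "unordered" split in portable C using the standard `isunordered` macro. The function names are hypothetical; they are intended to mirror the behavior of predicates such as `_CMP_EQ_OQ`, `_CMP_EQ_UQ`, and `_CMP_NGE_US`, but they ignore the signaling versus quiet distinction that accounts for the rest of the 32 variants.

```c
#include <math.h>
#include <stdbool.h>

/* Ordered equal: true only when neither operand is NaN and a == b. */
bool eq_ordered(double a, double b)
{
    return !isunordered(a, b) && a == b;
}

/* Unordered equal: true when a == b or when at least one operand is NaN. */
bool eq_unordered(double a, double b)
{
    return isunordered(a, b) || a == b;
}

/* Not greater-or-equal: true for a < b and also when an operand is NaN,
   so it is not the same predicate as an ordered less-than. */
bool not_ge(double a, double b)
{
    return !(a >= b);
}
```

With NaN as an operand, `eq_ordered` returns false while `eq_unordered` and `not_ge` return true, so a single relation such as "equal" already needs two predicates once NaN handling is made explicit.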
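Notes on Optimizing Code for AVX: Optimizing Issue Port Usage
Author: chi-gan (Intel) | Posted: 2010/01/26
In this article, "issue port" refers specifically to the channels inside the CPU that dispatch instructions to the execution units (ALU, SSE MUL, DIV, Load, STD…). The Intel microarchitecture (Sandy Bridge) has six ports: ports 2, 3, and 4 serve the memory units, and ports 0, 1, and 5 serve the compute units. Several execution units share one port, and because each issue port can dispatch an instruction to only one unit at a time, the ports can become bottlenecks, especially ports 0, 1, and 5. Choose suitable instructions to avoid this. For example, if your code makes heavy use of shuffle instructions, which can be issued only through port 5, you should use vmovsldup ymm2, [mem] instead of vmovsld...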
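A Few Takeaways from Reading the AVX Programming Reference
Author: chi-gan (Intel) | Posted: 2009/12/29
AVX (Advanced Vector Extensions) is an important new technology in the next generation of Intel CPUs. I found time to read a little about it and took some notes. It adds 256-bit SIMD registers, YMM0 to YMM15, whose low 128 bits are the existing XMM registers. It adds FMA (fused multiply-add) and other instructions to strengthen floating-point capability. For example, VFMADD132PD ymm0, ymm1, ymm2/m256 multiplies the double-precision values in ymm0 and ymm2/mem, adds ymm1, and stores the result in ymm0, combining two steps into one. The instruction has a corresponding intrinsic VFMADD132PD_m256d_mm234_fmadd_pd(_m256d a...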

    Intel® Software Guard Extensions (Intel® SGX)

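    Intel Vision Statement

    Today's computing workloads are increasingly complex, with hundreds of software modules delivered by different teams scattered across the world. Isolating workloads on open platforms has been an ongoing industry effort, beginning with the protected-mode architecture that separated operating systems from applications by privilege level. Recent malware attacks, however, have shown the ability to penetrate highly privileged modes and gain control of all the software on a platform.

    Intel® Software Guard Extensions (Intel® SGX) is the name for Intel Architecture extensions designed to increase the security of software through an "inverse sandbox" mechanism. Rather than attempting to identify and isolate all the malware on the platform, this approach seals legitimate software inside an enclave and protects it from attack by malware, whatever the malware's privilege level. This complements ongoing efforts to secure the platform against malware intrusion, much as we install safes in our homes to protect valuables even after fitting sophisticated locks and alarm systems to deter and catch intruders.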

    How to use XDB to do kernel debug on Yocto with Minnowboard MAX
    Author: ALICE H. (Intel) | Posted: 01/12/2015
    Introduction: Minnowboard MAX is an open hardware platform that uses an Intel Atom processor. The board is small and low cost, yet offers exceptional performance, flexibility, openness and standards. We can prepare micro sd card or usb flash device to expand the hardware storage and easy exchange...
    Innovative Technology for CPU Based Attestation and Sealing
    Author: admin | Posted: 08/14/2013
    By: Ittai Anati, Shay Gueron, Simon P Johnson, Vincent R Scarlata, Intel Corporation. Abstract: Intel is developing the Intel® Software Guard Extensions (Intel® SGX) technology, an extension to Intel® Architecture for generating protected software containers. The container...
    Using Innovative Instructions to Create Trustworthy Software Solutions
    Author: admin | Posted: 08/14/2013
    By: Matthew Hoekstra, Reshma Lal, Pradeep Pappachan, Carlos Rozas, Vinay Phegade, Juan del Cuvillo, Intel Corporation. Abstract: Software developers face a number of challenges when creating applications that attempt to keep important data confidential. Even diligent use o...
    Innovative Instructions and Software Model for Isolated Execution
    Author: admin | Posted: 08/14/2013
    By: Frank McKeen, Ilya Alexandrovich, Alex Berenzon, Carlos Rozas, Hisham Shafi, Vedvyas Shanbhogue and Uday Savagaonkar, Intel Corporation. Abstract: For years the PC community has struggled to provide secure solutions on open platforms. Intel has developed innovative new ...

    Intel® Memory Protection Extensions (Intel® MPX)

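    Computer systems face increasingly sophisticated malicious attacks, and one of the more common forms is to cause buffer overruns (overflows) in application software.

    Intel® Memory Protection Extensions (Intel® MPX) is the name for Intel Architecture extensions designed to increase the robustness of software. Intel MPX provides hardware features that can be used together with compiler changes to check that memory references intended at compile time do not become unsafe at run time. Two of the most important goals of Intel MPX are to provide this capability at low overhead for newly compiled code and to provide compatibility mechanisms with existing software components. Intel MPX will be available in future Intel® processors.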

    Pointer Checker in ICC: requires dynamic linking of runtime libraries
    Author: Kittur Ganesh (Intel) | Posted: 07/10/2014
    The -check-pointers switch, which enables the Pointer Checker feature, cannot be used with the -static flag on Linux* (/MT on Windows*) which forces all Intel libraries to be linked statically. The reason is that, by design, the Pointer Checker library “libchkp.so” must be shared by all executabl...
    Using Intel® SDE's chip-check feature
    Author: Mark Charney (Intel) | Posted: 10/03/2013
    Intel® SDE includes a software validation mechanism to restrict executed instructions to a particular microprocessor. This is intended to be a helpful diagnostic tool for use when deploying new software. Use chip check when you want to make sure that your program is not using instruction features...
    Using Intel® MPX with the Intel® Software Development Emulator
    Author: Ady Tal (Intel) | Posted: 07/23/2013
    Intel has announced a new technology called Intel® Memory Protection Extensions (Intel® MPX). To find out more, check out the Instruction Set Extensions web pages.  Once you know about Intel MPX, you may want to experiment with Intel® SDE. This article explains how to run Intel MPX with Intel SDE...
    Linux* ABI
    Author: Milind Girkar (Intel) | Posted: 07/18/2013
    by Milind Girkar, Hongjiu Lu, David Kreitzer, and Vyacheslav Zakharin (Intel) Description of the Intel® AVX, Intel® AVX2, Intel® AVX-512 and Intel® MPX extensions required for the Intel® 64 architecture application binary interface.

      Intel® Secure Hash Algorithm Extensions (Intel® SHA Extensions)

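      The Secure Hash Algorithm (SHA) is one of the most commonly used cryptographic algorithms. Its primary uses include data integrity, message authentication, digital signatures, and data deduplication. As security solutions become ever more pervasive, SHA appears in more applications now than ever before. The Intel® SHA Extensions are designed to improve the performance of these compute-intensive algorithms on Intel® architecture-based processors.

      The Intel® SHA Extensions are a family of seven instructions based on Intel® Streaming SIMD Extensions (Intel® SSE) that are used together to accelerate SHA-1 and SHA-256 processing on Intel architecture-based processors. Given the growing importance of SHA in everyday computing devices, the new instructions are designed to boost the performance of hashing a single buffer of data. The performance benefits not only help improve responsiveness and lower power consumption for a given application, they may also let developers adopt SHA to protect data in new applications while still meeting their user-experience goals. The instructions are defined in a way that simplifies their mapping onto the algorithm processing flow of most software libraries, which makes development easier.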

      Intel® SHA Extensions Implementations
      Author: admin | Posted: 07/18/2013
      The Intel® Secure Hash Algorithm (SHA) Extensions are designed to improve the performance of SHA-1 and SHA-256 on Intel® Architecture (IA) processors. This code download provides optimized assembly and intrinsic routines using the Intel® SHA Extensions. A sample test application using published k...
        Updated Intel® Software Development Emulator
        Author: Mark Charney (Intel)
        Hello, On October 2, 2014, we released version 7.8 of the Intel® Software Development Emulator. It is available here: http://www.intel.com/software/sde
        See the release notes for a full list of changes. This release includes:
          • Support for AVX512 VBMI and AVX512 IFMA instructions
          • Better support for running on Haswell hosts
          • Updated CPUID information
        For more information on the new instructions see http://www.intel.com/software/isa
        Resources about Intel® Transactional Synchronization Extensions (Intel TSX)
        Author: Roman Dementiev (Intel)
        Hi, you might find this collection of technical material about Intel TSX instructions useful: http://www.intel.com/software/tsx By a suggestion from some senior forum contributors I am making this post sticky. Best regards, Roman
        Links to instruction documentation
        Author: Thomas Willhalm (Intel)
        Intel Instruction Set Architecture Extensions. The Intel® Architecture Instruction Set Extensions Programming Reference includes:
          • Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions (AVX512F, AVX512DQ, AVX512BW, AVX512VL, AVX512CD, AVX512PF, AVX512ER)
          • Intel® Secure Hash Algorithm (Intel® SHA) extensions
          • Intel® Memory Protection Extensions (Intel® MPX)
        The Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A and 2B (available here) are the instruction set reference. Haswell (2013) new instructions are in the programmer's reference manual. In appendix C of the Intel 64 and IA-32 Architectures Optimization Reference Manual (available here), the latencies and throughput of instructions are listed. The documentation of the Intel C++ Compiler contains documentation of the intrinsics. The AVX Programming Reference and examples for using AVX are available on the AVX community page. (The interactive Intel Intrinsics Guide is also available there, which is usef...
        TSX example code doesn't work
        Author: YangHun P.
        I have an Intel Xeon CPU E3-1230 v3 machine which has TSX. I just want to test that TSX runs well. From the manual, I got this example pseudocode:

            void rtm_wrapped_lock(lock) {
                if (_xbegin() == _XBEGIN_STARTED) {
                    if (lock is free) /* add lock to the read-set */
                        return; /* Execute transactionally */
                    _xabort(0xff); /* 0xff means the lock was not free */
                }
                /* come here following the transactional abort */
                original_locking_code(lock);
            }

            void rtm_wrapped_unlock(lock) {
                /* If lock is free, assume that the lock was elided */
                if (lock is free)
                    _xend(); /* commit */
                else
                    original_unlocking_code(lock);
            }

        My test code for RTM, which is a subset of TSX, is like this:

            void main(void) {
                int i;
                int sum[20];
                int data[20];
                pthread_mutex_t mutex;
                pthread_mutex_init(&mutex,NULL);
                for(i=0;i<20;i++) { data[i]=i; sum[i]=0; }
                omp_set_num_threads(4);
                #pragma omp parallel for private(i)
                for(i=0;i<2...
        SDE 7.15 for Linux has no 64-bit libs
        Author: andysem
        The recently released SDE 7.15 for Linux seem to have 32-bit libraries instead of 64-bit in intel64/pin_ext_lib and intel64/xed_ext_lib. Is this an oversight or am I missing something?  
        SSE ucomiss/comiss strange behavior
        Author: Naer J.
        Hello. When I run this code:

            #include <cmath> // for NAN c++11 and up
            #include <iostream>
            #include <xmmintrin.h>

            int main(int argc, char ** argv) {
                float nan_value = NAN;
                __m128 const a = _mm_load_ss(&nan_value);
                __m128 const b = _mm_setzero_ps();
                std::cout << "gt : " << (nan_value > 0) << std::endl;
                std::cout << "lt : " << (nan_value < 0) << std::endl;
                std::cout << "ge : " << (nan_value >= 0) << std::endl;
                std::cout << "le : " << (nan_value <= 0) << std::endl;
                std::cout << "eq : " << (nan_value == 0) << std::endl;
                std::cout << "ne : " << (nan_value != 0) << std::endl << std::endl << std::endl;
                std::cout << "ugt : " << _mm_ucomigt_ss(a,b) << std::endl;
                std::cout << "ult : " << _mm_ucomilt_ss(a,b) << std::endl;
                std::cout << "uge : " << _mm_ucomige_ss(a,b) << ...
        Measuring Core Voltage
        Author: Srinath A.
        I am using an Atom N2600 processor. The intel software developer's manual says that a p-state can be requested by writing to MSR 0x199 and the locked p-state can be seen in MSR 0x198. The way to compute Core Voltage is given as MSR_PERF_STATUS[47:32] * (float) 1/(2^13). The data that I see in MSR_PERF_STATUS (MSR 0x198) is 62d104306001045. Bits [47:32] is always 1043 irrespective of the value that I set in MSR 0x199. When I use the formula: 0x1043 = 4163. Voltage = 4163/(2^13)=0.5 V, which is a really low voltage for the processor to operate stably at. It would be great if someone can help me in measuring the core voltage. I am using Ubuntu as my OS. Regards Srinath
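The arithmetic in this post can be checked with a few lines of portable C. The sketch below only reproduces the bit extraction and the formula as quoted in the post (bits [47:32] multiplied by 1/2^13); it does not read an MSR, the function names are hypothetical, and whether the formula applies to the Atom N2600 at all is exactly the open question being asked.

```c
#include <stdint.h>

/* Extract bits [47:32] of a 64-bit MSR value (hypothetical helper). */
uint16_t perf_status_voltage_field(uint64_t msr)
{
    return (uint16_t)((msr >> 32) & 0xFFFFu);
}

/* Apply the formula quoted in the post: field * (1 / 2^13) volts. */
double field_to_volts(uint16_t field)
{
    return (double)field / 8192.0;
}
```

For the value in the post, `perf_status_voltage_field(0x062d104306001045ULL)` yields `0x1043` (4163 decimal), and `field_to_volts` then gives 4163/8192, about 0.508 V, which agrees with the poster's rounded 0.5 V.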
        why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?
        Author: unclejoe
        Does anyone know why the pmulhrsw instruction or _mm_mulhrs_epi16(x, y) := RoundDown((x * y + 16384) / 32768) always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average. Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased? Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean):

            __m128i ChooseSign(x, sign) { return _mm_sign_epi16(x, sign) }
            signsDifferent = _mm_srai_epi16(_mm_xor_si128(a, b), 15) // (a ^ b) >> 15
            sign = _mm_andnot_si128(signsDifferent, a) // !signsDifferent & a
            //result = ChooseSign(sqrt(a * b), sign) * fraction // biased
            result = ChooseSign(sqrt(a * b) * fraction, sign)
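The formula quoted in this post can be modeled for a single 16-bit lane in portable C, which makes the rounding behavior easy to probe without SIMD hardware. This is a scalar sketch under the assumption that the post's formula, floor((x * y + 16384) / 32768), describes each lane; the name `mulhrs_model` is hypothetical.

```c
#include <stdint.h>

/* Scalar model of one lane: floor((x * y + 16384) / 32768).
   C integer division truncates toward zero, so negative remainders are
   corrected by hand to obtain a true floor. The final cast wraps on
   typical compilers for the single overflowing input pair
   x = y = -32768. */
int16_t mulhrs_model(int16_t x, int16_t y)
{
    int32_t p = (int32_t)x * (int32_t)y + 16384;
    int32_t q = p / 32768;
    if (p % 32768 < 0) {
        q -= 1;
    }
    return (int16_t)q;
}
```

Trying a few inputs suggests that the upward bias is confined to exact ties: `mulhrs_model(1, 16384)` (a product of +0.5 units) returns 1 and `mulhrs_model(-1, 16384)` (a product of -0.5 units) returns 0, whereas `mulhrs_model(-1, 16385)`, which lies just below -0.5, returns -1 as round-to-nearest would.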