Intel® AVX: New Frontiers in Performance Improvements and Energy Efficiency



As the need for more computing performance continues to grow across industry segments, Intel continues to lead in innovation and the delivery of greater compute capacity to support these growing demands and evolving usage models. Intel has a long history of innovating and of leading the charge in expanding the capabilities of the world’s most popular and broadly used computer architecture, Intel® architecture. Intel continues this legacy of innovation with the introduction of the Intel® Advanced Vector Extensions (Intel® AVX) instructions, which take the industry-leading Intel® SSE4 to new levels of performance, flexibility and energy efficiency.

In order to benefit a broad audience of consumer and business customers, Intel will introduce a new set of instructions called Intel AVX, supported by a wide range of Intel platforms starting in the 2011 timeframe. Building on the rich legacy of Intel SSE4 and the Intel® 64 instruction set architecture (ISA), Intel AVX provides the infrastructure and building blocks for delivering the performance required by the growing needs of applications such as financial analysis, media content creation, natural resource industries, and high-performance computing (HPC). Intel AVX introduces an instruction set extension that enables flexibility in the programming environment, efficient coding with vector and scalar data sets, and power-efficient performance across both wide and narrow vector data processing.

This paper provides a brief background on ISA, and then gives an overview of the new instructions and capabilities of Intel AVX and the advantages of these instructions across various applications and programming models.


Instruction Set Architecture and Microarchitecture

To better appreciate the significance of these new instructions, it helps to understand the different architectures used in developing today’s modern microprocessors and their uses.

ISA is the part of an overall computer architecture related to programming. The set includes the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. The ISA defines a specification of functions (machine commands) implemented by a particular microprocessor design. Within a family of processors, the ISA is often enhanced over time with new instructions to deliver better performance and expose new features while at the same time maintaining compatibility with existing applications.

Microarchitecture refers to the actual design, layout and implementation of an ISA in silicon. It includes the overall block design, cores, execution units and types (such as floating-point, integer, branch prediction, SIMD, etc.), pipeline definitions, cache memory design, and peripheral support. Within a family of processors, the microarchitecture is often enhanced over time to deliver improvements in performance, energy efficiency and capabilities, while maintaining compatibility with the ISA.


Leading the Instruction Set Revolution – A Long History in ISA

Intel uses ISA to deliver the superior capabilities of its microarchitecture while maintaining the necessary application-level compatibility across processor generations. A good example of maintaining instruction set compatibility is the new Intel® Core™2 family of processors. Like the previous-generation Intel Core 2 Duo and Intel Pentium D processors, the Intel Core 2 processors implement a nearly identical version of the ISA and provide application-level compatibility. However, internally they have a new and improved design. Nearly all applications built for Intel Pentium D processors will run on Intel Core 2 Duo and Intel Core 2 processors without any modification. Even better, nearly all these applications automatically benefit from the superior performance and energy efficiency of these processors.

As Intel process technology and microarchitecture are continuously evolving at the pace of our new cadence, so are Intel instruction sets. In each evolution:

  • Intel will optimize existing instructions to enable them to receive maximum benefit from the latest microarchitecture improvements and deliver greater performance and power efficiency for existing applications without modifications.
  • Intel will also introduce new sets of instructions designed to optimize the performance and lower the power needs of a broad range of existing and new applications. To maximize the benefit of these new instructions, existing applications should be recompiled with an updated compiler (see Intel® Compilers for more details).


As you can see, in each case, existing software will continue to run correctly as our instruction sets evolve and new ones are added. Equally important, new applications incorporating these instructions – and existing applications recompiled to take advantage of them – will see exciting performance improvements.

Intel’s lead in ISA extends to a broad ecosystem of operating systems, including Microsoft Windows* and Vista*, UNIX*, Linux*, and now the Macintosh* operating systems. Our continuing commitment to extending our ISA for the industry includes:

  • Pioneering architectural consistency to enable software innovations across operating systems and application domains through extended industry ecosystem support.
  • Providing a seamless approach for software vendors to address the market dynamics of product opportunities in 32-bit and 64-bit ISA.
  • Listening to software developers and independent software vendors (ISVs) in our development of new instructions to help developers succeed more easily with us.
  • Ensuring that existing applications run correctly and perform better.
  • Maintaining correctness as applications use the new instructions.
  • Providing ISA leadership to other architecture vendors so that the Intel ISA remains cohesive and performs as a standard, simplifying the job of the ISV community.


Developers benefit from processor capabilities in multiple performance vectors: higher throughput from executing multiple instructions concurrently, and processing multiple data elements in one instruction. Intel has long encouraged such coding practices to help increase overall processor throughput and utilization. Early on, Intel began a proactive program to improve application performance on Intel processors by developing special instruction sets. Early examples include the floating-point (FP) instruction set extensions defined in the 8086 chip. More recent examples include Single Instruction, Multiple Data (SIMD) and MMX™ technology. Using the MMX technology instruction set, programmers could execute instructions on multiple data elements loaded into MMX technology registers, delivering increased performance in media applications such as graphics, gaming, streaming video, and more. In the P6 processor, Intel introduced Intel Streaming SIMD Extensions (Intel SSE). Designed for the Intel® Pentium® III processor, Intel SSE extended MMX technology and allowed SIMD computations to be performed on four packed single-precision FP data elements simultaneously using 128-bit registers. With the Intel NetBurst® microarchitecture, Intel SSE2 extended the SIMD instruction set to a wider spectrum of application domains by offering double-precision FP and 128-bit SIMD integer processing capabilities. Intel SSE2 instructions gave software developers maximum flexibility in implementing algorithms and provided performance enhancements to software such as MPEG-2 video, MP3, 3D graphics, and more.

The launch of the 90 nm process-based Pentium 4 processor brought the Intel SSE3 extensions. Intel SSE3 added 13 additional SIMD instructions that are primarily designed to improve thread synchronization and x87-FP math capabilities. A further advancement, Supplemental Intel SSE3, is now available in the Intel Core microarchitecture. Supplemental Intel SSE3 adds 32 new opcodes, including align and multiply-add, for even greater performance.

The most recent Intel ISA innovation is Intel SSE4, introduced in 2007, which offers a broad collection of instructions for significant performance gains and programming productivity. Intel SSE4 has several compiler vectorization primitives for more efficient media performance, as well as new string processing instructions. Beginning with processors based on the 45 nm Intel microarchitecture, these new instructions have started ramping in a wide range of Intel platforms including desktop, mobile, and server. Intel SSE4 offers dozens of new instructions in two major categories:

  1. Intel SSE4 Vectorizing Compiler and Media Accelerators.
  2. Intel SSE4 Efficient Accelerated String and Text Processing.



Intel ISA innovations continue in 2009 with seven new instructions to accelerate data encryption and decryption. Intel processors code-named Westmere will provide six new instructions for symmetric encryption/decryption using the Advanced Encryption Standard (AES) and one instruction performing carry-less multiplication (PCLMULQDQ) for advanced block cipher encryption. These hardware-based primitives provide an added security benefit by avoiding table lookups, which protects against software side-channel attacks.
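To make the carry-less multiplication mentioned above concrete, the following is a minimal software sketch of the operation that the hardware instruction performs on two 64-bit operands: a shift-and-XOR multiply in which partial products are combined without carries. The names `clmul128` and `clmul64` are hypothetical and chosen only for this illustration; the sketch models the arithmetic, not the instruction's performance or its side-channel properties.

```c
#include <stdint.h>

/* 128-bit result of a 64 x 64 carry-less multiply (hypothetical helper type). */
typedef struct {
    uint64_t lo;   /* bits 0..63 of the product   */
    uint64_t hi;   /* bits 64..127 of the product */
} clmul128;

/* Software model of carry-less multiplication: like long multiplication
 * in binary, but partial products are combined with XOR instead of ADD,
 * so no carries propagate between bit positions. */
clmul128 clmul64(uint64_t a, uint64_t b)
{
    clmul128 r = {0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;              /* low half of (a << i)  */
            if (i > 0)
                r.hi ^= a >> (64 - i);   /* high half of (a << i) */
        }
    }
    return r;
}
```

For example, `clmul64(3, 3)` yields 5 rather than 9, because the two middle partial-product bits cancel under XOR instead of generating a carry.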


Intel® AVX Architecture


Background and Overview


The need for more computing performance continues to grow across industry segments. To support these growing demands and evolving usage models, Intel continues to lead in innovation and the delivery of greater compute capability:

  • For financial services that need compute-intensive performance to support timely decisions.
  • For resource and manufacturing industries that construct and model software solutions across multiple dimensions of space and time.
  • For service-oriented software innovations targeting personalized or customer-centric experiences, which will require new algorithms to distill multiple data sets, correlate historical profiles, and transform or decompose feature-space representations, as well as ubiquitous availability of power-efficient compute hardware.


Intel AVX is a new 256-bit SIMD FP vector extension of Intel Architecture. Its introduction is targeted for the Sandy Bridge processor family in the 2010 timeframe. Intel AVX accelerates the trends towards FP intensive computation in general purpose applications like image, video, and audio processing, engineering applications such as 3D modeling and analysis, scientific simulation, and financial analytics.

Intel AVX is a comprehensive ISA extension of the Intel 64 Architecture. The main elements of Intel AVX are:

  • Support for wider vector data (up to 256-bit).
  • Efficient instruction encoding scheme that supports 3 and 4 operand instruction syntax.
  • Flexibility in programming environment, ranging from branch handling to relaxed memory alignment requirements.
  • New data manipulation and arithmetic compute primitives, including broadcast, permute, fused-multiply-add, etc.
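The wider vectors and relaxed alignment requirements listed above can be sketched with C intrinsics. The example below is an illustration under stated assumptions, not a reference implementation: the function name `add_f32` is hypothetical, and the AVX path is compiled only when the compiler defines `__AVX__` (for example, GCC or Clang invoked with `-mavx`); otherwise the scalar loop handles every element. Each 256-bit operation processes eight single-precision values, and the unaligned load/store intrinsics place no alignment requirement on the arrays.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Element-wise sum: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    /* Eight floats per iteration in 256-bit registers. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* no alignment required */
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vs = _mm256_add_ps(va, vb);    /* va and vb stay intact */
        _mm256_storeu_ps(out + i, vs);
    }
#endif
    /* Scalar loop: the tail, or everything when AVX is not enabled. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Because the result of the addition goes to a separate destination, neither input register needs to be copied first, which is the practical effect of the three-operand syntax.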


While any application making heavy use of floating-point or integer SIMD can use Intel AVX, the applications that show the best benefit are those that are strongly floating-point compute intensive and can be vectorized. Example applications include audio processing and audio codecs, image and video editing applications, financial services analysis and modeling software, and manufacturing and engineering software.

When Intel’s engineers set out to design Intel AVX several years ago, it was essential that we provide a comprehensive, backwards-compatible solution with built-in extensibility. Our three-operand and wider vector syntax is based on an instruction encoding format that applies to the complete set of existing Intel SSE instructions. In addition, Intel AVX can support 4-operand syntax. For example, variable blends become more flexible with 4-operand syntax and have the benefit of preserving the content of the source operands. Intel AVX’s encoding is highly compact: these instructions typically take fewer bytes than the 64-bit forms of current floating-point instructions, yet they have many reserved fields for future features.
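As a sketch of the variable blend described above, the following C function selects, lane by lane, the larger of two inputs using a compare followed by a blend. The function name `larger8_f32` is hypothetical. The AVX path assumes a compiler that defines `__AVX__` (for example, via `-mavx`); without it, an equivalent scalar loop is used. The blend takes two sources and a mask and writes a separate result, so both sources remain available afterwards.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* out[i] = the larger of a[i] and b[i], for eight floats. */
void larger8_f32(const float a[8], const float b[8], float out[8])
{
#ifdef __AVX__
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    /* Lanes where a < b become all-ones; other lanes become zero. */
    __m256 mask = _mm256_cmp_ps(va, vb, _CMP_LT_OQ);
    /* Per lane: take vb where the mask sign bit is set, else va. */
    __m256 vr = _mm256_blendv_ps(va, vb, mask);
    _mm256_storeu_ps(out, vr);
#else
    /* Scalar equivalent when AVX is not enabled at compile time. */
    for (size_t i = 0; i < 8; ++i)
        out[i] = (a[i] < b[i]) ? b[i] : a[i];
#endif
}
```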


Key Benefits of Intel® AVX

Intel AVX is a comprehensive ISA enhancement that adds new functionality in addition to the compact new encoding format.

  • A large number (200+) of legacy Intel SSEx instructions are upgraded by the enhanced instruction encoding to take advantage of features like a distinct source operand and flexible memory alignment.
  • A moderate number (< 100) of legacy 128-bit Intel SSEx instructions have been promoted to process 256-bit vector data.
  • A number of new data processing and arithmetic operations (< 100), not present in legacy Intel SSEx, are added to Intel processors to be launched in 2010 and beyond.


The key advantages of Intel AVX are:

  • Performance: Intel AVX can improve performance on existing and new applications that lend themselves well to largely vectorizable data sets:
    • Wider vector data sets can be processed at up to twice the throughput of 128-bit data sets.
    • Application performance can scale up with number of hardware threads and number of cores.
    • Application domain can scale out with advanced platform interconnect fabrics, such as Intel QPI.
  • Power Efficiency: Intel AVX is extremely power efficient. Incremental power is insignificant when the instructions are unused or scarcely used. Combined with the high performance that it can deliver, applications that lend themselves heavily to using Intel AVX can be much more energy efficient and realize a higher performance-per-watt.
  • Extensibility: Intel AVX has powerful built-in extensibility options for the future without resorting to code growth:
    • OS context management rework only needs to be done once.
    • Future vector integer support to 256 and 512 bits.
    • Future vector FP support to 512 bits and even 1024 bits.
  • Compatibility: Intel AVX is backward compatible with previous ISA extensions including Intel SSE4:
    • Simple porting of existing Intel SSE applications to Intel AVX-128.
    • Straightforward porting of existing Intel SSE to Intel AVX-256.
  • Ubiquity: Intel AVX will be available in a wide range of Intel platforms, from sub-notebooks to multi-processor servers.
  • Support: Intel’s comprehensive range of developer tools and an extensive online support presence at Intel® Developer Zone make it easy for developers to start working with Intel AVX today.


Software development platforms will be available in the first half of 2010. Prior to 2010, ISVs will be able to start development using an emulator and other tools that will be available for download from the Intel AVX web site.

Intel will be providing various tools, white papers, and a support forum to help ISVs start development. There are multiple paths for development while keeping in mind that with Intel AVX:

  • Most apps written with intrinsics need only recompile.
  • There is a straightforward porting of existing Intel SSE to Intel AVX 256 with Intel libraries, Intel® Integrated Performance Primitives (Intel® IPP), etc.
  • All Intel SSE/2 instructions are extended via a simple prefix (VEX).
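The recompile-only path noted above can be illustrated with ordinary 128-bit Intel SSE intrinsics. In the sketch below the source contains no AVX-specific code; the function name `scale_f32` is hypothetical. As an assumption about typical toolchains, building the same file with an AVX option (for example, `-mavx` with GCC or Clang) is expected to produce the VEX-encoded forms of these 128-bit operations without any source change. The SSE path is guarded by `__SSE__` so the file still builds where SSE is not enabled.

```c
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Multiplies n floats in place by the factor s. */
void scale_f32(float *x, size_t n, float s)
{
    size_t i = 0;
#ifdef __SSE__
    const __m128 vs = _mm_set1_ps(s);        /* s in all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);      /* four floats at once */
        _mm_storeu_ps(x + i, _mm_mul_ps(v, vs));
    }
#endif
    /* Scalar loop: the tail, or everything when SSE is not enabled. */
    for (; i < n; ++i)
        x[i] *= s;
}
```

Moving this loop to 256-bit vectors is a separate, manual step: the `__m128` type and `_mm_` intrinsics would be replaced with their `__m256` and `_mm256_` counterparts and the stride changed from four to eight.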



Intel’s leadership and ongoing work in the development of instruction set extensions for Intel architecture provide a continuing path for enhancing the performance, power efficiency and capabilities of a wide range of software. With Intel AVX we are continuing to work with the ISV community to deliver instruction set extensions that truly enhance software products to provide real benefits (everything from improved performance to substantial cost savings) to their customers.


About the Authors

Nadeem Firasta is a Senior Technical Analyst at Intel working for the Performance Benchmarking and Competitive Analysis (PBCA) team in the Sales & Marketing Group. His current focus is on evaluating the CPU and platform technology landscape across the client computing platforms industry. He is also involved with leading the development of key marketing collateral for the Intel Sales and Marketing team. Nadeem has been at Intel for 12 years and has held various positions in microprocessor design, including lead micro-architect and silicon validation lead on various Intel microprocessors. His email is nadeem.h.firasta at

Mark Buxton is a Principal Engineer at Intel Corporation. He manages the Application Performance team in the Software and Solutions Group. Mark can be reached at Mark.J.Buxton at

Paula Jinbo is a Software Product Marketing Engineer with the Software Solutions Group and is focused on ISV enabling for the mobile platform and for future processor technologies. She has been with Intel for 10 years and has held various positions in marketing, product development and program management. Prior to Intel, she worked for a consumer software vendor managing all distribution and licensing deals for the company’s products in Asia and Latin America. Her email is Paula.jinbo at

Kaveh Nasri is a Product Marketing Engineer with the Digital Enterprise Group and is focused on processor product roadmaps for the desktop platforms and for future processor technologies. He has been with Intel for 20 years and has held various positions in software development, system and software architecture, and engineering management. His email is kaveh.nasri at

Shihjong Kuo is a Senior Engineer in the Server Platform Technical Marketing team in the Digital Enterprise Group. His focus is on Intel Architecture extensibility/compatibility across Intel processor generations. He supervises the development of Intel processor manuals for software programming and optimization recommendations. Shihjong has been with Intel for 10 years. His email is shihjong.kuo at




Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

This white paper, as well as the software described in it, is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See for details.

Intel Core and Pentium processor families may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

The code name Sandy Bridge presented in this document is only for use by Intel to identify a product, technology, or service in development, that has not been made commercially available to the public, i.e., announced, launched or shipped. It is not a "commercial" name for products or services and is not intended to function as a trademark.

Copies of documents, which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

Intel, Intel Core, Pentium, and the Intel Logo are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2008, Intel Corporation. All rights reserved.

For more complete information about compiler optimizations, see our Optimization Notice.