Intel® System Studio: Intel® AVX2 Support in the Intel® C++ Compiler

Applicable Products

Intel® System Studio 2014 (and later)

Intel® C++ Compiler 14.0 (and later)

Background

Compiler support for the Intel® Core™ Processors is common across Intel Architecture (IA). Ahead of the launch of the 4th Generation Intel® Core™ Processors (codenamed "Haswell"), the Intel C++ Compiler was enabled to use and optimize for:

  • Intel® Advanced Vector Extensions 2 (Intel® AVX 2.0)
  • Fused Multiply Add (FMA3)
  • Bit Manipulation new Instructions (BMI)
  • MOVBE instruction (previously only supported by the Intel® Atom™ processor)
  • Intel® Transactional Synchronization Extensions (Intel® TSX) (available in some models)

General information about the 4th Generation Intel® Core™ Processors can be found in Intel's product database (ARK).

Targeting 4th Generation Intel® Core™ Processors

To get started, source the Intel Compiler environment script in your shell. The example below shows this for a 64-bit Linux host installation using the common Bash shell (.sh). Note that the actual path to compilervars.sh may vary with your installation and version of Intel System Studio. For cross-compiling for a different operating system ("-platform" option), please refer to the related articles listed at the end of this article.

source /opt/intel/system_studio_2014/bin/compilervars.sh ia32_intel64

The Intel Compiler offers several ways to optimize an application for the 4th Generation Intel Core Processors.

  1. Create a binary with a single code path. This is usually sufficient if the target hardware is known and fixed.
    1. Determined by the host system where the compiler is invoked
    2. Determined by an option that selects the code path explicitly
  2. Create a binary with multiple code paths. Choose a baseline ("fallback") along with one (or more) additional code path(s).

Below are some examples of invoking the Intel C++ Compiler (icpc) for the above cases on a Linux host system. Note that the C compiler driver (icc) is often also used to compile C++ code. This works because the Intel Compiler generally selects the frontend according to the file extension (unless specified otherwise). For simplicity, all examples below combine the compile and link steps into a single invocation.

icpc -xHost example-1a.cpp
icc -march=native example-1a.c
icc -march=core-avx2 example-1b.c
icc -xCORE-AVX2 example-1b.c

(The source file names indicate which case, 1a or 1b, from the list above each invocation illustrates.)

The first line invokes the Intel C++ Compiler with an option (-xHost) that makes code generation match the processor architecture of the host invoking the compiler. The second line is similar but relies on a switch (-march=native) known from the GNU* GCC compilers (gcc, g++). Please note that no code specific to the 4th Generation Intel Core Processors will be generated if the host system (which invokes the compiler) does not use such a processor. The third compiler invocation generates an executable that contains a single code path optimized for the 4th Generation Intel Core Processors (-march=core-avx2). The last invocation (-xCORE-AVX2) also generates a single code path; however, the optimizations are specific to Intel processors (this implies an Intel CPUID check, and program termination with a runtime message when running on an unsupported processor).
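As an illustration of source code that can benefit from a single Haswell-specific code path, here is a minimal sketch of a multiply-add loop. The function name scale_add and its parameters are hypothetical and not part of the original examples. Whether the compiler actually emits AVX2 or FMA3 instructions for it depends on the options used and on the compiler's own heuristics.

```c
#include <stddef.h>

/* Hypothetical kernel (not from the article): y[i] = a * x[i] + y[i].
 * The 'restrict' qualifiers (C99) promise that x and y do not overlap,
 * which removes one common obstacle to vectorization. */
void scale_add(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* A multiply followed by an add: a candidate for a fused
         * multiply-add when the target code path allows it. */
        y[i] = a * x[i] + y[i];
    }
}
```

Assuming the function is stored in example-1b.c, it could be built with one of the single-code-path invocations shown above. Depending on the compiler's default language standard, C99 mode may have to be enabled for 'restrict' to be accepted.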

icc -msse3 -axATOM_SSE4.2,AVX,CORE-AVX2 example-2.c
icpc -mslm -axCORE-AVX2 example-2.cpp
icpc -xATOM_SSE4.2 -axCORE-AVX2 example-2.cpp

(The source file name indicates case 2 from the list above.)

The first line invokes the Intel C Compiler with an option that specifies the baseline code path (-msse3) and adds three additional code paths (-axATOM_SSE4.2,AVX,CORE-AVX2). Note that the baseline code path must be supported by all targeted processors. The specified baseline (SSE3) therefore also works on previous generations of Intel Core Processors that support SSSE3, since SSSE3 extends SSE3. The additional code paths in the first invocation may only be generated for those parts of the application that the heuristics of the Intel Compiler identify as profitable (this behavior can be adjusted, for example towards "aggressive multiversioning"). The second invocation generates baseline code for Intel Atom processors based on the Silvermont microarchitecture, thus requiring at least SSE4.2 as well as support for the MOVBE instruction. This baseline (-mslm) also runs on the 4th Generation Intel® Core™ Processors (codenamed "Haswell") because they support the MOVBE instruction; however, a code path for AVX2 has been added in order to exploit the wider vector registers, FMA3, and other instructions. The last invocation is similar; however, the optimizations even in the baseline code are now specific to Intel processors (the application will not run on non-Intel processors).
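The idea of a baseline plus an additional code path can be pictured with a hand-written dispatcher. The sketch below is purely conceptual: the -ax option generates its dispatch automatically, and this is not how the Intel Compiler implements it. All function names are hypothetical, and the CPU check assumes a toolchain that provides the GCC/Clang builtin __builtin_cpu_supports on x86; on any other toolchain or architecture the sketch simply falls back to the baseline.

```c
#include <stddef.h>

/* Baseline path: must run on every targeted processor. */
static long sum_baseline(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

/* Stand-in for a path that a compiler would specialize for AVX2.
 * In this sketch it is ordinary C with the same result as the baseline. */
static long sum_avx2_path(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

/* Returns 1 if the running CPU reports AVX2, 0 otherwise.
 * Assumption: __builtin_cpu_supports is available (GCC/Clang on x86). */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0; /* unknown toolchain or architecture: use the baseline */
#endif
}

/* Chooses a code path at run time; both paths yield the same result. */
long sum_dispatch(const int *v, size_t n)
{
    return cpu_has_avx2() ? sum_avx2_path(v, n) : sum_baseline(v, n);
}
```

The important property is the one stated above: the baseline must be valid on every targeted processor, while the additional path is only entered after a successful run-time check.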

Note that the "x-options" presented above (-x, and likewise the additional code paths requested with -ax) will generate code that is specific to Intel processors; see our Optimization Notice.

Other Optimizations

Besides targeting code generation at the 4th Generation Intel Core Processors (or other processors), the usual optimization switches are needed for the compiler to actually generate this kind of code. In particular, SIMD vectorization (enabled by default) is what mainly puts instruction set extensions such as SSE4.2 or Intel AVX2 to use (the MOVBE instruction is a notable exception to this "SIMD vectorization rule"). The example below gives an effective set of options to start with:

icpc -O2 -fstrict-aliasing -ipo [...]
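The MOVBE exception mentioned above concerns scalar code rather than vectorized loops. As a minimal sketch, the hypothetical helper below reads a 32-bit big-endian value from a byte buffer. On a target that supports MOVBE, a compiler may translate such a pattern into a single byte-swapping load, but this depends on the compiler and the options used and is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper (not from the article): assemble a 32-bit value
 * from four bytes stored in big-endian order. The result is the same
 * on little-endian and big-endian hosts. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```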

As always, one should not apply option switches without knowing what they are good for. There is a lot of content published on the subject of vectorization (not only in this knowledge base). In summary, the usual steps are:

  1. Identify the loops that are worthwhile candidates for auto-vectorization.
  2. Compile the code with a sufficiently detailed vectorization report level.
  3. Find the previously identified loops in the report.
  4. Adjust the code, use pragmas (ivdep, simd), array notation, intrinsics, etc.
  5. Go back to step 2 and repeat until the code is vectorized.
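Step 4 can be sketched with a small loop. The function add_offset below is hypothetical. The ivdep pragma is guarded so that it is only seen by the Intel Compiler. It asks the compiler to ignore assumed (unproven) dependencies between dst and src, so the caller must guarantee that the two arrays do not overlap. With Intel C++ Compiler 14.0 the report from step 2 is typically requested with a -vec-report option; later versions changed the reporting options, so please check the documentation of your compiler version.

```c
#include <stddef.h>

/* Hypothetical example (not from the article): dst[i] = src[i] + offset.
 * Precondition: dst and src must not overlap. */
void add_offset(float *dst, const float *src, size_t n, float offset)
{
#if defined(__INTEL_COMPILER)
    /* Intel-specific hint: ignore assumed vector dependencies. */
#pragma ivdep
#endif
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] + offset;
    }
}
```

If the report still lists the loop as not vectorized, the reason it gives (for example an assumed dependence or an unsupported loop structure) indicates what to adjust before the next iteration.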

For more details, please have a look at the related articles (listed below).

Related Articles

Haswell New Instruction Descriptions Now Available!

Intel® System Studio: Intel® Atom™ Processor Support in the Intel® C++ Compiler

Step by Step Performance Optimization with Intel® C++ Compiler

Using Intel® C++ Compiler for Embedded Systems

For more complete information about compiler optimizations, see our Optimization Notice.