Intel® System Studio: Intel® Atom™ Processor Support in the Intel® C++ Compiler

Applicable Products

Intel® System Studio 2014 (and later)

Intel® C++ Compiler 14.0 (and later)

Background

Compiler support for the Intel Atom Processor is, of course, common to all Intel Architecture (IA). In addition, the Intel C++ Compiler can optimize for Intel® Streaming SIMD Extensions (SSE 4.2) in the new Intel Atom Processor product families. These product families are laid out as a System on Chip (SoC) consisting of the CPU cores (codenamed "Silvermont") and the system part, usually called the "uncore". The uncore provides customization for different application domains (mobile/tablet, micro server, and embedded/communication), so the SoC carries a different codename depending on the domain, e.g., "Rangeley" for embedded communications (additional codenames refer to the entire platform, e.g., "Bay Trail for Intelligent Systems"). More details, e.g., about the Intel Atom Processor Product Family for Communications, can be found in the embedded section of Intel's product database (ARK).

Targeting Intel Atom Processor Product Family

To get started, source the Intel Compiler into your shell environment. The example below shows this for a Linux host installation using the common Bash shell (.sh). Note that the actual path to compilervars.sh may vary with your installation and version of Intel System Studio. The choice between 32-bit and 64-bit (i.e., "ia32" and "intel64", respectively) usually depends only on the OS deployment. For cross-compiling to a different target architecture ("ia32_intel64") or operating system ("-platform" option), please refer to the reference section of this article.

source /opt/intel/system_studio_2014/bin/compilervars.sh ia32

The Intel Compiler offers multiple choices to get an application optimized for the Intel Atom Processor product families.

  1. Create a binary with a single code path. This is usually sufficient if the target hardware is known and fixed.
    1. Determined by the host system where the compiler is invoked
    2. Determined by an option that selects the code path explicitly
  2. Create a binary with multiple code paths. Choose a baseline ("fallback") along with one (or more) additional code path(s).

Below are some examples of invoking the Intel C++ Compiler (icpc) for the above cases on a Linux host system. Note that the C compiler (icc) is often used to compile C++ code as well; this works because the Intel Compiler generally selects the frontend according to the file extension (unless specified otherwise). For simplicity, all examples below combine the compile and link steps into one stage.

icpc -xHost example-1a.cpp
icc -march=native example-1a.c
icpc -mslm example-1b.cpp
icc -xATOM_SSE4.2 example-1b.c

(the name of the source file indicates case 1a or 1b as enumerated above)

The first line invokes the Intel C++ Compiler with an option (-xHost) that matches the generated code to the processor architecture of the host invoking the compiler. The second line is similar but relies on switches known from the GNU* GCC C++ Compiler (g++). Please note that no code specific to the new Intel Atom Processor product families is generated if the host system (that invokes the compiler) does not use such a processor. The third invocation generates an executable containing a single code path optimized for the new Intel Atom Processor product families (-mslm). The last invocation also generates a single code path, but the optimizations are specific to Intel processors (this implies an Intel CPUID check, and program termination with a runtime message if run on an unsupported processor).

icc -msse3 -axATOM_SSE4.2,AVX,AVX2 example-2.c
icpc -mslm -axAVX2 example-2.cpp
icpc -xATOM_SSE4.2 -axAVX2 example-2.cpp

(the name of the source file indicates case 2 as enumerated above)

The first line invokes the Intel C Compiler with an option specifying the baseline code path (-msse3) and adds three additional code paths (-axATOM_SSE4.2,AVX,AVX2). Note that the baseline code path must be supported by all processors that are intended targets. Here, the specified baseline (SSE3) also works on the previous generation of Intel Atom Processor product families, which support SSSE3. The additional code paths may only be generated for those parts of the application identified by the compiler's heuristics (this can be adjusted, a.k.a. "aggressive multiversioning"). The second invocation generates code for the new Intel Atom Processor product families, thus requiring at least SSE 4.2 as well as support for the MOVBE instruction. This baseline (-mslm) also runs on the 4th generation Intel® Core™ processor family (codenamed "Haswell"), which supports the MOVBE instruction; a code path for AVX2 has been added to exploit the wider vector registers, FMA3, and other instructions. The last invocation is similar, but the optimizations, even in the baseline code path (as with "x-options" in general), are specific to Intel processors (this implies an Intel CPUID check, and program termination with a runtime message if run on an unsupported processor).

Other Optimizations

Besides targeting code generation at the new Intel Atom Processor product families (or other processors), the usual optimization switches are what enable the compiler to generate this kind of code in the first place. In particular, SIMD vectorization (enabled by default) is the main consumer of instruction set extensions such as SSE 4.2 (the MOVBE instruction is a notable exception to this "SIMD vectorization rule"). The example below gives an effective set of options to start with:

icpc -O2 -fstrict-aliasing -ipo [...]

As always, one should not apply option switches without knowing what they do. There is a lot of content published on the subject of vectorization (not only in this knowledge base). In summary, the usual steps are:

  1. Find the loop candidates that are worthwhile to auto-vectorize.
  2. Compile the code with a sufficient level of the vectorization report.
  3. Find the previously identified loops in the report.
  4. Adjust the code, use pragmas (ivdep, simd), array notation, intrinsics, etc.
  5. Go to step 2 until the code is vectorized.

For more details, please have a look at the related articles listed below.

Related Articles

Step by Step Performance Optimization with Intel® C++ Compiler

Using Intel® C++ Compiler for Embedded Systems

For more complete information about compiler optimizations, see our Optimization Notice.