Intel® compiler options for SSE generation (SSE2, SSE3, SSSE3, SSE4) and processor-specific optimizations

Last Modified On :   July 13, 2009 3:35 PM PDT

What are the IA-32 processor targeting options in the 11.x compilers?
There are three main types of processor-specific optimization options:

  1. Processor-specific options of the form /arch:<code> on Windows* ( -m<code> on Linux* or Mac OS* X) generate specialized code for processors specified by <code>. The resulting executables can be run on the specified or later Intel® processors and on compatible, non-Intel® processors that support the instruction set. The executable may incorporate optimizations specific to those processors and use a specific version of the Streaming SIMD Extensions (SSE) instruction set; on older processors without support for the corresponding instruction set, an illegal instruction or similar error may occur.
  2. Processor-specific options of the form /Qx<code> on Windows* ( -x<code> on Linux* or Mac OS* X) generate specialized code for processors specified by <code>. The resulting executables can only be run on the specified or later Intel® processors, as they incorporate optimizations specific to those processors and use a specific version of the Streaming SIMD Extensions (SSE) instruction set. This switch enables some optimizations not enabled with the corresponding switches /arch:<code> or -m<code>. A run-time check is inserted in the resulting executable that halts the application if it is run on an incompatible processor. This is intended to help you quickly find out that the program was not built for the processor it is running on, and it potentially avoids an illegal instruction error.
  3. Processor-dispatch options of the form /Qax<code> on Windows* ( -ax<code> on Linux* or Mac OS* X) allow the generation of multiple code paths for Intel® processors. Processor-dispatch technology performs a check at execution time to determine which processor the application is running on and uses the best code path that is compatible with that processor. Compatible, non-Intel processors will take the default code path. The switches described in 1. and 2. above can be used to modify the default code path.
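
The idea behind processor dispatch (option type 3) can be illustrated by hand. The following is a minimal sketch, not the mechanism the Intel compiler actually generates: the kernel names `sum_default` and `sum_sse41` and the helpers `cpu_has_sse41` and `select_sum` are hypothetical, and the CPU query relies on the GCC/Clang builtin `__builtin_cpu_supports` as a stand-in for the compiler's own run-time check. On other compilers or non-x86 targets the sketch simply falls back to the default path.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Default code path: plain scalar loop, safe on any processor. */
long sum_default(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

/* Hypothetical stand-in for a specialized path. A real /QaxSSE4.1 build
   would let the compiler generate SSE4.1 code here; this version only
   unrolls the loop so that the sketch stays portable. */
long sum_sse41(const int *v, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];   /* remainder elements */
    return s0 + s1 + s2 + s3;
}

/* Execution-time check. Assumption: GCC or Clang on x86; anything else
   reports "not supported" and therefore selects the default path. */
int cpu_has_sse41(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return 0;
#endif
}

/* Pick the best code path that is compatible with the processor. */
sum_fn select_sum(int has_sse41) {
    return has_sse41 ? sum_sse41 : sum_default;
}
```

A caller would select the path once at startup, for example `sum_fn sum = select_sum(cpu_has_sse41());`, and then call `sum(data, n)` everywhere else. With /Qax<code> or -ax<code> the compiler does this selection automatically, so no such source changes are needed.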

Where the value for <code> can be:

AVX May generate Intel® AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel® processors. Optimizes for a future Intel processor.
SSE4.2 May generate Intel® SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel® processors. Optimizes for the Intel® Core™ i7 processor family and the Intel® Xeon® 55XX series.
SSE4.1 May generate Intel® SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel® processors. Optimizes for the 45nm Hi-k next generation Intel® Core™ microarchitecture.
SSSE3 May generate Intel® SSSE3, SSE3, SSE2, and SSE instructions for Intel® processors. Optimizes for Intel® Core™ microarchitecture. -xssse3 is the default for the Intel® 64 compiler on Mac OS* X.
SSE3 May generate Intel® SSE3, SSE2, and SSE instructions. Optimizes for the enhanced Pentium M processor microarchitecture and Intel® Netburst microarchitecture. -xsse3 is the default for the IA-32 compiler on Mac OS* X.
SSE3_ATOM May generate Intel® SSE3, SSE2, SSE, and MOVBE instructions for Intel® processors. Optimizes for the Intel® Atom™ processor and Intel® Centrino® Atom™ Processor Technology.
SSE2 May generate Intel® SSE2 and SSE instructions. Optimizes for the Intel® Netburst microarchitecture. /arch:SSE2 is the default on Windows* and -msse2 is the default on Linux*.
IA32 Generates generic IA32 compatible code. Can only be used with the /arch: or -m switches.
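
To confirm which instruction set a given <code> value actually enabled for a source file, one option is to inspect the compiler's predefined macros. The sketch below assumes that the compiler defines the GCC-style macros `__SSE2__`, `__SSE3__`, `__SSSE3__`, `__SSE4_1__`, `__SSE4_2__`, and `__AVX__` when the matching instruction set is enabled. That is the GCC/Clang convention; whether a particular Intel compiler version and platform defines each of them should be checked in its documentation. The function names are hypothetical.

```c
#include <stdio.h>

/* Returns the highest instruction-set level the compiler was allowed to
   use for this translation unit, judged only by predefined macros.
   Assumption: the compiler follows the GCC-style macro names below. */
const char *compiled_isa_level(void) {
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "IA32 or unknown";
#endif
}

/* Convenience wrapper that reports the level on standard output. */
void print_compiled_isa_level(void) {
    printf("Compiled for: %s\n", compiled_isa_level());
}
```

Calling `print_compiled_isa_level()` from a test program built with different option settings gives a quick sanity check of the build configuration. It reports what the compiler was permitted to generate, not what the processor running the program supports.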

Which processor-specific option is best for my processor?

SSE4.2
  Intel® Core™ i7 Processors
  Intel® Xeon® 55XX series

SSE4.1
  Intel® Xeon® 74XX series
  Quad-Core Intel® Xeon® 54XX, 33XX series
  Dual-Core Intel® Xeon® 52XX, 31XX series
  Intel® Core™ 2 Extreme 9XXX series
  Intel® Core™ 2 Quad 9XXX series
  Intel® Core™ 2 Duo 8XXX series
  Intel® Core™ 2 Duo E7200

SSSE3
  Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series
  Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series
  Intel® Core™ 2 Extreme 7XXX, 6XXX series
  Intel® Core™ 2 Quad 6XXX series
  Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series
  Intel® Core™ 2 Solo 2XXX series
  Intel® Pentium® dual-core processor E2XXX, T23XX series

SSE3
  Dual-Core Intel® Xeon® 70XX, 71XX, 50XX series
  Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16
  Dual-Core Intel® Xeon® 2.8
  Intel® Xeon® processors with SSE3 instruction set support
  Intel® Core™ Duo
  Intel® Core™ Solo
  Intel® Pentium® dual-core processor T21XX, T20XX series
  Intel® Pentium® processor Extreme Edition
  Intel® Pentium® D
  Intel® Pentium® 4 processors with SSE3 instruction set support

SSE2 (default)
  Intel® Xeon® processors
  Intel® Pentium® 4 processors
  Intel® Pentium® M

IA32
  Intel® Pentium® III Processor
  Intel® Pentium® II Processor
  Intel® Pentium® Processor
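
The selection rule implied by the list above is "pick the highest instruction set the target processor supports". A minimal sketch of that rule follows. The flag names and the function `recommended_code` are hypothetical and introduced only for illustration; how the flags get filled in (from a processor data sheet, CPUID, or an inventory of deployed machines) is left open.

```c
/* Hypothetical feature flags describing a target processor. */
enum isa_flags {
    ISA_SSE2  = 1 << 0,
    ISA_SSE3  = 1 << 1,
    ISA_SSSE3 = 1 << 2,
    ISA_SSE41 = 1 << 3,
    ISA_SSE42 = 1 << 4
};

/* Returns the <code> value matching the highest supported instruction
   set, checking from newest to oldest. Falls back to IA32 when no SSE2
   support is reported. */
const char *recommended_code(unsigned flags) {
    if (flags & ISA_SSE42) return "SSE4.2";
    if (flags & ISA_SSE41) return "SSE4.1";
    if (flags & ISA_SSSE3) return "SSSE3";
    if (flags & ISA_SSE3)  return "SSE3";
    if (flags & ISA_SSE2)  return "SSE2";
    return "IA32";
}
```

For example, a processor reporting SSE2, SSE3, and SSSE3 maps to `SSSE3`. The result is meant to be appended to /Qx, /arch:, -x, or -m as described earlier; note that IA32 is only valid with the /arch: and -m forms.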

 


What set of processor-specific optimization options are recommended?
Based on the capabilities of the installed hardware base and the advantages enabled by these options, a suggested option setting for Windows* is /QaxSSE4.1 (following the forms given above, -axSSE4.1 on Linux* or Mac OS* X).

This option setting will produce binaries with two code paths, using the processor-dispatch technology described above. One code path will take full advantage of processors based on the 45nm Hi-k next generation Intel® Core™ microarchitecture.

The other code path, which is the default, takes advantage of the capabilities provided by Intel Pentium 4 and Intel Xeon processors with SSE2 support and by other Intel or compatible non-Intel processors with SSE2 support.

Note: The resulting software will not run on systems that do not support the SSE2 instruction set, that is, on Intel processors up to and including the Intel Pentium III processor. SSE2 provides significant floating-point optimization and reproducibility that is not available without SSE2.
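
If an application may be started on hardware without SSE2, a guard at program startup can replace an illegal instruction error with a readable message. The sketch below is one possible way to do this and is not part of the compiler: the function names `cpu_supports_sse2` and `check_min_isa` are hypothetical, and the CPU query again assumes the GCC/Clang builtin `__builtin_cpu_supports`. The file containing the guard would itself need to be compiled for generic IA-32 (for example with /arch:ia32 or -mia32), otherwise the guard could fault before it reports anything.

```c
#include <stdio.h>

/* Execution-time query. Assumption: GCC or Clang on x86; on any other
   compiler or target the sketch conservatively reports "unsupported". */
int cpu_supports_sse2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Returns 0 when the minimum requirement is met; otherwise prints a
   diagnostic to stderr and returns 1 so the caller can exit cleanly. */
int check_min_isa(int has_sse2) {
    if (!has_sse2) {
        fprintf(stderr,
                "This program requires a processor with SSE2 support.\n");
        return 1;
    }
    return 0;
}
```

A typical use is a single call at the top of `main`, such as `if (check_min_isa(cpu_supports_sse2())) return 1;`, placed before any code that was compiled with SSE2 enabled.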

Which processor is targeted by default?

  • On IA-32 systems running Windows* and Linux*, /arch:SSE2 (Windows*) or -msse2 (Linux*) is on by default. The resulting code path should run on Intel Pentium 4 and Intel Xeon processors with SSE2 support and on later Intel processors or compatible non-Intel processors with SSE2 support.
  • On IA-32 systems running Mac OS* X, -xSSE3 is on by default. The compiler may generate SSE3, SSE2, and SSE instructions and the code is optimized for enhanced Pentium M processor microarchitecture.
  • On Intel 64 systems running Mac OS* X, -xSSSE3 is on by default. The compiler may generate SSSE3, SSE3, SSE2, and SSE instructions and the code is optimized for the Intel® Core™ microarchitecture.


To target older IA-32 systems without support for SSE2 instructions, such as systems based on the Intel® Pentium® III Processor, use the switch /arch:ia32 (Windows*) or -mia32 (Linux*).


For information about other, older processor targeting options and their relation to the recommended options above, see
http://software.intel.com/en-us/articles/ia-32-and-intel64-processor-targeting-overview/

 





This article applies to: Financial Services Industry,   Game Development,   Intel Software Network communities,   Intel SW Partner program,   Intel® Atom™ Software Developer Community,   Pentium,   Tools,   Visual Computing,   Xeon,   Intel® C++ Compiler for Linux* Knowledge Base,   Intel® C++ Compiler for Mac OS X* Knowledge Base,   Intel® C++ Compiler for Windows* Knowledge Base,   Intel® Fortran Compiler for Linux* Knowledge Base,   Intel® Fortran Compiler for Mac OS X* Knowledge Base,   Intel® Parallel Composer Knowledge Base,   Intel® Visual Fortran Compiler for Windows* Knowledge Base