- Can I combine the processor values and target more than one processor?
- What has changed in version 11.0 from previous releases with respect to these processor-targeting options?
- What has changed in version 10.1 from previous releases with respect to these processor-targeting options?
- What has changed in version 10.0 from previous releases with respect to these processor-targeting options?
- What has changed in version 9.1 from previous releases with respect to these processor-targeting options?
- What has changed in version 9.0 from previous releases with respect to these processor-targeting options?
- How can I generate code that will run optimally on any processor from Intel or AMD*?
- Why is there a need for a run-time check of the processor in the /Qx[SSE4.2, SSE4.1, SSSE3, SSE3, SSE2] (-x[SSE4.2, SSE4.1, SSSE3, SSE3, SSE2] on Linux*) processor-specific options?
- Does Intel test the Intel® Compilers on all processor types including non-Intel processors?
- Does Intel offer customer support for Intel Compilers used on non-Intel processors?
- If a user still has Intel® Pentium® II processors to support, what is Intel's recommendation for using the Intel compilers?
- Where can I find more information on processor-specific optimizations?
Can I combine the processor values and target more than one processor?
Yes. Using the auto processor dispatch technology, you can combine the options to create a binary that potentially has optimized code paths for more than one Intel processor. For example, on IA-32 processors you could use:
/QaxSSE3,SSSE3,SSE4.1 (-axSSE3,SSSE3,SSE4.1 for Linux) for 11.0, or /QaxPTS (-axPTS for Linux) for 10.1 or older versions
The resulting binary could contain up to 4 code paths for any particular function, including one default code path for a generic IA-32 processor. The compiler generates an additional code path only where there is a performance advantage in doing so, so it is unlikely that any particular function will get all 4 code paths. You can also combine processor dispatch with processor-targeting options. For example, you could use:
/QaxSSSE3 /QxSSE3 for 11.0, or /QaxT /QxP for 10.1 or older versions
This would potentially create 2 code paths: one that is optimal for the Intel® Core™2 Duo processor family and one that is optimal for the Intel® Pentium® 4 processor family with SSE3 support. Note: because a code path is created for each specific processor, the resulting binary may grow, which can affect performance. Using all possible processor-targeting values has a high potential to decrease the performance of your application.
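To make the idea of multiple code paths concrete, here is a minimal hand-written sketch of processor dispatch in C. It is an illustration only and not what the Intel compiler emits. The function names (sum_generic, sum_unrolled, dispatch_sum) are hypothetical, the "specific" path is a portable stand-in rather than real SSE4.1 code, and the feature test assumes the GCC/Clang builtin __builtin_cpu_supports on an x86 host. On any other compiler or architecture the sketch falls back to the default path.

```c
#include <stddef.h>

/* Default code path: plain C that runs on any IA-32 processor. */
static long sum_generic(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Stand-in for a processor-specific path. A real one would use
 * SSE4.1 instructions; here it is only unrolled so the sketch stays
 * portable. It must return the same result as the default path. */
static long sum_unrolled(const int *v, size_t n)
{
    long a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += v[i];
        b += v[i + 1];
        c += v[i + 2];
        d += v[i + 3];
    }
    for (; i < n; ++i)
        a += v[i];
    return a + b + c + d;
}

/* Nonzero when the host reports SSE4.1 (assumes GCC/Clang on x86). */
static int host_has_sse41(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1");
#else
    return 0; /* assumption: unknown hosts take the default path */
#endif
}

typedef long (*sum_fn)(const int *, size_t);

/* Picks a code path on first use, then reuses that choice. */
long dispatch_sum(const int *v, size_t n)
{
    static sum_fn chosen = NULL;
    if (chosen == NULL)
        chosen = host_has_sse41() ? sum_unrolled : sum_generic;
    return chosen(v, n);
}
```

A caller only ever uses dispatch_sum; which path runs depends on the host. Each extra path adds code to the binary, which is the size cost mentioned in the note above.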
What has changed in version 11.0 from previous releases with respect to these processor-targeting options?
The 11.0 compiler introduced the following new switches:
- /QxHost (-xHost for Linux* or Mac OS* X) generate instructions for the highest instruction set and processor available on the compilation host machine
- /QxSSE4.2 or /QaxSSE4.2 (-xSSE4.2 or -axSSE4.2 for Linux*) for systems with SSE4.2 support
- /QxSSE3-ATOM (-xSSE3-ATOM for Linux) for Intel® Atom™ processor and Intel® Centrino® Atom™ Processor Technology
In addition 11.0 introduced a new naming schema for the processor targeting switches. Previous /QaxKWNOPTS or /QxKWNOPTS (-axKWNOPTS or -xKWNOPTS on Linux) are now /QaxSSE, SSE2, SSE3, SSSE3, SSE4.1 or /QxSSE, SSE2, SSE3, SSSE3, SSE4.1 (-axSSE, SSE2, SSE3, SSSE3, SSE4.1 or -xSSE, SSE2, SSE3, SSSE3, SSE4.1 on Linux).
The instruction set default behavior has changed in 11.0 on Windows* and Linux:
- The new processor default is /arch:SSE2 (Windows*) or -msse2 (Linux*).
When compiling for the IA-32 architecture, /arch:SSE2 (formerly /QxW) is now the default in 11.0 for Windows, and -msse2 (formerly -xW) is the default in 11.0 for Linux. Programs built with /arch:SSE2 (-msse2) must be run on a processor that supports at least SSE2, such as the Intel® Pentium® 4 or certain AMD* processors.
Note that this may change floating point results very slightly, since SSE instructions will be used instead of x87 instructions and therefore computations will be done in the declared precision rather than sometimes a higher precision.
All Intel® 64 architecture processors support SSE2.
To set the default to generic IA-32 as in 10.1 and earlier compilers, specify /arch:IA32 (Windows) or -mia32 (Linux).
- In 11.0, the new option /QxHost (Windows) or -xHost (Linux or Mac OS X) has been introduced. This selects a processor option appropriate to the host processor. See the compiler documentation for more details.
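The note above about slightly different floating-point results under the SSE2 default can be illustrated with a small C sketch. It is not compiler output. It assumes that long double is wider than double (as it is with GCC on x86) and uses it to stand in for the extended precision that x87 registers can carry. The function names are hypothetical, and volatile is used only to force each intermediate to be rounded to double.

```c
#include <float.h>

/* Nonzero when long double carries more mantissa bits than double. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* (big + 1) - big with every intermediate rounded to double,
 * which is how SSE2 scalar arithmetic behaves. */
double in_declared_precision(double big)
{
    volatile double t = big + 1.0; /* volatile forces a double store */
    volatile double r = t - big;
    return r;
}

/* Same expression with intermediates kept in long double, standing
 * in for values held in x87 extended-precision registers. */
double in_extended_precision(double big)
{
    long double t = (long double)big + 1.0L;
    return (double)(t - (long double)big);
}
```

With big set to 9007199254740992.0 (2^53), the declared-precision version returns 0.0 because 2^53 + 1 is not representable as a double. Where long double is wider, the extended version returns 1.0. Differences of this kind are what the note refers to.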
What has changed in version 10.1 from previous releases with respect to these processor-targeting options?
No significant changes in 10.1.
What has changed in version 10.0 from previous releases with respect to these processor-targeting options?
The 10.0 compiler introduces the S and O processor values and deprecates processor value B. Use processor value N to optimize binaries for the Pentium® M processor. For Mac OS* X, the Intel Compiler now additionally supports Intel 64 architecture. On Intel 64 systems running Mac OS X, -xT is on by default. For Mac OS* X, the Intel Compiler supports the -xT and -xS processor values.
What has changed in version 9.1 from previous releases with respect to these processor-targeting options?
The 9.1 compiler introduces the T processor value. Beginning with version 9.1, we released Fortran and C++ compilers for Mac OS. The Intel Compiler for Mac OS produces binaries optimized for the Intel Core microarchitecture, similar to the -xP option.
What has changed in version 9.0 from previous releases with respect to these processor-targeting options?
The N, B, and P processor values were added to provide better optimization for the Intel Pentium 4 processor, Intel Pentium M processor, and Intel Pentium 4 processor with support for Streaming SIMD Extensions 3 (SSE3), respectively.
The I (Pentium® Pro processors) and M (Intel Pentium II processors) options were deprecated starting with the 8.0 version of the Compiler. These options have been removed altogether in the 9.0 release.
Additionally, the processor-specific options, /Qx[N, B, P] (-x[N, B, P] on Linux) generate a run-time check to determine that the correct compatible Intel processor is used to prevent potential run-time faults that could otherwise occur with /QxK and /QxW.
Note: those options have changed to use the new naming schema in 11.0. Please see "What has changed in version 11.0 from previous releases with respect to these processor-targeting options?" above.
How can I generate code that will run optimally on any processor from Intel or AMD*?
The compiler's default optimizations, /O2 (-O2 on Linux and Mac OS), generate very good code for all IA-32 processors. In addition, /Qipo (inter-procedural optimization or IPO, -ipo on Linux and Mac OS), /Qprof_use (profile-guided optimization or PGO, -prof_use), and /O3 (high-level loop/memory optimizations, -O3) can add performance for many types of applications. /Qax (or -ax, -x on Linux and Mac OS) options can be used to generate specialized code for the target Intel processor and excellent performance on generic processors.
- /arch:SSE2 (or -msse2) will optimize and generate SSE2 code that runs on both Intel and AMD architecture.
- /arch:SSE3 (or -msse3) will optimize and generate SSE3 code that runs on both Intel and AMD architecture.
Why is there a need for a run-time check of the processor in the /Qx[SSE4.1, SSSE3, SSE3, SSE2] (-x[SSE4.1, SSSE3, SSE3, SSE2] on Linux*) processor-specific options?
These options generate processor-specific instructions, such as SSE4 Vectorizing Compiler and Media Accelerators, SSSE3, SSE3, or SSE2, which may or may not be supported on other Intel and non-Intel processors. The compilers now provide a safeguard that verifies that the processor on which the application is running is indeed the processor that was targeted. A run-time check is inserted in the resulting executable that halts the application if it is run on an incompatible processor. Without this run-time check, an application may crash with an illegal instruction fault or silently exhibit unexpected behavior if run on an incompatible processor.
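The kind of safeguard described above can be sketched by hand in C. This is a minimal illustration under stated assumptions, not the check the Intel compiler inserts: it tests only instruction-set feature flags, the names host_supports and require_isa are hypothetical, and it relies on the GCC/Clang builtin __builtin_cpu_supports on an x86 host. On other compilers or architectures every level is reported as unsupported.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the host reports the named instruction-set level,
 * 0 otherwise (including for names this sketch does not know). */
int host_supports(const char *level)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (strcmp(level, "sse2") == 0)
        return __builtin_cpu_supports("sse2") != 0;
    if (strcmp(level, "sse3") == 0)
        return __builtin_cpu_supports("sse3") != 0;
    if (strcmp(level, "ssse3") == 0)
        return __builtin_cpu_supports("ssse3") != 0;
    if (strcmp(level, "sse4.1") == 0)
        return __builtin_cpu_supports("sse4.1") != 0;
    if (strcmp(level, "sse4.2") == 0)
        return __builtin_cpu_supports("sse4.2") != 0;
#else
    (void)level; /* assumption: no detection available here */
#endif
    return 0;
}

/* Halts with a clear message instead of letting the program reach
 * an instruction the processor cannot execute. */
void require_isa(const char *level)
{
    if (!host_supports(level)) {
        fprintf(stderr,
                "Fatal: this program requires a processor with %s support.\n",
                level);
        exit(EXIT_FAILURE);
    }
}
```

Calling require_isa("sse4.1") at the top of main, before any processor-specific code runs, turns a possible illegal instruction fault into an explicit error message and a clean exit.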
Does Intel test the Intel® Compilers on all processor types including non-Intel processors?
We cannot test on all processor and platform combinations, but we do perform extensive testing and benchmarking on many platforms. This gives us confidence that our optimizations, like /O2, /O3, /Qipo, and profile-guided optimization using /Qprof_use (-O2, -O3, -ipo, -prof_use on Linux and Mac OS), and other processor-independent compiler options work well on all Intel processors and Intel-compatible processors. Processor values O, W and K are tested on various Intel and non-Intel processors. The processor-specific compiler options like [SSE4.1, SSSE3, SSE3, SSE2] (or [S, T, P, N] in 10.x or older) are validated only on those specific Intel processors. These options enable the latest and best optimizations to target Intel's latest and best processors.
Does Intel offer customer support for Intel Compilers used on non-Intel processors?
Yes. Intel will accept problem reports and fix issues reported on non-Intel processor-based systems.
If a user still has Intel® Pentium® II processors to support, what is Intel's recommendation for using the Intel compilers?
With the 8.x and later compilers, we no longer have a Pentium® II processor-specific option. It is our intention to move the compiler ahead and primarily support the latest processor families for processor-specific optimization. As mentioned above, all of the non-processor-specific optimizations, like /O2, /O3, /Qipo, and profile-guided optimization using /Qprof_use (-O2, -O3, -ipo, -prof_use on Linux and Mac OS), and other processor-independent compiler options generate highly optimized code without taking advantage of Streaming SIMD Extensions (SSE). Also, using the processor-dispatch options will generate a generic code path that will work on any IA-32 Intel architecture processor. In the 11.0 and later compilers, you will need to add the switch /arch:IA32 (Windows) or -mia32 (Linux).
Where can I find more information on processor-specific optimizations?
Many of the technical details, option usage notes, and recommendations can be found in the white paper "Optimizing Applications for the Intel® C++ and Fortran Compilers".