Use Intel® Compilers Successfully for 64-Bit Intel® Architecture


Challenge

Use the Intel® Compilers for maximum performance. This item answers common questions about using the Intel compilers, and it also gives troubleshooting and optimization techniques.


Solution

Use the steps and best-known methods outlined here.

What if the compiler rejects the source? If the compile fails, check your source code for unsupported language extensions. For example, if a file uses a GNU gcc language extension that the Intel compilers do not support, the compiler issues a syntax error. Similarly, for Fortran, if the code violates the Fortran 95, Fortran 90, or Fortran 77 standards or contains language extensions the compiler doesn't recognize, the Intel compiler issues a syntax error.

The best way to solve this type of problem is to rewrite the source code, so that it either conforms to the standards or does not contain unsupported extensions. Note that the Intel compiler may give errors for non-compliant source code, even in cases where it may be accepted by other compilers.

• Does the program run? Once the application is built, the typical next step is to run it with a set of tests. The tests are run to ensure that outputs are correct.

• What if some tests fail? If some tests fail, recompile the files of the application under test with /Od (Windows*) or -O0 (Linux*) to turn off the optimizer. If the tests still fail with the optimizer off, the problem is most likely in the source code. It is also possible that the compiler is generating incorrect code; should that happen, please report the problem to Intel.

What optimization should I use? The basic optimization switches of the Intel compilers are as follows:

o /Od (Windows*) or -O0 (Linux*) (no optimizations)

o -O1 (optimize for speed while focusing on code size)

o -O2 (optimize for speed)

o -O3 (optimize for speed and perform aggressive optimizations).

Use -O2 where possible; it is the default for the Intel compilers. With -O3, the compiler performs more aggressive optimizations, and the optimizer occasionally generates incorrect code, so run your application tests to confirm that they all pass at -O3. Using -O3 alone has no effect on IA-32.

What about advanced optimizations? Interprocedural optimizations improve performance within a file and across a multi-file program. The optimizations performed include function inlining, interprocedural constant propagation, dead-code elimination, and others.

Profile-guided optimizations can be used to improve the performance of a program by passing run-time information back to the compiler. This can be used to improve branch prediction and to make better choices of functions to inline.

It is always good practice to test your application when using aggressive compiler optimizations such as interprocedural and profile-guided optimizations. Issues with these optimizations are more difficult to debug than those with standard optimizations.

My Program Runs Successfully with /Od but Fails with /O2. What Should I Do? If your program runs successfully with /Od but fails with /O2, the next step is to determine which files are causing the problem, so that only those files need to be compiled with /Od. A divide-and-conquer strategy works well here. First, compile half the files (e.g., files with names that start with a-m) with /O2 and the rest with /Od. If the program passes the tests, then the problem is somewhere in the files starting with n-z, and files a-m can be compiled with /O2. If the program fails the tests, the problem is in files a-m. Repeat the split on the suspect half until the problematic files are isolated.
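The divide-and-conquer search can be automated. The following Python sketch is only an illustration, not an Intel tool: find_problem_file and the tests_pass callback are hypothetical names, and the sketch assumes that exactly one file is responsible and that the failure is reproducible. In real use, tests_pass would rebuild the program with the listed files at /O2 and all others at /Od, run the tests, and report the result; here a stand-in function plays that role.

```python
from typing import Callable, List, Sequence


def find_problem_file(
    files: Sequence[str],
    tests_pass: Callable[[List[str]], bool],
) -> str:
    """Return the single file whose optimized build makes the tests fail.

    tests_pass(optimized) is a hypothetical callback. It should rebuild the
    program with the files in `optimized` compiled at /O2 and every other
    file at /Od, run the test suite, and return True if all tests pass.
    """
    suspects = sorted(files)
    while len(suspects) > 1:
        half = len(suspects) // 2
        first, second = suspects[:half], suspects[half:]
        # Optimize only the first half; everything else is built unoptimized.
        if tests_pass(first):
            suspects = second  # the first half is safe to optimize
        else:
            suspects = first   # the problem is in the first half
    return suspects[0]


# Stand-in for a real build-and-test step (illustrative file names only).
def fake_tests_pass(optimized: List[str]) -> bool:
    return "solver.c" not in optimized


sources = ["io.c", "main.c", "mesh.c", "solver.c", "util.c"]
culprit = find_problem_file(sources, fake_tests_pass)
print(culprit)
```

Each round halves the list of suspects, so the number of rebuilds grows with the logarithm of the number of files rather than with the number of files itself. If more than one file is at fault, the search has to be repeated after the first file found is moved to /Od.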

Should I worry about precision? When the optimizer is turned on, there may be a minor loss or gain of precision. For example, on IA-32, a double-precision floating-point value is held as 80 bits in the x87 FPU registers, and intermediate calculations are carried out to this precision. When a value is stored to memory from the x87 FPU registers, it is rounded to its declared precision. Because optimization changes which intermediate values stay in registers and which are stored to memory, code that is sensitive to slight variations in precision may behave differently under optimization. You can either add the -mp or -mp1 (Linux*) or the /Qprec or /Op (Windows*) switches to enforce IEEE precision, or you can rewrite the code. These switches may have a performance impact on your application.
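The effect of keeping intermediates in a wider format can be reproduced by analogy. Python does not expose the 80-bit x87 format, so the sketch below substitutes single precision for the declared type and double precision for the wider register format; this substitution is an assumption made for illustration only. One loop keeps the running total wide and rounds once at the end, the other rounds after every addition, as if each intermediate were stored to memory.

```python
import struct


def round_to_single(x: float) -> float:
    """Round a Python float (IEEE double) to IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_wide(values):
    """Keep the running total in the wider format; round once at the end."""
    total = 0.0
    for v in values:
        total += v
    return round_to_single(total)


def sum_narrow(values):
    """Round the running total to the declared precision after every step."""
    total = 0.0
    for v in values:
        total = round_to_single(total + v)
    return total


values = [round_to_single(0.1)] * 1000
wide = sum_wide(values)
narrow = sum_narrow(values)
print(wide, narrow, wide == narrow)
```

The two totals are computed from identical inputs with identical arithmetic, yet they differ slightly because of where the rounding happens. A test that compares results for exact equality would therefore report a failure even though neither computation is wrong, which is why comparing against a tolerance is usually the safer choice for floating-point tests.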

My Program Runs Slowly - What Should I Do? The software version of the "80/20 rule" is that 80% of program execution time is spent in 20% of the code. It is recommended that you obtain a performance analysis tool such as the Intel® VTune™ Performance Analyzer to find out where your program spends its time. The analysis shows which lines of your program take most of the execution time and provides tips for improving your code.
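Once a profile is available, the 80/20 rule suggests where to focus. The Python sketch below is only an illustration of that idea: the hotspots function is a hypothetical helper, and the function names and timings are made-up sample data, not output from VTune or any real program. It picks the fewest functions that together account for a given share of the total time.

```python
def hotspots(seconds_by_function, share=0.8):
    """Return the fewest functions that together cover `share` of total time."""
    total = sum(seconds_by_function.values())
    ranked = sorted(
        seconds_by_function.items(), key=lambda item: item[1], reverse=True
    )
    chosen, covered = [], 0.0
    for name, seconds in ranked:
        if covered >= share * total:
            break
        chosen.append(name)
        covered += seconds
    return chosen


# Hypothetical per-function timings in seconds (sample data for illustration).
profile = {
    "solve": 62.0,
    "assemble": 21.0,
    "read_input": 6.0,
    "write_output": 5.0,
    "log": 3.0,
    "init": 2.0,
    "cleanup": 1.0,
}
top = hotspots(profile)
print(top)
```

In this made-up profile, two of the seven functions cover more than 80% of the run time, so tuning effort spent anywhere else would have little effect on overall performance.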


Source

Being Successful with the Intel® Compilers – What You Need to Know

For more complete information about compiler optimizations, see the Optimization Notice.