March 4, 2009 8:00 PM PST
Improve application performance in programs that contain many frequently used small to medium-sized functions. This characteristic is very common in object-oriented C++ programs that implement accessor methods.
Enable multi-file interprocedural optimization (IPO) with the -Qipo option, and unlock parallelism using the -Qparallel option. IPO increases performance by reducing the number of branches within code, removing call overhead through function inlining, and performing interprocedural memory reference analysis (i.e., keeping critical data in registers across function boundaries).
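As a rough illustration of the kind of code that stands to gain from IPO, the sketch below shows a class with small accessor methods that are called from a hot loop. The class `Particle` and the function `total_momentum` are hypothetical names invented for this example. In a real project the accessor definitions and the loop would typically sit in separate source files, which is where multi-file inlining matters; they are shown in one listing here only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example class. In a real project the member functions below
// would be defined in their own source file, out of sight of the caller.
class Particle {
public:
    Particle(double mass, double velocity) : mass_(mass), velocity_(velocity) {
        assert(mass >= 0.0);  // precondition for this example only
    }
    double mass() const;      // small accessor
    double velocity() const;  // small accessor
private:
    double mass_;
    double velocity_;
};

// Out-of-line accessor definitions (imagine these in a separate file).
double Particle::mass() const { return mass_; }
double Particle::velocity() const { return velocity_; }

// Hot loop (imagine this in yet another file): two accessor calls per
// iteration, each a candidate for cross-file inlining.
double total_momentum(const std::vector<Particle>& particles) {
    double sum = 0.0;
    for (std::size_t i = 0; i < particles.size(); ++i) {
        sum += particles[i].mass() * particles[i].velocity();
    }
    return sum;
}
```

Without cross-file inlining, each loop iteration pays for two function calls that do nothing but return a member. Whether the compiler actually inlines them depends on its own heuristics.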
Developers can leverage the capabilities of Intel's Profile-Guided Optimization (PGO) technology and extend the effectiveness of IPO. Where IPO looks for performance gains by doing a static analysis of application logic, PGO does a dynamic analysis of how an application is used at runtime. The analysis of runtime behavior allows frequently accessed code segments to be moved adjacent to one another, which results in better cache utilization and more efficient processor instruction fetching. PGO also improves branch prediction by generating accurate branch hints for Pentium® 4 processors and Itanium® processors.
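To make the static versus dynamic distinction concrete, here is a minimal sketch of a function whose best code layout depends on its inputs. The names `handle_negative` and `sum_magnitudes` are hypothetical, and the assumption that negative values are rare is part of the example, not something the compiler could know from the source alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cold path: assumed to be reached rarely in the real workload.
int handle_negative(int value) { return -value; }

// Static analysis sees two equally plausible branches here. A runtime
// profile would show which one dominates for representative inputs.
long sum_magnitudes(const std::vector<int>& values) {
    assert(values.size() < 1000000);  // keep the example's totals small
    long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) {
            total += handle_negative(values[i]);  // rare in the assumed workload
        } else {
            total += values[i];                   // common in the assumed workload
        }
    }
    return total;
}
```

If a profiling run confirmed that the second branch is taken almost every time, the compiler would have grounds to keep that path contiguous with the loop and move the rarely used call out of the way.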
With the -Qparallel option, the compiler detects loops that may benefit from multi-threaded execution and generates the appropriate threading calls automatically, improving application performance on symmetric multi-processor (SMP) machines or systems that support Hyper-Threading Technology. Applications that execute the same operation on multiple data items will often benefit from using the Single Instruction Multiple Data (SIMD) capabilities of IA-32 processors (such as the Pentium 4 processor).
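As a sketch of what automatic threading looks for, the two loops below contrast iterations that are independent of one another with iterations that are not. The function names `scale` and `prefix_sum` are hypothetical, and whether a given compiler version threads the first loop depends on its own analysis and profitability heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Candidate for automatic threading: iteration i reads only in[i] and
// writes only out[i], so iterations can run in any order.
void scale(const std::vector<float>& in, std::vector<float>& out, float factor) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * factor;
    }
}

// Not a candidate as written: iteration i reads the value that iteration
// i - 1 just wrote, so the iterations must run in order.
void prefix_sum(std::vector<float>& data) {
    for (std::size_t i = 1; i < data.size(); ++i) {
        data[i] += data[i - 1];
    }
}
```

The second loop can still be parallelized by hand with a different algorithm, but that is a restructuring the programmer has to do, not something this sketch assumes the compiler will find.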
As with the task of implementing threading support, modifying an application to utilize SIMD instructions can be challenging. The Intel® Compilers take the complexity out of this task with support for automatic vectorization. Using analysis techniques similar to those used for automatic threading, the compiler, when invoked with the -Qx option, detects operations that can be done in parallel, and then converts the program to process 2, 4, 8, or 16 elements in one operation (depending on the data type) by using SIMD instructions.
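The element counts of 2, 4, 8, and 16 follow from dividing the register width by the element size. The sketch below works that out under the assumption of 128-bit (16-byte) SIMD registers, as found in the SSE2 unit of the Pentium 4, and then shows a loop of the shape that automatic vectorization targets. The names `kRegisterBytes`, `elements_per_operation`, and `add_arrays` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 128-bit (16-byte) SIMD registers, as in SSE2.
const std::size_t kRegisterBytes = 16;

// Number of elements of type T that fit in one SIMD register.
template <typename T>
std::size_t elements_per_operation() {
    return kRegisterBytes / sizeof(T);
}

// A loop of the shape that vectorizers look for: unit stride, the same
// operation on every element, no dependence between iterations.
void add_arrays(const float* a, const float* b, float* result, std::size_t n) {
    assert(a && b && result);
    for (std::size_t i = 0; i < n; ++i) {
        result[i] = a[i] + b[i];
    }
}
```

On typical IA-32 platforms, where `double`, `float`, `short`, and `char` occupy 8, 4, 2, and 1 bytes, the helper yields 2, 4, 8, and 16 respectively. For `add_arrays`, the compiler also has to rule out overlap between `result` and the inputs before it can vectorize, so actual results depend on its alias analysis.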
Optimize Your Cares Away

