Improve Performance on 64-Bit Intel Architecture with Intel C++ Compiler for Linux Options

January 25, 2009 11:00 PM PST



Challenge

Use the options built into the Intel® C++ Compiler for Linux to improve performance on the Itanium® processor. The Intel compilers have robust feature sets to support optimization at all levels.


Solution

Use the appropriate compiler switches to optimize code, control compilation, and provide reporting. The compiler has four primary types of switches: optimization-level switches, interprocedural and floating-point optimization switches, aliasing switches, and report switches.

  • Optimization Level Switches: The two most obvious ways to improve performance are to reduce the number of instructions executed dynamically and to replace instructions with faster equivalents. Many traditional compiler optimizations fall into this category: copy and constant propagation, common subexpression elimination, dead-code elimination, peephole optimizations, function inlining, and tail-call elimination, among others. ecc, the compiler driver for Itanium-based systems, provides a rich variety of such optimizations. Three optimization levels can be requested; the higher the level, the longer the compile time.

Switch | Description | Default
--- | --- | ---
-O | Same as -O2 on Itanium®-based systems. | OFF
-O0 | Disables optimizations. | OFF
-O1 | Enables optimizations. Optimizes for speed. For the Itanium compiler, -O1 turns off software pipelining to reduce code size. | ON
-O2 | Same as -O on Itanium-based systems. | OFF
-O3 | Enables -O2 plus more aggressive optimizations that may increase the compilation time. Impact on performance is application-dependent; some applications may not see a performance improvement. | OFF
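
To make the list of traditional optimizations concrete, the following C sketch shows by hand the kind of rewrite an optimizing compiler might perform. The function names are hypothetical, and the "after" version is a hand-written illustration, not actual ecc output.

```c
/* As the programmer wrote it. */
int area_sum_before(int w, int h)
{
    int scale = 4;            /* constant: a candidate for constant propagation */
    int unused = w * 100;     /* value is never used: dead code */
    int a = (w + h) * scale;  /* (w + h) is computed here ... */
    int b = (w + h) * 2;      /* ... and again here (common subexpression) */
    (void)unused;
    return a + b;
}

/* What the optimizer might effectively produce (illustration only). */
int area_sum_after(int w, int h)
{
    int t = w + h;            /* common subexpression computed once */
    return t * 4 + t * 2;     /* constant propagated, dead code removed */
}
```

Both functions return the same value for the same inputs; the second simply executes fewer instructions, which is the goal of the optimization-level switches.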

  • Note that the -O3 switch will prefetch arrays only for unit-stride accesses. For non-unit-stride accesses, significant performance gains are possible through manual prefetching using the lfetch intrinsic.
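
As a rough sketch of manual prefetching for a non-unit-stride loop, the C example below requests data a few iterations ahead of its use. It assumes a GCC-compatible compiler and uses `__builtin_prefetch` as a portable stand-in; on Itanium with the Intel compiler, the lfetch intrinsic mentioned above would take its place. The function name and the prefetch distance `PREFETCH_AHEAD` are hypothetical and would need tuning for a real workload.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many iterations ahead to prefetch. */
#define PREFETCH_AHEAD 8

/* Sums `count` elements of `data`, taking every `stride`-th element. */
double strided_sum(const double *data, size_t count, size_t stride)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        /* Ask for the element needed PREFETCH_AHEAD iterations from now.
           Arguments: address, 0 = prefetch for read, 1 = low temporal locality.
           The bounds check keeps the computed address inside the array. */
        if (i + PREFETCH_AHEAD < count)
            __builtin_prefetch(&data[(i + PREFETCH_AHEAD) * stride], 0, 1);
        sum += data[i * stride];
    }
    return sum;
}
```

A prefetch is only a hint, so the result is identical with or without it; only the memory latency seen by the loop changes.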
  • Interprocedural and Floating-Point Optimization Switches: Interprocedural optimizations allow the compiler to analyze and transform code across function boundaries, even if the functions are in separate source files. This gives the compiler a more complete view of the program, allowing it to produce better code.

Switch | Description | Default
--- | --- | ---
-ip | Enables interprocedural optimizations for single-file compilation. | OFF
-Obn | Controls the compiler's inline expansion. The amount of inline expansion depends on the value of n: 0 disables inlining; 1 (the default) enables inlining of functions declared with the __inline keyword, as well as inlining according to the C++ language; 2 enables inlining of any function, although the compiler decides which functions to inline, and also enables interprocedural optimizations with the same effect as -ip. | OFF
-ipo | Enables interprocedural optimizations across files. | OFF
-IPF_fma | Enables the combining of floating-point multiply and add/subtract operations. | OFF
-IPF_fltacc | Enables optimizations that affect floating-point accuracy. | OFF
-IPF_fp_speculation mode | Enables floating-point speculation in the given mode: fast (speculate floating-point operations), safe (speculate only when safe), strict (same as off), or off (disable speculation of floating-point operations). | OFF
-ivdep_parallel | Indicates that there is no loop-carried memory dependency in the loop where the IVDEP directive is specified. | OFF
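
The IVDEP directive referred to by -ivdep_parallel can be sketched as follows. This example assumes the Intel spelling `#pragma ivdep` and guards it with the `__INTEL_COMPILER` macro so that other compilers skip it. The function name is hypothetical, and the no-dependency guarantee is a promise made by the caller that the compiler cannot verify.

```c
/* Adds b[i] into a[idx[i]] for i = 0 .. n-1.
   The caller promises that idx[] contains no repeated values and that
   a, idx, and b do not overlap, so no iteration depends on another.
   The compiler cannot prove this because the store is indirect. */
void scatter_add(double *a, const int *idx, const double *b, int n)
{
#ifdef __INTEL_COMPILER
#pragma ivdep  /* assert: assumed loop-carried dependencies can be ignored */
#endif
    for (int i = 0; i < n; i++)
        a[idx[i]] += b[i];
}
```

If the promise is broken, for example by passing an index array with duplicates, the directive may let the compiler reorder iterations and produce wrong results.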

 

  • Aliasing Switches: Two variable expressions are aliases if they refer to the same memory location. In C, aliasing arises primarily from the use of pointers. Alias detection is difficult in general, and without accurate alias information the compiler must be conservative and forgo optimizations such as software pipelining. The options below can therefore greatly aid the compiler, permitting optimizing transformations that would otherwise be impossible.

    If the -restrict switch is used, a pointer may be declared as follows: int *restrict pi; This declaration asserts that *pi has no alias within its declared scope.
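
    A minimal sketch of how the restrict qualifier might be applied to function parameters is shown below. It assumes a compiler in C99 mode (or the -restrict switch described here); the function name is hypothetical.

```c
#include <stddef.h>

/* Computes y[i] += a * x[i].
   `restrict` is the programmer's promise that x and y never overlap,
   so the compiler may load x[i + 1] before storing y[i], which helps
   transformations such as software pipelining. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

    Calling such a function with overlapping arrays breaks the promise, and the behavior is then undefined.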

Switch | Description | Default
--- | --- | ---
-ansi_alias | Directs the compiler to assume that (1) arrays are not accessed out of bounds; (2) pointers are not cast to non-pointer types, and vice versa; and (3) references to objects of two different scalar types cannot alias. For example, an object of type int cannot alias an object of type float, and an object of type float cannot alias an object of type double. If the program satisfies these conditions, -ansi_alias helps the compiler optimize it better. If the program violates any of them, -ansi_alias may lead the compiler to generate incorrect code. | OFF
-fno-alias | Assumes no aliasing in the program. | OFF
-fno-fnalias | Assumes no aliasing within functions, but assumes aliasing across calls. | OFF
-restrict | Enables pointer disambiguation with the restrict qualifier. | OFF
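
The third -ansi_alias assumption is the one most often violated in practice, typically by reading an object through a pointer of a different scalar type. The sketch below shows one common way to stay within the rule, copying the bytes with memcpy instead of casting pointers. It assumes a 32-bit IEEE 754 float; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of a float.
   A cast such as *(uint32_t *)&f would access a float object through a
   uint32_t pointer, which breaks the "different scalar types cannot
   alias" assumption. Copying the bytes avoids that access entirely. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* assumes sizeof(float) == 4 */
    return bits;
}
```

Compilers commonly reduce a small fixed-size memcpy like this to a plain register move, so the safe form usually costs nothing at run time.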

 

  • Report Switches: ecc can produce detailed optimization reports containing information on average instructions per cycle (IPC), instruction counts, loop transformations (software pipelining, distribution, etc.), strength reduction, predication, if-conversion, function inlining, and register allocation. The level of detail and the scope can be specified via arguments. These reports can be used to identify code that may need to be hand-optimized (e.g., using pragmas, alias switches, and intrinsics).

Switch | Description | Default
--- | --- | ---
-opt_report | Generates an optimization report directed to stderr, unless -opt_report_file is specified. | OFF
-opt_report_filefilename | Specifies the filename for the optimization report. It is not necessary to invoke -opt_report when this option is specified. | OFF
-opt_report_levellevel | Specifies the verbosity level of the output. Valid level arguments: min, med, max. If not specified, min is used by default. | OFF
-opt_report_phasename | Specifies the compilation phase for which reports are generated. The option can be used multiple times in the same compilation to get output from multiple phases. Valid name arguments: ipo (Interprocedural Optimizer), hlo (High-Level Optimizer), ilo (Intermediate Language Scalar Optimizer), ecg (Code Generator), all (all phases). | OFF
-opt_report_routinesubstring | Specifies a routine substring. Reports are generated for all routines whose names include substring. By default, reports are generated for all routines. | OFF
-opt_report_help | Displays all possible settings for -opt_report_phase. No compilation is performed. | OFF
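
As an illustration of how the report switches might be combined, the sketch below pairs a small loop with a possible command line in its header comment. The file name dot.c and the function name are hypothetical, and the command spells each switch with its argument appended directly to the switch name, as the table lists them; the exact syntax should be checked against the documentation for the compiler version in use.

```c
/* dot.c (hypothetical file name)
 *
 * Possible invocation, restricting the report to the High-Level
 * Optimizer phase and to routines whose names contain "dot":
 *
 *   ecc -O3 -opt_report -opt_report_phasehlo -opt_report_routinedot dot.c
 *
 * The report could then show, for example, whether this loop was
 * software pipelined.
 */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```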

 


Source

Directives and Pragmas and Switches Oh My!