Compiler Switches for Performance Analysis on Linux* Targets

Intel® VTune™ Amplifier can analyze most native binaries on Linux target systems. However, the following compiler settings make performance analysis easier and more productive:

  • -g (highly recommended): Generates the symbol information required to associate addresses with source lines and to properly walk the call stack in the user-mode sampling and tracing collection types (Hotspots and Threading).

  • Release build or -O2 (highly recommended): Enables compiler optimization, as in your release build, so that the VTune Amplifier focuses on real performance problems that the compiler cannot optimize away.

  • -shared-intel (Intel® C++ Compiler) or -shared-libgcc (GCC* Compiler): Links the compiler-supplied runtime libraries dynamically, which enables the VTune Amplifier to identify libm and C runtime calls as system functions and to distinguish them from user code when an appropriate filter mode is applied to the collection result.

  • -debug inline-debug-info (Intel C++ Compiler): Enables the VTune Amplifier to identify inline functions and, depending on the selected inline mode, to associate the symbols of an inline function either with the inline function itself or with its caller. This is the default behavior for GCC* 4.1 and higher.

  • -D TBB_USE_THREADING_TOOLS: Enables the VTune Amplifier to analyze Intel® Threading Building Blocks (Intel TBB) code. This macro is set automatically if you compile with -D_DEBUG or -DTBB_USE_DEBUG.

    Without TBB_USE_THREADING_TOOLS set, the VTune Amplifier cannot properly identify concurrency issues related to Intel TBB constructs.

  • -qopenmp (highly recommended, Intel C++ Compiler): Enables the VTune Amplifier to identify parallel regions created by OpenMP* pragmas.

  • -qopenmp-link dynamic (Intel C++ Compiler): Makes the Intel Compiler link the dynamic version of the OpenMP runtime libraries, which is instrumented for the VTune Amplifier. This is usually the default for the Intel Compiler.

  • -parallel-source-info=2 (Intel C++ Compiler): Controls whether source location information is emitted when OpenMP or auto-parallelized code is generated. Level 2 tells the compiler to emit the path, file, routine name, and line information.
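For illustration, the following minimal sketch contains a single OpenMP parallel region of the kind the VTune Amplifier reports for code built with these switches. It assumes the file is compiled and linked with switches such as `-g -O2 -shared-intel -qopenmp -qopenmp-link dynamic -parallel-source-info=2` (Intel C++ Compiler) or `-g -O2 -fno-omit-frame-pointer -fopenmp` (GCC); the function names `sum_of_squares` and `available_threads` are hypothetical.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>  // only needed (and available) when OpenMP is enabled
#endif

// Hypothetical example: sums i*i for i in [0, n).
// The pragma creates an OpenMP parallel region; with -g (and, for the Intel
// compiler, -parallel-source-info=2) it can be mapped back to this loop.
long long sum_of_squares(long long n) {
    long long sum = 0;
    // Each thread accumulates a private partial sum; the reduction clause
    // combines the partial sums when the parallel region ends.
#pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        sum += i * i;
    }
    return sum;
}

// Hypothetical helper: number of threads the OpenMP runtime would use,
// or 1 when the code is compiled without OpenMP support.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Compiled without OpenMP support, the pragma is ignored and the same function runs serially, so `available_threads()` returns 1 and there is no parallel region to analyze.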

The following compiler settings are NOT recommended:

  • Debug build or -O0: Changes the performance of your application compared to a release build. Profiling such a build may lead you to analyze and optimize code that is not a performance problem in the release build.

  • -static or -static-libgcc: Prevents the VTune Amplifier from running the user-mode sampling and tracing analysis types. See Compiling for the User-Mode Sampling and Tracing Analysis below for details.

    Note

    When you specify the -fast switch with the Intel Compiler, it automatically enables -static.

  • -static-intel: Prevents the user-mode sampling and tracing analysis types from properly distinguishing system functions. This is the default option for the Intel Compiler.

  • -qopenmp-link static: Makes the Intel Compiler link the static version of the OpenMP runtime libraries, which does not contain the instrumentation data required for the VTune Amplifier analysis.

  • -qopenmp-stubs: Compiles OpenMP programs in sequential mode (the OpenMP pragmas are ignored and a stub OpenMP library is linked), so the code contains no parallel regions to analyze.

  • -msse4a or -m3dnow: Generates binaries that use instructions not supported by Intel processors, which may cause unpredictable behavior when profiling with the VTune Amplifier.

  • -debug [parallel | extended | emit-column | expr-source-pos | semantic-stepping | variable-locations]: The VTune Amplifier works best with -debug full (the default mode when using -g). The other options (parallel, extended, emit-column, expr-source-pos, semantic-stepping, and variable-locations) are not supported by the VTune Amplifier. See -debug inline-debug-info above for more information.

  • -coarray: Prevents the Threading analysis from properly identifying the locks that limit scaling in Coarray Fortran.
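To check whether an existing executable was linked with -static before profiling it, you can look for a program interpreter (PT_INTERP) entry in its ELF program headers: dynamically linked executables request the dynamic loader this way, while -static executables do not. The following is a minimal sketch under stated assumptions: it handles only 64-bit ELF files with the host's byte order, relies on the glibc `<elf.h>` definitions, and `requests_dynamic_loader` is a hypothetical name. From the shell, `readelf -l <binary>` shows the same INTERP entry.

```cpp
#include <cassert>
#include <cstring>   // std::memcmp
#include <elf.h>     // Elf64_Ehdr, Elf64_Phdr, PT_INTERP (glibc)
#include <fstream>
#include <string>

// Hypothetical helper: returns 1 if the 64-bit ELF executable at `path` has a
// PT_INTERP program header (it was linked dynamically), 0 if it has none
// (as with -static), and -1 if the file cannot be read or is not a 64-bit ELF
// executable. Use it on executables only: shared libraries never have
// PT_INTERP. Assumes the file uses the host's byte order.
int requests_dynamic_loader(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;

    Elf64_Ehdr ehdr;
    if (!in.read(reinterpret_cast<char*>(&ehdr), sizeof ehdr)) return -1;
    if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        (ehdr.e_type != ET_EXEC && ehdr.e_type != ET_DYN) ||
        ehdr.e_phentsize != sizeof(Elf64_Phdr)) {
        return -1;
    }

    // Scan the program header table for the interpreter entry.
    for (Elf64_Half i = 0; i < ehdr.e_phnum; ++i) {
        Elf64_Phdr phdr;
        in.seekg(static_cast<std::streamoff>(ehdr.e_phoff + i * sizeof(Elf64_Phdr)));
        if (!in.read(reinterpret_cast<char*>(&phdr), sizeof phdr)) return -1;
        if (phdr.p_type == PT_INTERP) return 1;
    }
    return 0;
}
```

A result of 0 for your application executable suggests it was linked with -static (possibly implicitly, through -fast with the Intel Compiler).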

Compiling for the User-Mode Sampling and Tracing Analysis

For successful user-mode sampling and tracing analysis (Hotspots and Threading) of your executable and all shared libraries, use the following switches to properly walk through the call stack:

  • Use -g to generate the symbol information and enable source code analysis.

  • Use -fno-omit-frame-pointer to keep the frame pointers used for call stack analysis.

    Note

    Other options may add stack-walking information to your binary as a side effect, for example -fexceptions (the default for C++), which generates unwind tables, or -O0, which keeps frame pointers. To make sure the executable (and shared libraries) contain unwind information, run the objdump -h <binary> command and check that the .eh_frame_hdr section is listed.
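When objdump is not available on the target system, the same check can be scripted. The following is a minimal sketch, assuming 64-bit ELF binaries with the host's byte order and the glibc `<elf.h>` header; `has_section` is a hypothetical helper name. Passing `.debug_info` instead of `.eh_frame_hdr` typically tells you whether the binary was built with -g.

```cpp
#include <cassert>
#include <cstring>   // std::memcmp, strnlen
#include <elf.h>     // Elf64_Ehdr, Elf64_Shdr, ELFMAG (glibc)
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns 1 if the 64-bit ELF file at `path` has a section
// called `name`, 0 if it does not, and -1 if the file cannot be read or is not
// a 64-bit ELF file with section headers. Assumes the host's byte order.
int has_section(const std::string& path, const std::string& name) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;

    Elf64_Ehdr ehdr;
    if (!in.read(reinterpret_cast<char*>(&ehdr), sizeof ehdr)) return -1;
    if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_shentsize != sizeof(Elf64_Shdr) || ehdr.e_shnum == 0 ||
        ehdr.e_shstrndx >= ehdr.e_shnum) {
        return -1;
    }

    // Load the whole section header table.
    std::vector<Elf64_Shdr> sections(ehdr.e_shnum);
    in.seekg(static_cast<std::streamoff>(ehdr.e_shoff));
    if (!in.read(reinterpret_cast<char*>(sections.data()),
                 static_cast<std::streamsize>(sections.size() * sizeof(Elf64_Shdr)))) {
        return -1;
    }

    // Section names are offsets into the section header string table.
    const Elf64_Shdr& strtab = sections[ehdr.e_shstrndx];
    std::vector<char> names(strtab.sh_size);
    in.seekg(static_cast<std::streamoff>(strtab.sh_offset));
    if (!in.read(names.data(), static_cast<std::streamsize>(names.size()))) return -1;

    for (const Elf64_Shdr& sh : sections) {
        if (sh.sh_name >= names.size()) continue;  // skip malformed entries
        const char* start = names.data() + sh.sh_name;
        std::string section(start, strnlen(start, names.size() - sh.sh_name));
        if (section == name) return 1;
    }
    return 0;
}
```

For example, `has_section("./myapp", ".eh_frame_hdr")` returning 1 corresponds to seeing .eh_frame_hdr in the objdump -h output; `myapp` is a placeholder name.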

User-mode sampling and tracing analysis types work better with the dynamic versions of the following libraries. Use the dynamic version (recommended) rather than the static version (not recommended):

  • OpenMP runtime (supplied by the Intel Compiler): libiomp5.so or libguide40.so, not libiomp5.a or libguide4.a

  • POSIX Threads: libpthread.so, not libpthread.a

  • C runtime: libc.so, not libc.a

  • C++ runtime: libstdc++.so, not libstdc++.a

  • Intel Libm: libm.so, not libm.a
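To confirm at run time that an application actually uses the dynamic versions listed above, it can enumerate the shared objects loaded into its own process. The following is a minimal sketch, assuming glibc's `dl_iterate_phdr` from `<link.h>`; `loaded_objects` and `is_loaded` are hypothetical helper names. From the shell, `ldd <binary>` lists the shared libraries a dynamically linked binary depends on.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>    // dl_iterate_phdr, dl_phdr_info (glibc; g++ defines _GNU_SOURCE)
#include <string>
#include <vector>

// Hypothetical helper: returns the path of every object loaded into the
// current process. The main executable comes first, usually with an empty name.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(
        [](dl_phdr_info* info, std::size_t, void* data) -> int {
            auto* out = static_cast<std::vector<std::string>*>(data);
            out->emplace_back(info->dlpi_name != nullptr ? info->dlpi_name : "");
            return 0;  // returning 0 continues the iteration
        },
        &names);
    return names;
}

// Hypothetical helper: true if any loaded object's path contains `fragment`,
// for example "libiomp5.so" or "libstdc++.so".
bool is_loaded(const std::string& fragment) {
    for (const std::string& name : loaded_objects()) {
        if (name.find(fragment) != std::string::npos) return true;
    }
    return false;
}
```

For example, if `is_loaded("libiomp5.so")` returns false in an OpenMP program built with the Intel Compiler, the static OpenMP runtime was probably linked. Note that on glibc 2.34 and later the POSIX threads functions are part of libc.so, so libpthread.so may not appear even in a dynamically linked multithreaded program.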

User-mode sampling and tracing collection has the following limitations for analyzing statically linked libraries/functions:

  • The static version of the OpenMP runtime library supplied by the Intel Compiler does not provide the necessary instrumentation for the Threading analysis type.

  • Call Stack mode cannot properly distinguish user code from system functions.

  • User-mode sampling and tracing collection cannot execute unless various C runtime functions are exported. There are multiple ways to do this; for example, use the -u option of the GCC compiler:

    • -u malloc

    • -u free

    • -u realloc

    • -u getenv

    • -u setenv

    • -u __errno_location

If your application creates POSIX threads (either explicitly or through the static OpenMP library or another static library), you also need to force the following functions into the binary:

  • -u pthread_key_create

  • -u pthread_key_delete

  • -u pthread_setspecific

  • -u pthread_getspecific

  • -u pthread_spin_init

  • -u pthread_spin_destroy

  • -u pthread_spin_lock

  • -u pthread_spin_trylock

  • -u pthread_spin_unlock

  • -u pthread_mutex_init

  • -u pthread_mutex_destroy

  • -u pthread_mutex_trylock

  • -u pthread_mutex_lock

  • -u pthread_mutex_unlock

  • -u pthread_cond_init

  • -u pthread_cond_destroy

  • -u pthread_cond_signal

  • -u pthread_cond_wait

  • -u _pthread_cleanup_push

  • -u _pthread_cleanup_pop

  • -u pthread_setcancelstate

  • -u pthread_self

  • -u pthread_yield

The easiest way to do this is to put the above options in files and pass them to gcc or ld with the @file syntax. For example, for a C++ source file:

g++ -static mysource.cpp @Cdefs @Pdefs

where Cdefs is a file with the -u options for the required C runtime functions and Pdefs is a file with the -u options for the required POSIX threads functions.
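The option files can be written by hand. As a minimal sketch, the following C++ helper generates them from the symbol lists above; `write_undefined_symbol_file`, `kCRuntimeSymbols`, and `kPthreadSymbols` are hypothetical names, and the one-option-per-line format relies on GCC's @file syntax, which reads whitespace-separated options from a file.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Symbols from the lists above; each becomes one "-u <symbol>" line.
const std::vector<std::string> kCRuntimeSymbols = {
    "malloc", "free", "realloc", "getenv", "setenv", "__errno_location"};

const std::vector<std::string> kPthreadSymbols = {
    "pthread_key_create",    "pthread_key_delete",    "pthread_setspecific",
    "pthread_getspecific",   "pthread_spin_init",     "pthread_spin_destroy",
    "pthread_spin_lock",     "pthread_spin_trylock",  "pthread_spin_unlock",
    "pthread_mutex_init",    "pthread_mutex_destroy", "pthread_mutex_trylock",
    "pthread_mutex_lock",    "pthread_mutex_unlock",  "pthread_cond_init",
    "pthread_cond_destroy",  "pthread_cond_signal",   "pthread_cond_wait",
    "_pthread_cleanup_push", "_pthread_cleanup_pop",  "pthread_setcancelstate",
    "pthread_self",          "pthread_yield"};

// Hypothetical helper: writes one "-u <symbol>" option per line to `path`, in a
// form the compiler driver can read with @path. Returns false on I/O errors.
bool write_undefined_symbol_file(const std::string& path,
                                 const std::vector<std::string>& symbols) {
    std::ofstream out(path);
    if (!out) return false;
    for (const std::string& symbol : symbols) {
        out << "-u " << symbol << '\n';
    }
    out.close();  // flush before checking for write errors
    return !out.fail();
}
```

Calling it once with `kCRuntimeSymbols` and the file name `Cdefs`, and once with `kPthreadSymbols` and `Pdefs`, produces the two files passed as @Cdefs and @Pdefs in the command above.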

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804
