Compiler Switches for Performance Analysis on Linux* Targets

Intel® VTune™ Profiler can analyze most native binaries on Linux target systems. However, the following settings are recommended to make performance analysis easier and more productive:
  • -g (highly recommended)
    Enable generating the symbol information required to associate addresses with source lines and to properly walk the call stack in user-mode sampling and tracing collection types (Hotspots and Threading).
  • Release build or -O2 (highly recommended)
    Enable release-level compiler optimization so that the VTune Profiler focuses on real performance problems that the compiler cannot optimize away.
  • -shared-intel (Intel® C++ Compiler) or -shared-libgcc (GCC* Compiler)
    Enable the VTune Profiler to identify libm and C runtime calls as system functions and to distinguish them from user code when an appropriate filter mode is applied to the collection result.
  • -debug inline-debug-info (Intel C++ Compiler)
    Enable the VTune Profiler to identify inline functions and, depending on the selected inline mode, associate the symbols for an inline function with either the inline function itself or its caller. This is the default mode for GCC* 4.1 and higher.
  • -D TBB_USE_THREADING_TOOLS
    Enable analysis of Intel® Threading Building Blocks (Intel TBB) constructs in the VTune Profiler. This macro is set automatically if you compile with -D_DEBUG or -DTBB_USE_DEBUG. Without TBB_USE_THREADING_TOOLS set, the VTune Profiler cannot properly identify concurrency issues related to Intel TBB constructs.
  • -qopenmp (highly recommended) (Intel C++ Compiler)
    Enable the VTune Profiler to identify parallel regions created by OpenMP* pragmas.
  • -qopenmp-link dynamic (Intel C++ Compiler)
    Make the Intel Compiler link the dynamic version of the OpenMP runtime libraries, which is instrumented for the VTune Profiler. This is usually the default for the Intel Compiler.
  • -parallel-source-info=2 (Intel C++ Compiler)
    Control source location emission when OpenMP or auto-parallelism code is generated. Level 2 tells the compiler to emit path, file, routine name, and line information.
  • -gline-tables-only -fdebug-info-for-profiling (Intel oneAPI DPC++ Compiler, Beta)
    Enable generating debug information for GPU analysis of a DPC++ application.
  • -Xsprofile (Intel oneAPI DPC++ Compiler, Beta)
    Enable source-level mapping of performance data for FPGA application analysis.
The following compiler settings are NOT recommended:
  • Debug build or -O0
    Makes the performance of your application differ from that of a release build and can dramatically distort the profile, so you may end up analyzing and optimizing a section of code that is not a performance problem in the release build.
  • -static, -static-libgcc
    Prevents the VTune Profiler from running the user-mode sampling and tracing analysis types. See below for more details. When you specify the -fast switch with the Intel Compiler, it automatically enables -static.
  • -static-intel
    Prevents the user-mode sampling and tracing analysis types from properly distinguishing system functions. This is the default option for the Intel Compiler.
  • -qopenmp-link static
    Makes the Intel Compiler link the static version of the OpenMP runtime libraries. This version of the runtime does not contain the instrumentation required for VTune Profiler analysis.
  • -qopenmp_stubs
    Compiles OpenMP programs in sequential mode, so the OpenMP code does not run in parallel.
  • -msse4a, -m3dnow
    Generates binaries that use instructions not supported by Intel processors, which may cause unpredictable behavior when profiling with the VTune Profiler.
  • -debug [parallel | extended | emit-column | expr-source-pos | semantic-stepping | variable-locations]
    The VTune Profiler works best with -debug full (the default mode when using -g). The other modes (parallel, extended, emit-column, expr-source-pos, semantic-stepping, and variable-locations) are not supported by the VTune Profiler. See the -debug inline-debug-info entry above for more information.
  • -coarray
    Prevents the Threading analysis from properly identifying the locks that prevent scaling in Coarray Fortran.

Compiling for the User-Mode Sampling and Tracing Analysis

For successful user-mode sampling and tracing analysis (Hotspots and Threading) of your executable and all shared libraries, use the following switches to properly walk through the call stack:
  • Use -g to generate the symbol information and enable source code analysis.
  • Use -fno-omit-frame-pointer to enable frame pointer analysis.
    Other options can provide call stack information as a side effect, for example -fexceptions (the default for C++), which generates unwind tables, or -O0, which does not omit frame pointers. To make sure the executable (and shared libraries) contain this information, run the objdump -h <binary> command and check that the output lists the .eh_frame_hdr section.
User-mode sampling and tracing analysis types work better with the dynamic versions of the following libraries (dynamic version recommended, static version not recommended):
  • OpenMP runtime (supplied by the Intel Compiler): use libiomp5.so or libguide40.so rather than libiomp5.a or libguide4.a
  • POSIX threads: use libpthread.so rather than libpthread.a
  • C runtime: use libc.so rather than libc.a
  • C++ runtime: use libstdc++.so rather than libstdc++.a
  • Intel Libm: use libm.so rather than libm.a
User-mode sampling and tracing collection has the following limitations for analyzing statically linked libraries/functions:
  • The static version of the OpenMP runtime library supplied by the Intel Compiler does not provide the necessary instrumentation for the Threading analysis type.
  • Call Stack mode cannot properly distinguish user code from system functions.
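  • User-mode sampling and tracing collection cannot execute unless various C runtime functions are present (exported) in the binary. There are multiple ways to ensure this; for example, use the GCC -u option, which makes the linker treat a symbol as undefined and therefore pull its definition into the binary: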
    • -u malloc
    • -u free
    • -u realloc
    • -u getenv
    • -u setenv
    • -u __errno_location
If your application creates POSIX threads (either explicitly or through the static OpenMP library or another static library), you also need to force linking of the following functions:
  • -u pthread_key_create
  • -u pthread_key_delete
  • -u pthread_setspecific
  • -u pthread_getspecific
  • -u pthread_spin_init
  • -u pthread_spin_destroy
  • -u pthread_spin_lock
  • -u pthread_spin_trylock
  • -u pthread_spin_unlock
  • -u pthread_mutex_init
  • -u pthread_mutex_destroy
  • -u pthread_mutex_trylock
  • -u pthread_mutex_lock
  • -u pthread_mutex_unlock
  • -u pthread_cond_init
  • -u pthread_cond_destroy
  • -u pthread_cond_signal
  • -u pthread_cond_wait
  • -u _pthread_cleanup_push
  • -u _pthread_cleanup_pop
  • -u pthread_setcancelstate
  • -u pthread_self
  • -u pthread_yield
The easiest way to do this is to put the above options in a file and pass it to gcc or ld, both of which read additional command-line options from a file named with an @ prefix. For example:
    gcc -static mysource.cpp @Cdefs @Pdefs
where Cdefs is a file with the options for the required C functions and Pdefs is a file with the options for the required POSIX functions.
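A minimal sketch of creating the two option files with a shell script. The file names Cdefs and Pdefs follow the example above; the link step runs only if the placeholder source file mysource.cpp exists, and it assumes the static C library (libc.a) is installed, which is not the case by default on every Linux distribution.

```shell
#!/bin/sh
# Sketch: write the -u options listed above into the option files Cdefs and
# Pdefs, one option per line, so they can be passed to gcc as @Cdefs @Pdefs.
set -e

# C runtime functions that must be present in a statically linked binary.
for sym in malloc free realloc getenv setenv __errno_location; do
    printf '%s\n' "-u $sym"
done > Cdefs

# POSIX thread functions, needed when the application creates threads.
for sym in pthread_key_create pthread_key_delete \
           pthread_setspecific pthread_getspecific \
           pthread_spin_init pthread_spin_destroy pthread_spin_lock \
           pthread_spin_trylock pthread_spin_unlock \
           pthread_mutex_init pthread_mutex_destroy pthread_mutex_trylock \
           pthread_mutex_lock pthread_mutex_unlock \
           pthread_cond_init pthread_cond_destroy \
           pthread_cond_signal pthread_cond_wait \
           _pthread_cleanup_push _pthread_cleanup_pop \
           pthread_setcancelstate pthread_self pthread_yield; do
    printf '%s\n' "-u $sym"
done > Pdefs

# Static link from the example above; only attempted if the placeholder
# source file exists (requires the static C library to be installed).
if [ -f mysource.cpp ]; then
    gcc -static mysource.cpp @Cdefs @Pdefs
fi
```

GCC splits the contents of an @file on whitespace, so one option per line is a readability choice rather than a requirement.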
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804