Introduction:
Intel® VTune™ Amplifier XE for Linux* can analyze most native binaries. However, some settings make analysis easier.
Useful Settings for Intel VTune Amplifier XE for Linux:
| Switch | Purpose |
| --- | --- |
| -g | Intel VTune Amplifier XE uses the symbols to associate addresses with source lines. This is also one of the two ways to let "User mode sampling and tracing analysis" (Hotspots, Concurrency, and Locks and Waits) walk the call stack properly (see Note 1). |
| "Release" build | The time taken to execute a section of code may change if you do not use your normal production switches (that is, not -O0). A non-production build can lead you to analyze and try to optimize a section of code that is not a performance problem. |
| -shared-intel | This switch makes it easier for Intel VTune Amplifier XE to run "User mode sampling and tracing analysis" (Hotspots, Concurrency, and Locks and Waits). It allows Intel VTune Amplifier XE to differentiate libm and C runtime calls from your code via the Call Stack Mode. |
| -debug inline-debug-info | This switch enables the Intel Compiler for Linux to associate the symbols for inlined functions with the inlined function rather than the caller. This mode is the default for GCC 4.1 and higher. See -fno-inline below for more information. |
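As a rough sketch of how the switches in this table might be combined on one compile line, the script below assembles and prints a build command without running it. The names myapp.cpp and myapp are placeholders, -O2 stands in for whatever optimization level your release build really uses, and icpc is assumed to be the Intel C++ compiler driver on your PATH.

```shell
#!/bin/sh
# Hypothetical names: myapp.cpp and myapp are placeholders for your own project.
CXX="${CXX:-icpc}"   # assumption: Intel C++ driver; override with CXX=... if needed
OPT_FLAGS="-O2"      # assumption: replace with your normal production switches
PROFILE_FLAGS="-g -shared-intel -debug inline-debug-info"

# Assemble the command and print it so it can be inspected before use.
BUILD_CMD="$CXX $OPT_FLAGS $PROFILE_FLAGS myapp.cpp -o myapp"
echo "$BUILD_CMD"
```

Printing the command first makes it easy to compare against your existing release build line before changing any makefiles.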
Useful Setting for applications using Intel® Threading Building Blocks:
| Switch | Purpose |
| --- | --- |
| -DTBB_USE_THREADING_TOOLS | Defining this macro enables full support of Intel TBB in Intel VTune Amplifier XE. Note: This macro is set automatically if you compile with -D_DEBUG or -DTBB_USE_DEBUG. Without TBB_USE_THREADING_TOOLS set, Intel VTune Amplifier XE will not properly identify concurrency issues related to TBB constructs. |
Useful Settings for OpenMP* Applications compiled with the Intel® Compiler for Intel VTune Amplifier XE:
| Switch | Purpose |
| --- | --- |
| -openmp | Without this switch, Intel VTune Amplifier XE will not identify parallel regions created by OpenMP pragmas. |
| -openmp-link dynamic | This default setting on the Intel Compiler chooses the dynamic version of the OpenMP runtime libraries, which has been instrumented for Intel VTune Amplifier XE (see Note 2). |
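One way to confirm that a built binary picked up the dynamic OpenMP runtime is to look for the library names from Note 2 in the ldd output. The following is a minimal sketch; the helper name links_dynamic_openmp and the path ./myapp are hypothetical, and it assumes ldd is available on your system.

```shell
#!/bin/sh
# Hypothetical helper: reads ldd output on stdin and succeeds if a dynamic
# Intel OpenMP runtime (libiomp5.so or libguide40.so) appears in it.
links_dynamic_openmp() {
    grep -Eq 'libiomp5\.so|libguide40\.so'
}

# Usage: ./myapp is a placeholder for your own binary.
if [ -x ./myapp ]; then
    if ldd ./myapp | links_dynamic_openmp; then
        echo "dynamic OpenMP runtime found"
    else
        echo "no dynamic OpenMP runtime found"
    fi
fi
```

The helper reads from standard input, so it can also be applied to saved ldd output from another machine.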
Settings not recommended for use with Intel VTune Amplifier XE:
| Switch | Purpose |
| --- | --- |
| "Debug" build, i.e. -O0 | Note: Using any switch that changes the performance of your application compared to a "Release" build may dramatically change the profile that Intel VTune Amplifier XE reports. This can lead you to analyze and try to optimize a section of code that is not a performance problem in your "Release" binary. |
| -tcheck | This setting is an alternative method of instrumentation for Intel® Thread Checker. It adds overhead that alters the performance analysis. Intel VTune Amplifier XE does not use this switch. |
| -static | This switch can prevent Intel VTune Amplifier XE from running "User mode sampling and tracing analysis" (Hotspots, Concurrency, and Locks and Waits) (see Note 2). Note: On the Intel Compiler, specifying -fast enables -static. |
| -static-intel | This default setting on the Intel Compiler prevents Call Stack Mode in "User mode sampling and tracing analysis" from properly identifying the statically linked library functions as System Functions (see Note 2). |
| -openmp-link static | In Intel® Compiler 11.0 and Intel® Composer, this setting chooses the static version of the OpenMP runtime libraries. This version of the OpenMP runtime library does not contain the necessary instrumentation for Intel VTune Amplifier XE (see Note 2). |
| -tprofile | This setting is an alternative method of instrumentation for Intel® Thread Profiler. It adds overhead that alters the performance analysis. Intel VTune Amplifier XE does not use this switch. |
| -openmp_stubs | This setting prevents OpenMP code from actually running in parallel. |
| -msse4a, -m3dnow | Binaries that use instructions not supported by Intel processors may cause unpredictable behavior in Intel VTune Amplifier XE. |
| -debug with parallel, extended, emit-column, expr-source-pos, semantic-stepping, or variable-locations | Intel VTune Amplifier XE works best with -debug full (the default when using -g). The other options, including parallel, extended, emit-column, expr-source-pos, semantic-stepping, and variable-locations, are not supported by Intel VTune Amplifier XE. See -debug inline-debug-info for more information. |
| -coarray | Concurrency and Locks and Waits analysis will not properly identify locks that prevent scaling in Coarray Fortran. |
| -fno-inline | VTune Amplifier XE Update 5 added a feature that allows viewing inlined functions if the compiler supports inlined symbols. If you are using an older compiler, this switch prevents the compiler from inlining functions, which allows Intel VTune Amplifier XE to associate samples and instrumented APIs with the callee rather than the caller. This gives a more complete call stack and lets you see the source code of samples and instrumented APIs in functions that would be inlined without the switch. Note: Using this switch may dramatically change the performance of your program, which can lead you to analyze and try to optimize a section of code that is not a performance problem. Use it as an aid to understand inlining, but be wary of using it to determine the hotspots in a released application. |
Notes:
1) "User mode sampling and tracing analysis" (Hotspots, Concurrency, and Locks and Waits) needs one of two features on the executable and all shared libraries in your application to properly walk the call stack:
a) Symbols: Use -g. Note: this option also allows you to view source code.
b) Frame pointers: Use -fno-omit-frame-pointer.
Note: Other options may add this information to your binary as a side effect, for example -fexceptions (the default for C++) or -O0. To make sure the executable and shared libraries have this information, run objdump -h <binary>. You should see an .eh_frame_hdr section in the output.
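The objdump check described in this note can be scripted. The following is a minimal sketch, assuming objdump from GNU binutils is installed; the helper name has_eh_frame_hdr and the path ./myapp are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "objdump -h" output on stdin and succeeds if an
# .eh_frame_hdr section is listed.
has_eh_frame_hdr() {
    grep -q '\.eh_frame_hdr'
}

# Usage: ./myapp is a placeholder; repeat the check for each shared library.
if [ -f ./myapp ] && command -v objdump >/dev/null 2>&1; then
    if objdump -h ./myapp | has_eh_frame_hdr; then
        echo ".eh_frame_hdr present"
    else
        echo ".eh_frame_hdr missing"
    fi
fi
```

Because the note applies to the executable and all of its shared libraries, the same check would need to be run on each of them.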
2) User mode sampling and tracing analysis (Hotspots, Concurrency, and Locks and Waits) works better with dynamic versions of the following libraries:
- OpenMP Runtime Library as supplied by an Intel Compiler (libiomp5.so or libguide40.so)
- Posix Thread library (libpthread.so)
- C Runtime Library (libc.so)
- C++ Runtime Library (libstdc++.so)
- Intel's Libm library (libm.so)
User mode sampling and tracing analysis (Hotspots, Concurrency, and Locks and Waits) does not work as well with the static version of the following libraries:
- OpenMP Runtime Library as supplied by an Intel Compiler (libiomp5.a or libguide4.a)
- Posix Thread library (libpthread.a)
- C Runtime Library (libc.a)
- C++ Runtime Library (libstdc++.a)
- Intel's Libm library (libm.a)
Statically linking the libraries and functions that User mode sampling and tracing analysis uses causes the following issues:
a) The static version of the OpenMP runtime library as supplied by an Intel Compiler does not contain the necessary instrumentation for Concurrency and Locks and Waits.
b) Call Stack Mode in "User mode sampling and tracing analysis" will not properly distinguish User Code from System Functions.
c) "User mode sampling and tracing analysis" (Hotspots, Concurrency, and Locks and Waits) (see Note 1) will be unable to run unless various C runtime functions are exported. There are multiple ways to do this; one way is to use the -u option of the GCC compiler:
-u malloc
-u free
-u realloc
-u getenv
-u setenv
-u __errno_location
If your application creates Posix threads (either explicitly, through the static OpenMP library, or through some other static library), there are some additional functions that you will need to export explicitly:
-u pthread_key_create
-u pthread_key_delete
-u pthread_setspecific
-u pthread_getspecific
-u pthread_spin_init
-u pthread_spin_destroy
-u pthread_spin_lock
-u pthread_spin_trylock
-u pthread_spin_unlock
-u pthread_mutex_init
-u pthread_mutex_destroy
-u pthread_mutex_trylock
-u pthread_mutex_lock
-u pthread_mutex_unlock
-u pthread_cond_init
-u pthread_cond_destroy
-u pthread_cond_signal
-u pthread_cond_wait
-u _pthread_cleanup_push
-u _pthread_cleanup_pop
-u pthread_setcancelstate
-u pthread_self
-u pthread_yield
The easiest way to do this is by creating a file with the above options and passing it to gcc or ld.
Example:
gcc -static mysource.cpp @Cdefs @Pdefs
Where Cdefs is a file with the options for the C functions listed above and Pdefs is a file with the options for the POSIX functions listed above.
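The two option files can be generated rather than typed by hand. The sketch below writes Cdefs and Pdefs into the current directory, one "-u symbol" pair per line, using the symbol lists from this note; the helper name write_defs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write_defs FILE SYMBOL...
# Creates FILE containing one "-u SYMBOL" line per symbol.
write_defs() {
    out=$1
    shift
    : > "$out"
    for sym in "$@"; do
        printf '%s %s\n' -u "$sym" >> "$out"
    done
}

# C runtime functions listed in this note.
write_defs Cdefs malloc free realloc getenv setenv __errno_location

# POSIX thread functions listed in this note.
write_defs Pdefs \
    pthread_key_create pthread_key_delete \
    pthread_setspecific pthread_getspecific \
    pthread_spin_init pthread_spin_destroy pthread_spin_lock \
    pthread_spin_trylock pthread_spin_unlock \
    pthread_mutex_init pthread_mutex_destroy pthread_mutex_trylock \
    pthread_mutex_lock pthread_mutex_unlock \
    pthread_cond_init pthread_cond_destroy pthread_cond_signal \
    pthread_cond_wait \
    _pthread_cleanup_push _pthread_cleanup_pop \
    pthread_setcancelstate pthread_self pthread_yield
```

After running the script, the files can be passed to the compiler exactly as in the example above, with @Cdefs and @Pdefs on the gcc command line.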
More Information:
This article addresses the switches that developers are most likely to have concerns about. Most switches will work with Intel VTune Amplifier XE for Linux, but not every switch or switch combination is tested (there are a lot of switches!). If you have information regarding other switches, please add a comment to this article. If you have a question regarding a particular switch, please submit an issue to the Intel VTune Amplifier XE forum.
Versions:
Intel® VTune Amplifier XE 2011 for Linux*
Intel® C++ and Fortran Compiler for Linux 11.x, 12.x
GNU C/C++ Compiler 3.4.6
