User Guide

View Data on Inline Functions

Configure the Intel® VTune™ Profiler data view to display performance data per inline function for applications built in the Release configuration.

Requirements

This option is supported if you compile your code using one of the following:
  • Linux*: GCC* compiler 4.1 (or higher)
  • Linux* and Windows*: Intel® Compiler (12.1.333 or higher), with the -debug inline-debug-info option (Linux) or the /debug:inline-debug-info option (Windows) enabled
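
For orientation, the kind of code this applies to can be sketched as follows. This is a hypothetical illustration, not code from a VTune Profiler sample: the ModelParams structure, the body of GetModelParams, and the Accumulate caller are all invented for this sketch, and the build lines in the comments are assumptions about a typical setup (the Intel option is the one listed above; the compiler driver name may differ for your compiler version).

```cpp
// Hypothetical example; names and bodies are illustrative only.
// Possible build lines (assumptions, adjust to your toolchain):
//   g++  -O2 -g example.cpp -o example
//   icpc -O2 -debug inline-debug-info example.cpp -o example   (Linux)
#include <cstddef>

struct ModelParams {
    double scale;
    double offset;
};

// A small function that an optimizing compiler is likely to inline
// into its caller in a Release build.
inline ModelParams GetModelParams(std::size_t i) {
    return ModelParams{1.0 + 0.5 * static_cast<double>(i), 2.0};
}

// Caller: after inlining, the instructions of GetModelParams live
// inside this function, so only debug information can tell them apart.
double Accumulate(std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const ModelParams p = GetModelParams(i);
        sum += p.scale + p.offset;
    }
    return sum;
}
```

When the compiler inlines GetModelParams, no separate call frame exists at run time. The inline debug information is what lets a profiler present the inlined code as its own function instead of folding it into the caller.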

View Inline Functions

To view data on inline functions, in the analysis result window, set the Inline Mode filter bar option to Show inline functions. VTune Profiler will display inline functions (virtual frames) as regular functions.
To disable displaying inline functions, select Hide inline functions.
Example 1: Inline Mode for Hotspots Analysis
In this example, you enable the Show inline functions option for the Hotspots analysis. This mode shows a full stack for the GetModelParams inline function.
You can select the Source Function/Function/Call Stack level in the Grouping menu to view all instances of the inline function in one row.
If you double-click the GetModelParams inline function, you can identify the code line that took the most CPU time and analyze the corresponding assembly code.
Example 2: Inline Mode Disabled for Hotspots Analysis
When you select the Hide inline functions option on the filter bar for the same sample, VTune Profiler does not show the GetModelParams function in the Bottom-up view.
But if you double-click the main function entry and explore the source, you can see that all CPU time is attributed to the code line where the GetModelParams inline function is called.
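
The difference between the two modes in Examples 1 and 2 can be pictured with a small conceptual model. The sketch below is not how VTune Profiler is implemented; the Sample structure and the AttributeTime function are hypothetical names introduced only to show how the same samples can be reported under the inline function or under its caller.

```cpp
#include <map>
#include <string>
#include <vector>

// One profiling sample (hypothetical model): the physical function that
// contains the sampled instruction, plus the inline function, if any,
// that the instruction originally came from.
struct Sample {
    std::string physicalFunction;  // e.g. "main"
    std::string inlineFunction;    // empty if the code was not inlined
    double cpuTime;                // CPU time carried by this sample
};

// Sums CPU time per reported function name.
// showInline == true : time goes to the inline function (virtual frame).
// showInline == false: time goes to the enclosing physical function.
std::map<std::string, double> AttributeTime(const std::vector<Sample>& samples,
                                            bool showInline) {
    std::map<std::string, double> totals;
    for (const Sample& s : samples) {
        const bool useInline = showInline && !s.inlineFunction.empty();
        const std::string& key =
            useInline ? s.inlineFunction : s.physicalFunction;
        totals[key] += s.cpuTime;
    }
    return totals;
}
```

In this model, a sample taken inside inlined GetModelParams code appears under GetModelParams when showInline is true and under main when it is false, while the total CPU time stays the same in both views.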
Example 3: Inline Mode for GPU Compute/Media Hotspots
By default, the Inline Mode for GPU Compute/Media Hotspots analysis is disabled. In this example, 100% of GPU Cycles are attributed to the GPU_FFT_Global function.
Double-clicking the GPU_FFT_Global source function opens the source view positioned on the code line invoking this function, with 95.3% of Estimated GPU Cycles attributed to it.
But if you select the Computing Task/Function/Call Stack or Computing Task/Source Function/Call Stack grouping level and enable the Inline Mode for this view, you see that the GPU_FFT_Global function took only 4.7% of the GPU Cycles, while four inline functions took the rest of the cycles.
Double-click the hottest GPU_FftIteration function to analyze its source and assembly code.

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804