Configure the Intel® VTune™ Amplifier data view to display performance data for inline functions in applications built in the Release configuration.
This option is supported if you compile your code using:
Linux*: GCC* compiler 4.1 (or higher)
Linux* and Windows*: Intel® Compiler (12.1.333 or higher), with the -debug inline-debug-info option (Linux) or the /debug:inline-debug-info option (Windows) enabled
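As a rough illustration of the kind of code this applies to, the sketch below defines a small helper that an optimizing compiler is likely to inline into its caller. The names get_model_param and accumulate_params are hypothetical and are not taken from the sample used later in this topic. The build line in the comment is an assumption based on GCC's standard -O2 and -g flags, and inline_demo.cpp is a placeholder file name.

```cpp
// Assumed GCC build line (Release-style optimization plus debug info):
//   g++ -O2 -g inline_demo.cpp
// With the Intel compiler, the inline-debug-info option named above would
// be added to the equivalent command line.
#include <cassert>
#include <cstddef>

// Hypothetical helper: small enough that an optimizing build will
// typically inline it; once inlined, it has no stack frame of its own.
inline double get_model_param(std::size_t i) {
    return 0.5 * static_cast<double>(i) + 1.0;
}

// Hypothetical caller: in an optimized build, the helper's instructions
// become part of this function's machine code.
double accumulate_params(std::size_t count) {
    assert(count > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += get_model_param(i);
    }
    return sum;
}
```

Without inline debug information, a profiler can only attribute the time spent in the helper's instructions to accumulate_params, because that is the function that physically contains them.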
View Inline Functions
To view data on inline functions, in the analysis result window, set the Inline Mode filter bar option to Show inline functions. VTune Amplifier will display inline functions (virtual frames) as regular functions.
To disable displaying inline functions, select Hide inline functions.
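The effect of the two settings can be pictured with a small conceptual model. The sketch below is not VTune Amplifier's implementation. It assumes each sample records the physical function that contains the sampled instruction plus the chain of inline frames recovered from debug information. The Sample type and self_time_by_function are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One profiling sample (hypothetical model): the physical function that
// owns the sampled instruction, plus the inline frames (outermost first)
// that debug information maps the instruction to. An empty chain means the
// instruction does not belong to inlined code.
struct Sample {
    std::string physical_function;
    std::vector<std::string> inline_chain;
    double cpu_time;
};

// Sum self time per function. With show_inline set, time goes to the
// innermost inline frame; otherwise it goes to the physical function.
std::map<std::string, double> self_time_by_function(
        const std::vector<Sample>& samples, bool show_inline) {
    std::map<std::string, double> totals;
    for (const Sample& s : samples) {
        assert(s.cpu_time >= 0.0);
        const std::string& owner =
            (show_inline && !s.inline_chain.empty())
                ? s.inline_chain.back()
                : s.physical_function;
        totals[owner] += s.cpu_time;
    }
    return totals;
}
```

For a sample list in which main has 1.0 s of its own time and 3.0 s inside an inlined helper, this model reports helper = 3.0 and main = 1.0 with show_inline set, and main = 4.0 without it. The total is the same in both cases; only the attribution changes.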
Example 1: Inline Mode for Hotspots Analysis
In this example, you enable the Show inline functions option for the Hotspots analysis. This mode shows a full stack for the GetModelParams inline function:
You can select the Source Function/Function/Call Stack level in the Grouping menu to view all instances of the inline function in one row.
If you double-click the GetModelParams inline function, you can identify the code line that took the most CPU time and analyze the corresponding assembly code:
Example 2: Inline Mode for Hotspots Analysis Disabled
When you select the Hide inline functions option on the filter bar for the same sample, VTune Amplifier does not show the GetModelParams function in the Bottom-up view:
But if you double-click the main function entry and explore the source, you can see that all CPU time is attributed to the code line where the GetModelParams inline function is called:
Example 3: Inline Mode for GPU In-kernel Profiling
By default, Inline Mode is disabled for the GPU In-kernel Profiling analysis view. In this example, 100% of GPU Cycles are attributed to the GPU_FFT_Global function:
Double-clicking the GPU_FFT_Global source function opens the source view positioned on the code line invoking this function with 95.3% of Estimated GPU Cycles attributed to it:
But if you select the Computing Task/Function/Call Stack or Computing Task/Source Function/Call Stack grouping level and enable the Inline Mode for this view, you see that the GPU_FFT_Global function took only 4.7% of the GPU Cycles, while four inline functions took the rest of the cycles:
Double-click the hottest GPU_FftIteration function to analyze its source and assembly code: