User Guide


Callstacks Report

Intel® VTune™ Profiler collects call stack information during User-Mode Sampling and Tracing Collection, or during Hardware Event-based Sampling Collection with stack collection enabled. Use the callstacks report to see how the hot functions are called. This report type focuses on call sequences, starting from the functions that take the most CPU time.
You can use the -column option to filter the callstacks report and focus on a specific metric, for example:
vtune -report callstacks -r r001ah -column="CPI Rate"
To display a list of the columns available for the callstacks report, enter:
vtune -report callstacks -r <result_dir> -column=?
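If you drive these reports from a script, the options can be assembled programmatically. The Python sketch below is illustrative only: the helper names `callstacks_command` and `run_callstacks` are hypothetical, it assumes the `vtune` executable is on `PATH`, and it uses only the options that appear in this section (`-r`, `-column`, `-limit`, `-group-by`).

```python
import subprocess
from typing import Optional


def callstacks_command(result_dir: Optional[str] = None,
                       column: Optional[str] = None,
                       limit: Optional[int] = None,
                       group_by: Optional[str] = None) -> list[str]:
    """Build a `vtune -report callstacks` argument list (hypothetical helper)."""
    cmd = ["vtune", "-report", "callstacks"]
    if result_dir is not None:
        # Without -r, the report covers the most recent analysis result.
        cmd += ["-r", result_dir]
    if column is not None:
        # For example "CPI Rate"; "?" lists the available columns.
        cmd.append(f"-column={column}")
    if limit is not None:
        cmd += ["-limit", str(limit)]
    if group_by is not None:
        cmd += ["-group-by", group_by]
    return cmd


def run_callstacks(**options) -> str:
    """Run the report and return its text output (assumes `vtune` is on PATH)."""
    completed = subprocess.run(callstacks_command(**options),
                               capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, `run_callstacks(result_dir="r001ah", column="CPI Rate")` would run the filtering command shown above. Passing the arguments as a list rather than as one shell string means that a value containing a space, such as `CPI Rate`, or the `?` in `-column=?`, reaches vtune unchanged, with no shell quoting or glob expansion involved.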
Example 1: Callstacks Report with Limited Items
The following example generates a callstacks report for the most recent analysis result and limits the number of functions and function stacks to 5 items.
vtune -report callstacks -limit 5
On Windows*:
Function       Function Stack    CPU Time Module            Function (Full)                 Source File       Start Address
-------------- ----------------- -------- ----------------- ------------------------------- ----------------- -------------
grid_intersect                   5.436s   analyze_locks.exe grid_intersect                  grid.cpp          0x40d340
               intersect_objects 1.918s   analyze_locks.exe intersect_objects(struct ray *) intersect.cpp     0x402840
               shader            0s       analyze_locks.exe shader(struct ray *)            shade.cpp         0x404730
               trace             0s       analyze_locks.exe trace(struct ray *)             trace_rest.cpp    0x402370
               render_one_pixel  0s       analyze_locks.exe render_one_pixel                analyze_locks.cpp 0x401db0
...
On Linux*:
Function             Function Stack    CPU Time Module                Function (Full)          Source File       Start Address
-------------------- ----------------- -------- --------------------- ------------------------ ----------------- -------------
initialize_2D_buffer                   22.746s  tachyon_find_hotspots initialize_2D_buffer     find_hotspots.cpp 0x4018f0
                     render_one_pixel  22.746s  tachyon_find_hotspots render_one_pixel         find_hotspots.cpp 0x401950
                     draw_trace        0s       tachyon_find_hotspots draw_trace(void)         find_hotspots.cpp 0x401d70
                     thread_trace      0s       tachyon_find_hotspots thread_trace(thr_parms*) find_hotspots.cpp 0x401ef0
                     trace_shm         0s       tachyon_find_hotspots trace_shm                trace_rest.cpp    0x410a20
                     trace_region      0s       tachyon_find_hotspots trace_region             trace_rest.cpp    0x410aa0
                     rt_renderscene    0s       tachyon_find_hotspots rt_renderscene(void*)    api.cpp           0x402360
                     tachyon_video     0s       tachyon_find_hotspots tachyon_video            video.cpp         0x402240
                     main              0s       tachyon_find_hotspots main                     video.cpp         0x4013e0
                     __libc_start_main 0s       libc.so.6             __libc_start_main        libc-start.c      0x21dd0
                     _start            0s       tachyon_find_hotspots _start                   [Unknown]         0x40149c
grid_intersect                         7.282s   tachyon_find_hotspots grid_intersect           grid.cpp          0x408930
                     intersect_objects 2.756s   tachyon_find_hotspots intersect_objects(ray*)  intersect.cpp     0x40a400
                     shader            0s       tachyon_find_hotspots shader(ray*)             shade.cpp         0x40eae0
...
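To post-process this text output, one option is to slice each row by the column extents marked by the dashed separator row. The Python sketch below does that; the function names are hypothetical, values that overflow a column other than the last are not handled, and the grouping step assumes that a row whose Function cell is empty is a frame in the stack of the nearest hot function above it. The sample is a trimmed copy of the first rows of the Windows output.

```python
import re


def parse_report(text: str) -> list[dict[str, str]]:
    """Split a fixed-width text report into one dict per data row.

    Column boundaries are taken from the dashed separator row.
    """
    lines = text.splitlines()
    # The separator row contains only dashes and spaces.
    sep = next(i for i, ln in enumerate(lines)
               if "-" in ln and set(ln) <= {"-", " "})
    spans = [(m.start(), m.end()) for m in re.finditer(r"-+", lines[sep])]
    spans[-1] = (spans[-1][0], None)  # let the last column run to end of line
    names = [lines[sep - 1][start:end].strip() for start, end in spans]
    rows = []
    for line in lines[sep + 1:]:
        if line.strip() in ("", "..."):  # skip blanks and the truncation marker
            continue
        rows.append({name: line[start:end].strip()
                     for name, (start, end) in zip(names, spans)})
    return rows


def stacks_by_function(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Assumption: an empty Function cell marks a frame of the stack
    belonging to the nearest hot function listed above it."""
    stacks: dict[str, list[str]] = {}
    current = None
    for row in rows:
        if row["Function"]:
            current = row["Function"]
            stacks.setdefault(current, [])
        elif current is not None:
            stacks[current].append(row["Function Stack"])
    return stacks


# Trimmed sample built with f-string padding so the column positions are exact.
SAMPLE = "\n".join([
    f"{'Function':<14} {'Function Stack':<17} CPU Time",
    f"{'-' * 14} {'-' * 17} {'-' * 8}",
    f"{'grid_intersect':<14} {'':<17} 5.436s",
    f"{'':<14} {'intersect_objects':<17} 1.918s",
    f"{'':<14} {'shader':<17} 0s",
    "...",
])
```

With this sample, `stacks_by_function(parse_report(SAMPLE))` groups `intersect_objects` and `shader` under `grid_intersect`. Deriving the boundaries from the separator row, rather than splitting on whitespace, keeps empty cells in their correct columns.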
Example 2: Callstacks Report with Callstack Grouping
This example generates a callstacks report for the r001tr result, grouped by function call stack.
vtune -report callstacks -r r001tr -group-by callstack
On Windows*:
Function/Function Stack                   Wait Time Module            Function (Full)
----------------------------------------- --------- ----------------- -----------------------------------------
tbb::internal::acquire_binsem_using_event 20.005s   tbb.dll           tbb::internal::acquire_binsem_using_event
func@0x10003350                           13.857s   gdiplus.dll       func@0x10003350
func@0x1000c1f0                           0s        gdiplus.dll       func@0x1000c1f0
BaseThreadInitThunk                       0s        KERNEL32.DLL      BaseThreadInitThunk
func@0x6b2dacf0                           0s        ntdll.dll         func@0x6b2dacf0
func@0x6b2daccf                           0s        ntdll.dll         func@0x6b2daccf
video::main_loop                          10.111s   analyze_locks.exe video::main_loop(void)
main                                      0s        analyze_locks.exe main
WinMain                                   0s        analyze_locks.exe WinMain
_tmainCRTStartup                          0s        analyze_locks.exe _tmainCRTStartup
[Unknown stack frame(s)]                  0s        [Unknown]         [Unknown stack frame(s)]
BaseThreadInitThunk                       0s        KERNEL32.DLL      BaseThreadInitThunk
func@0x6b2dacf0                           0s        ntdll.dll         func@0x6b2dacf0
...
On Linux*:
Function/Function Stack         Wait Time Module                Function (Full)
------------------------------- --------- --------------------- -----------------------------------------------------------
draw_task::operator()           98.698s   tachyon_analyze_locks draw_task::operator()(tbb::blocked_range<int> const&) const
tbb::interface6::internal       0s        tachyon_analyze_locks tbb::interface6::internal
execute<tbb::interface6::internal 0s        tachyon_analyze_locks execute::interface6::internal
[TBB parallel_for on draw_task] 0s        tachyon_analyze_locks tbb::interface6::internal::execute(void)
[TBB Dispatch Loop]             0s        libtbb.so.2           tbb::internal::local_wait_for_all(tbb::task&, tbb::task*)
...
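Time cells in these reports are printed as strings with an `s` suffix, such as `20.005s` or `0s`. As a hedged post-processing sketch, the Python helpers below (hypothetical names) convert such cells to seconds and pick the entries with the largest non-zero time. They assume the plain `s` suffix seen in these examples and reject any other format rather than guess a unit. The sample pairs are a subset of the Windows output of this example.

```python
import re

# Assumption: time cells use a plain "s" suffix, as in the examples above.
_TIME = re.compile(r"(\d+(?:\.\d+)?)s")


def seconds(cell: str) -> float:
    """Convert a time cell such as '20.005s' or '0s' to seconds."""
    match = _TIME.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized time cell: {cell!r}")
    return float(match.group(1))


def hottest(rows: list[tuple[str, str]], n: int = 3) -> list[tuple[str, float]]:
    """Return up to n (name, seconds) pairs with non-zero time, largest first."""
    timed = [(name, seconds(cell)) for name, cell in rows]
    nonzero = [item for item in timed if item[1] > 0]
    return sorted(nonzero, key=lambda item: item[1], reverse=True)[:n]


# (Function/Function Stack, Wait Time) pairs taken from the Windows output above.
SAMPLE_ROWS = [
    ("tbb::internal::acquire_binsem_using_event", "20.005s"),
    ("func@0x10003350", "13.857s"),
    ("func@0x1000c1f0", "0s"),
    ("BaseThreadInitThunk", "0s"),
    ("video::main_loop", "10.111s"),
    ("main", "0s"),
]

if __name__ == "__main__":
    for name, secs in hottest(SAMPLE_ROWS):
        print(f"{secs:8.3f}s  {name}")
```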

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804