User Guide

Hotspots Report

Use the hotspots command-line report to identify program units (for example, functions, modules, or objects) that take the most processor time (Hotspots analysis), underutilize available CPUs or have long waits (Threading analysis), and so on.
Use the hotspots report to view the hottest GPU computing tasks (or their instances) identified with a GPU analysis such as gpu-hotspots.
The report lists program units in descending order by default, starting with the most performance-critical unit. The command-line reports provide the same data that is displayed in the default GUI analysis viewpoint.
To display a list of available groupings for a Hotspots report, enter:
vtune -report hotspots -r <result_dir> group-by=?
If you do not specify a result directory, the latest result is used by default.
Examples
Example 1: Hotspots Report with Module Grouping
This example opens the Hotspots report for the r001hs Hotspots analysis result and groups the data by module.
vtune -report hotspots -r r001hs -group-by module
Module            CPU Time
----------------- --------
analyze_locks      10.080s
KERNELBASE          0.679s
ntdl                0.164s
...
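The text report can be post-processed with standard tools. The sketch below is not part of VTune. It assumes the two-column layout shown in Example 1 (a header line, a dashed separator, then one row per module with a single-word name and a time ending in "s"), and cpu_time_share is a hypothetical helper name. It prints each row's share of the summed CPU time of the rows it receives. The sample rows are copied from the example output, so the shares cover only the three modules listed there.

```shell
#!/usr/bin/env bash
# cpu_time_share: hypothetical helper, not a VTune command.
# Reads a two-column "name time" report on stdin and prints each row
# with its percentage of the summed CPU time of the rows it was given.
cpu_time_share() {
    awk '
        # Keep only data rows: exactly two fields, the second like "10.080s".
        NF == 2 && $2 ~ /^[0-9.]+s$/ {
            name[++n] = $1
            secs[n] = substr($2, 1, length($2) - 1) + 0   # drop trailing "s"
            total += secs[n]
        }
        END {
            if (total == 0) exit 1   # no usable rows on input
            for (i = 1; i <= n; i++)
                printf "%-17s %8.3fs %5.1f%%\n", name[i], secs[i], 100 * secs[i] / total
            printf "%-17s %8.3fs\n", "TOTAL", total
        }
    '
}

# Sample rows copied from the Example 1 output.
cpu_time_share <<'EOF'
Module            CPU Time
----------------- --------
analyze_locks      10.080s
KERNELBASE          0.679s
ntdl                0.164s
EOF
```

With a real result, the output of the vtune -report command could be piped into cpu_time_share in place of the inline sample. Names that contain spaces would need a different parser, because this one splits rows on whitespace.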
Example 2: Hotspots Report with Limited Items
This example displays the Hotspots report for the r001hs analysis result including only the top two functions with the highest CPU Time values. Functions having insignificant impact on performance are excluded from output.
vtune -report hotspots -r r001hs -limit 2
Function         CPU Time
---------------- --------
grid_intersect     5.489s
sphere_intersect   3.590s
Example 3: Report per OpenCL Kernels
This example shows how to view the collected data per OpenCL kernels submitted and executed on the GPU:
vtune -report hotspots -group-by=computing-task -r r000gh
Computing Task      Work Size:Global Computing Task:Total Time Data Transferred:Size EU Array:Active(%) L3 <-> GTI Total Bandwidth, GB/sec
------------------- ---------------- ------------------------- --------------------- ------------------ ----------------------------------
AdvancePaths                   65536                   13.170s                                    25.0%                             22.928
Init                           65536                    0.006s                                    34.4%                             45.802
Intersect                      65536                   49.139s                                    61.5%                             23.149
Sampler                        65536                    6.525s                                    76.4%                             11.745
InitFrameBuffer               362432                    0.000s                                     4.7%                             17.456
clEnqueueReadBuffer                                     1.045s                  3 GB               1.5%                              8.840
Example 4: Report Grouped per DPC++ Task Instances
This example groups the collected data by DPC++ task instance:
vtune -report hotspots -group-by=computing-instance -r r000gh
Computing Task      Instance           Work Size:Global Computing Task:Total Time Data Transferred:Size GPU Time
------------------- ------------------ ---------------- ------------------------- --------------------- --------
CopyVector2                          2          6553600                    0.190s                         0.190s
clEnqueueReadBuffer                  1                                     0.034s                400 MB   0.034s
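GPU reports like the ones in Examples 3 and 4 can contain empty cells and values with embedded spaces (for example, "400 MB"), so splitting rows on whitespace shifts columns. The sketch below shows one possible workaround and is not part of VTune. It assumes that each cell is aligned under its run of dashes in the separator line, and it uses hypothetical helper names (fixed_width_to_tsv, dashes, sample_report). The sample reproduces the Example 4 values, with column widths chosen for illustration.

```shell
#!/usr/bin/env bash
# fixed_width_to_tsv: hypothetical helper, not a VTune command.
# Converts a fixed-width text report to tab-separated rows, keeping empty
# cells. Column positions are taken from the dashed separator line.
# Lines before the separator (the header) are dropped.
fixed_width_to_tsv() {
    awk '
        # Separator line: only dashes and spaces. Record where each run starts.
        !have_cols && /^[- ]+$/ && /-/ {
            rest = $0; pos = 1
            while (match(rest, /-+/)) {
                start[++ncol] = pos + RSTART - 1
                pos += RSTART + RLENGTH - 1
                rest = substr(rest, RSTART + RLENGTH)
            }
            have_cols = 1
            next
        }
        # Data rows: cut each cell from its column start up to the next start.
        have_cols && NF {
            row = ""
            for (i = 1; i <= ncol; i++) {
                width = (i < ncol) ? start[i + 1] - start[i] : length($0)
                cell = substr($0, start[i], width)
                gsub(/^ +| +$/, "", cell)   # trim padding around the value
                row = row (i > 1 ? "\t" : "") cell
            }
            print row
        }
    '
}

# dashes N: prints N dash characters (used to build the sample separator).
dashes() { printf '%*s' "$1" '' | tr ' ' '-'; }

# sample_report: Example 4 values in an assumed fixed-width layout.
sample_report() {
    local fmt='%-19s %18s %16s %25s %21s %8s\n'
    printf '%-19s %-18s %-16s %-25s %-21s %-8s\n' 'Computing Task' 'Instance' \
        'Work Size:Global' 'Computing Task:Total Time' 'Data Transferred:Size' 'GPU Time'
    printf '%s %s %s %s %s %s\n' "$(dashes 19)" "$(dashes 18)" "$(dashes 16)" \
        "$(dashes 25)" "$(dashes 21)" "$(dashes 8)"
    printf "$fmt" 'CopyVector2' '2' '6553600' '0.190s' '' '0.190s'
    printf "$fmt" 'clEnqueueReadBuffer' '1' '' '0.034s' '400 MB' '0.034s'
}

sample_report | fixed_width_to_tsv
```

The tab-separated rows can then be passed to tools such as cut or sort. If real report output is not aligned under the separator as assumed here, the column boundaries would need to be determined another way.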
