User Guide

Switch Viewpoints

Use a viewpoint, a pre-set configuration of Intel® VTune™ Profiler's data views, to focus on specific performance problems.
When you select a viewpoint, you select the set of performance metrics that Intel® VTune™ Profiler shows in the windows of the result tab. To select the required viewpoint, click the down arrow:
[Figure: VTune Profiler Viewpoint]
Name of the analysis type you ran.
Name of the current viewpoint. Click the down arrow next to the viewpoint name to open a drop-down menu with a choice of applicable viewpoints.
Context-sensitive help icon for the current viewpoint.
Viewpoint drop-down menu that displays a list of viewpoints available for the current analysis type.
Explore the table below to understand which viewpoints are available for each analysis type:
Viewpoint
Description
Hotspots by CPU Utilization
Helps identify hotspots, the code regions in the application that consume a lot of CPU time. CPU time is broken down into CPU utilization states: idle, poor, fair, and good.
Threading Efficiency
Shows how your multi-threaded application is utilizing available CPU cores and helps identify the possible causes of ineffective utilization. Use this view to find threads waiting too long on synchronization objects (locks) or identify scheduling overhead.
Microarchitecture Exploration
Helps identify where the application is not making the best use of available hardware resources. This viewpoint displays metrics derived from hardware events. The Summary window reports overall metrics for the entire execution along with explanations of the metrics. From the Bottom-up and Top-down Tree windows, you can locate the hardware issues in your application. Cells are highlighted when potential opportunities to improve performance are detected. Hover over the highlighted metrics in the grid to see explanations of the issues.
Hardware Events
Displays statistics of monitored hardware events: estimated count and/or the number of samples collected. Use this view to identify code regions (modules, functions, code lines, and so on) with the highest activity for an event of interest.
Memory Usage
Helps understand how effectively your application uses memory resources and identify potential memory-access issues, such as excessive access to remote memory on NUMA platforms or hitting the DRAM or interconnect bandwidth limit. It provides various performance metrics for both the application code and memory objects arrays.
HPC Performance Characterization
Helps understand how effectively your application uses CPU, memory, and floating-point operation resources. Use this view to identify scalability issues for Intel OpenMP and MPI runtimes as well as next steps to increase memory and FPU efficiency.
Input and Output
Shows input/output data, CPU and bus utilization statistics correlated with the execution of your target. Use this view to identify long latency of I/O requests, explore call stacks for I/O functions, analyze slow I/O requests on the timeline and identify imbalance between I/O and compute operations.
GPU Compute/Media Hotspots
Helps identify GPU tasks with high GPU utilization and estimate how effectively they use the GPU. It is particularly useful for DPC++ computing tasks and for analyzing OpenCL™ kernels and Intel Media SDK tasks. Use this view to identify the most time-consuming GPU computing tasks, analyze GPU task execution over time, explore the GPU hardware metrics per GPU architecture block, and so on.
FPGA Hotspots
Helps identify the FPGA and CPU tasks with high utilization. Use this view to assess FPGA time spent executing kernels, overall time for memory transfers between the CPU and FPGA, and how well a workload is balanced between the CPU and FPGA.
GPU Rendering
Provides platform-wide CPU/GPU utilization and efficiency statistics collected with the GPU Rendering analysis (preview), including dedicated support for the Xen virtualization platform.
Platform Power Analysis
Helps identify where the application is generating idle and wake-up behavior that can lead to inefficient use of energy. Where possible, it provides data from both the OS and hardware perspectives, such as the detailed C-state residency report that compares the time the OS requested in deep sleep states with the actual residency the hardware reported.
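The Hotspots by CPU Utilization viewpoint described above breaks time down into four utilization states: idle, poor, fair, and good. The following Python sketch illustrates one way such a breakdown could be computed from a list of samples. It is not how VTune Profiler implements the metric: the function names, the sample format, and the thresholds (`poor_max`, `fair_max`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple


def classify_utilization(busy_cpus: float, total_cpus: int,
                         poor_max: float = 0.5,
                         fair_max: float = 0.85) -> str:
    """Map a count of busy CPUs to a utilization state.

    The thresholds are illustrative assumptions, not VTune Profiler values.
    """
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")
    if busy_cpus <= 0:
        return "idle"
    ratio = busy_cpus / total_cpus
    if ratio <= poor_max:
        return "poor"
    if ratio <= fair_max:
        return "fair"
    return "good"


def utilization_breakdown(samples: Iterable[Tuple[float, float]],
                          total_cpus: int) -> Dict[str, float]:
    """Sum the duration spent in each utilization state.

    Each sample is a hypothetical (duration_seconds, busy_cpus) pair.
    """
    totals = {"idle": 0.0, "poor": 0.0, "fair": 0.0, "good": 0.0}
    for duration, busy_cpus in samples:
        totals[classify_utilization(busy_cpus, total_cpus)] += duration
    return totals


if __name__ == "__main__":
    # Four made-up intervals on an 8-CPU system.
    samples = [(1.0, 0), (2.0, 2), (1.5, 6), (0.5, 8)]
    print(utilization_breakdown(samples, total_cpus=8))
```

A large share of time in the idle or poor state would suggest that the application leaves cores unused, which is the kind of situation this viewpoint is meant to surface.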
