User Guide

Run Command Line Analysis

Set Up Environment Variables

Set up the environment variables for the standalone VTune Profiler by executing the vars script:
Linux* OS:
source <>/env/vars.sh
Windows* OS:
<>\env\vars.bat
When you run the script, it displays the product name and the build number. You can now use the vtune and vtune-gui commands.
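As a minimal sketch of the Linux* setup step, the snippet below sources the vars script and then checks that the vtune command is reachable. VTUNE_DIR and vars_script_path are hypothetical names introduced only for this illustration; point VTUNE_DIR at your own installation directory. The POSIX `.` command is used here in place of `source`.

```shell
# VTUNE_DIR is a hypothetical variable, not part of the product: set it to
# the directory that contains env/vars.sh on your machine.

# Print the path of the vars script under a given installation directory.
vars_script_path() {
    printf '%s/env/vars.sh\n' "$1"
}

# Source the script only when VTUNE_DIR is set and the script exists.
if [ -n "${VTUNE_DIR:-}" ] && [ -f "$(vars_script_path "$VTUNE_DIR")" ]; then
    . "$(vars_script_path "$VTUNE_DIR")"
fi

# Report whether the vtune command is now reachable.
if command -v vtune >/dev/null 2>&1; then
    echo "vtune is available"
else
    echo "vtune is not on PATH yet"
fi
```

Running the check after sourcing makes a wrong installation path visible immediately instead of at the first analysis run.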

Run Predefined Analysis

The predefined analysis configurations already have most of the knobs (configuration options) set by default for your convenience. To run a predefined performance analysis, use the -collect action:
vtune -collect <analysis_type> [-target-system=<system>] [-knob <knobName=knobValue>] [--] <target>
where:
  • <analysis_type>
    is the type of analysis to run. To see the list of available analysis types, enter:
    vtune -help collect
  • -target-system
    is an option intended for remote analysis that specifies a remote Linux* system or an Android* device
  • -knob
    is a configuration option that modifies the analysis
  • [knobName=knobValue]
    is the name of the specified knob and its value
  • <target>
    is the path and name of the application to analyze. To analyze a process instead, specify the process name or ID with the corresponding option. For a system-wide analysis, no target specification is required.
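The syntax above can be illustrated with a small dry-run helper that prints a -collect command line instead of running it. This is only a sketch: collect_cmd and ./my_app are hypothetical names, and the hotspots analysis type and the sampling-mode=hw knob are assumptions used as examples; confirm what your installation supports with vtune -help collect.

```shell
# Print (not run) a vtune -collect command line following the syntax above.
# Usage: collect_cmd <analysis_type> <target> [knobName=knobValue ...]
collect_cmd() {
    analysis_type=$1
    target=$2
    shift 2
    cmd="vtune -collect $analysis_type"
    # Each remaining argument becomes one -knob option.
    for knob in "$@"; do
        cmd="$cmd -knob $knob"
    done
    # The -- separator marks the end of vtune options and the start of the target.
    printf '%s\n' "$cmd -- $target"
}

collect_cmd hotspots ./my_app
collect_cmd hotspots ./my_app sampling-mode=hw
```

Printing the command first makes it easy to review the knobs before starting a collection; paste the printed line into the shell to run it.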
Intel® VTune™ Profiler supports the following predefined analysis types:
Analysis Type
Description
Get an overview of issues that affect application performance on your target system.
Analyze application flow and identify sections of code that take a long time to execute (hotspots).
Identify performance anomalies in frequently recurring intervals of code like loop iterations. Perform fine-grained analysis at the microsecond level.
Collect data on how an application is using available logical CPU cores, discover where parallelism is incurring synchronization overhead, identify where an application is waiting on synchronization objects or I/O operations, and discover how waits affect application performance.
Identify opportunities to optimize CPU, memory, and FPU utilization for compute-intensive or throughput applications. The HPC Performance Characterization analysis type is a starting point for understanding the performance landscape of your application. Use this analysis type to improve application performance by increasing the number of floating-point operations per second (GFLOPS) and reducing the overall application run time. The analysis collects data related to CPU, memory, and FPU utilization. Additional scalability metrics are available for applications that use OpenMP* or MPI runtime libraries.
Analyze memory consumption by your Linux application, its distinct memory objects and their allocation stacks.
uarch-exploration (former general-exploration)
Collect hardware events for analyzing a typical client application. This analysis calculates a set of predefined ratios used for the metrics and facilitates identifying hardware-level performance problems.
Identify memory-related issues, like NUMA problems and bandwidth-limited accesses, and attribute performance events to memory objects (data structures). This attribution relies on instrumenting memory allocations and de-allocations and on reading static and global variables from symbol information.
sgx-hotspots (deprecated)
Analyze hotspots inside security enclaves for systems with the Intel® Software Guard Extensions (Intel® SGX) feature enabled. This analysis type uses the INST_RETIRED.PREC_DIST hardware event that emulates precise clockticks and helps identify performance-critical program units inside enclaves.
tsx-exploration (deprecated)
Collect events that help understand Intel® Transactional Synchronization Extensions (Intel® TSX) behavior and causes of transactional aborts.
tsx-hotspots (deprecated)
Monitor the UOPS_RETIRED.ALL_PS hardware event that emulates precise clockticks and identify performance-critical program units inside transactions.
gpu-hotspots (preview)
Identify Graphics Processing Unit (GPU) tasks with high GPU utilization and estimate the effectiveness of this utilization. This analysis type is intended for analysis of applications that use a GPU for rendering, video processing, and computations with explicit support of Intel® Media SDK and OpenCL™ software technology.
Explore code execution on various CPU and GPU cores on your platform, correlate CPU and GPU activity, and identify whether your application is GPU or CPU bound.
Get a holistic view of system behavior. Gain insights into platform-level configuration, utilization, and imbalance issues that relate to compute, memory, storage, IO and interconnects.
Analyze the CPU/GPU utilization of your code running on the Xen virtualization platform. Explore GPU usage per GPU engine and GPU hardware metrics that help understand where performance improvements are possible. If applicable, this analysis also detects OpenGL-ES API calls and displays them on the timeline.
Analyze CPU/FPGA interaction issues by exploring OpenCL kernels running on the FPGA, and identify the most time-consuming FPGA kernels.
Monitor utilization of the IO subsystems, CPU and processor buses. This analysis type uses the hardware event-based sampling collection and system-wide Ftrace* collection (for Linux* and Android* targets)/ETW collection (Windows* targets) to provide a consistent view of the storage sub-system combined with hardware events and an easy-to-use method to match user-level source code with I/O packets executed by the hardware.
Monitor the general behavior of your target system and identify platform-level factors that limit performance.

Run Custom Analysis

If you need to run a modified version of a predefined analysis type, use the -collect-with action to specify a data collection type and the required configuration options (knobs):
vtune -collect-with <collection_type> [-target-system=<system>] [-knob <knobName=knobValue>] [--] <target>
where:
  • <collection_type>
    is the type of analysis to run. To see the list of available collection types, enter:
    vtune -help collect-with
  • -target-system
    is an option intended for remote analysis that specifies a remote Linux* system or an Android* device
  • -knob
    is an option that configures the analysis
  • [knobName=knobValue]
    is the name of the specified knob and its value
  • <target>
    is the path and name of the application to analyze. To analyze a process instead, specify the process name or ID with the corresponding option. For a system-wide analysis, no target specification is required.
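A custom collection follows the same pattern. The sketch below prints a -collect-with command line and adds -target-system only when a remote system is supplied. The helper name collect_with_cmd, the target ./my_app, the collection type runsa, and the ssh:user@remote-host value are assumptions chosen for illustration; check vtune -help collect-with for the collection types and option formats available in your installation.

```shell
# Print (not run) a vtune -collect-with command line.
# Usage: collect_with_cmd <collection_type> <target> [system]
collect_with_cmd() {
    cmd="vtune -collect-with $1"
    # Add -target-system only when a remote system is given.
    if [ -n "${3:-}" ]; then
        cmd="$cmd -target-system=$3"
    fi
    printf '%s\n' "$cmd -- $2"
}

collect_with_cmd runsa ./my_app
collect_with_cmd runsa ./my_app ssh:user@remote-host
```

Keeping the remote system as an optional last argument lets the same helper describe both local and remote runs.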
Intel® VTune™ Profiler supports the following collection types:
Collector
Description
Profile your application using the counter overflow feature of the Performance Monitoring Unit (PMU).
Profile the application execution and take snapshots of how that application utilizes the processors in the system. The collector interrupts a process, collects the value of all active instruction addresses and captures a calling sequence for each of these samples.

Next Steps

When the collection is complete, the VTune Profiler saves the data as an analysis result in the default or specified result directory. You can either view the result in the GUI or generate a formatted analysis report.
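Both follow-up options can be sketched as command lines. The helpers below only print the commands; report_cmd, gui_cmd, and the result directory ./my_result are hypothetical names, and the summary report type is used as an assumed example.

```shell
# Print (not run) a report command for a result directory.
# Usage: report_cmd <report_type> <result_dir>
report_cmd() {
    printf 'vtune -report %s -result-dir %s\n' "$1" "$2"
}

# Print (not run) the command that opens a result in the GUI.
# Usage: gui_cmd <result_dir>
gui_cmd() {
    printf 'vtune-gui %s\n' "$1"
}

report_cmd summary ./my_result
gui_cmd ./my_result
```

Replace ./my_result with the result directory that the collection actually created.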

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.