User Guide

Run Command Line Analysis

Set Up Environment Variables

Set up the environment variables for the standalone VTune Profiler by executing the vars script:
Linux* OS:
source <>/env/vars.sh
Windows* OS:
<>\env\vars.bat
When you run the script, it displays the product name and the build number. You can now use the vtune-cl and vtune-gui commands.
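
On Linux, the setup step above can be wrapped in a small check that confirms the script exists and that the vtune command is reachable afterwards. This is a minimal sketch, not part of the product: VTUNE_INSTALL_DIR and setup_vtune_env are hypothetical names standing in for your actual installation directory and your own helper.

```shell
# Source the VTune Profiler environment script and confirm that the
# vtune command became available in this shell.
# VTUNE_INSTALL_DIR is a hypothetical variable (an assumption of this
# sketch); set it to your installation directory before calling.
setup_vtune_env() {
    vars_script="${VTUNE_INSTALL_DIR:-}/env/vars.sh"
    if [ ! -f "$vars_script" ]; then
        echo "vars.sh not found: $vars_script" >&2
        return 1
    fi
    # Source the script so its exports affect the current shell.
    . "$vars_script"
    if ! command -v vtune >/dev/null 2>&1; then
        echo "vtune is not on PATH after sourcing $vars_script" >&2
        return 1
    fi
    echo "vtune found at: $(command -v vtune)"
}
```

Call setup_vtune_env once per shell session; a non-zero return status means the environment was not set up.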

Run Predefined Analysis

The predefined analysis configurations already have most of the knobs (configuration options) set by default for your convenience. To run a predefined performance analysis, use the -collect action:
vtune -collect <analysis_type> [-target-system=<system>] [-knob <knobName=knobValue>] [--] <target>
where:
  • <analysis_type>
    is the type of analysis to run. To see the list of available analysis types, enter:
    vtune -help collect
  • -target-system
    is an option for remote analysis that specifies a remote Linux* system or an Android* device
  • -knob
    is a configuration option that modifies the analysis
  • <knobName=knobValue>
    is the name of the specified knob and its value
  • <target>
    is the path and name of the application to analyze. If you need to analyze a process, use the corresponding option to specify the process name or ID. For a system-wide analysis, no target specification is required.
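
The syntax above can be illustrated with a dry-run helper that only prints the command line it would run, so you can review it before invoking the profiler. This is a sketch under stated assumptions: build_collect_cmd is a hypothetical helper, ./my_app and its arguments are placeholders, and the sampling-mode=hw knob is used purely as an illustration; check the knobs available for your analysis type in the command line help.

```shell
# Print (dry run) a "vtune -collect" command line.
# Usage: build_collect_cmd <analysis_type> <knob-or-empty> <target> [target args...]
# build_collect_cmd is a hypothetical helper, not part of VTune Profiler.
build_collect_cmd() {
    analysis_type="$1"
    knob="$2"
    shift 2
    cmd="vtune -collect $analysis_type"
    # Add the optional knob only when one was given.
    if [ -n "$knob" ]; then
        cmd="$cmd -knob $knob"
    fi
    # "--" separates vtune options from the target and its arguments.
    printf '%s -- %s\n' "$cmd" "$*"
}

# Illustrative calls; the target path and knob value are assumptions.
build_collect_cmd hotspots "sampling-mode=hw" ./my_app --size 1000
build_collect_cmd threading "" ./my_app
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without the profiler installed.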
Intel® VTune™ Profiler supports the following predefined analysis types:
Analysis Type
Description
Analyze application flow and identify sections of code that take a long time to execute (hotspots).
Collect data on how an application is using available logical CPU cores, discover where parallelism is incurring synchronization overhead, identify where an application is waiting on synchronization objects or I/O operations, and discover how waits affect application performance.
Identify opportunities to optimize CPU, memory, and FPU utilization for compute-intensive or throughput applications. The HPC Performance Characterization analysis type is a starting point for understanding the performance landscape of your application. Use this analysis type to improve application performance by increasing the number of floating-point operations per second (GFLOPS) and reducing the overall application run time. The analysis collects data related to CPU, memory, and FPU utilization. Additional scalability metrics are available for applications that use OpenMP* or MPI runtime libraries.
Analyze memory consumption by your Linux application, its distinct memory objects and their allocation stacks.
uarch-exploration (formerly general-exploration)
Collect hardware events for analyzing a typical client application. This analysis calculates a set of predefined ratios used for the metrics and facilitates identifying hardware-level performance problems.
Identify memory-related issues, such as NUMA problems and bandwidth-limited accesses, and attribute performance events to memory objects (data structures). This attribution relies on instrumentation of memory allocations and deallocations and on static and global variables retrieved from symbol information.
sgx-hotspots (deprecated)
Analyze hotspots inside security enclaves for systems with the Intel Software Guard Extensions (Intel SGX) feature enabled. This analysis type uses the INST_RETIRED.PREC_DIST hardware event that emulates precise clockticks and helps identify performance-critical program units inside enclaves.
tsx-exploration (deprecated)
Collect events that help understand Intel Transactional Synchronization Extensions (Intel TSX) behavior and causes of transactional aborts.
tsx-hotspots (deprecated)
Monitor the UOPS_RETIRED.ALL_PS hardware event that emulates precise clockticks and identify performance-critical program units inside transactions.
gpu-hotspots (preview)
Identify Graphics Processing Unit (GPU) tasks with high GPU utilization and estimate the effectiveness of this utilization. This analysis type is intended for analysis of applications that use a GPU for rendering, video processing, and computations with explicit support of Intel® Media SDK and OpenCL™ software technology.
gpu-offload (preview)
Explore code execution on various CPU and GPU cores on your platform, correlate CPU and GPU activity, and identify whether your application is GPU or CPU bound.
Analyze the CPU/GPU utilization of your code running on the Xen virtualization platform. Explore GPU usage per GPU engine and GPU hardware metrics that help understand where performance improvements are possible. If applicable, this analysis also detects OpenGL-ES API calls and displays them on the timeline.
(preview)
Analyze CPU/FPGA interaction issues by exploring OpenCL kernels running on the FPGA, and identify the most time-consuming FPGA kernels.
Monitor utilization of the IO subsystems, CPU and processor buses. This analysis type uses the hardware event-based sampling collection and system-wide Ftrace* collection (for Linux* and Android* targets)/ETW collection (Windows* targets) to provide a consistent view of the storage sub-system combined with hardware events and an easy-to-use method to match user-level source code with I/O packets executed by the hardware.
This is a PREVIEW FEATURE on Windows* OS. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.
Evaluate general behavior of Linux* or Android* target systems and correlate power and performance metrics with IRQ handling.

Run Custom Analysis

If you need to run a modified version of a predefined analysis type, use the -collect-with action to specify a data collection type and the required configuration options (knobs):
vtune -collect-with <collection_type> [-target-system=<system>] [-knob <knobName=knobValue>] [--] <target>
where:
  • <collection_type>
    is the type of data collection to run. To see the list of available collection types, enter:
    vtune -help collect-with
  • -target-system
    is an option for remote analysis that specifies a remote Linux* system or an Android* device
  • -knob
    is an option that configures the analysis
  • <knobName=knobValue>
    is the name of the specified knob and its value
  • <target>
    is the path and name of the application to analyze. If you need to analyze a process, use the corresponding option to specify the process name or ID. For a system-wide analysis, no target specification is required.
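
A custom collection often needs several knobs and, for remote targets, a -target-system value. The dry-run sketch below assembles such a command line and prints it without running it. Everything specific here is an assumption for illustration: build_collect_with_cmd is a hypothetical helper, and the collection type runsa, the knob names and values, the ssh:user@remote-host address, and ./my_app are examples to verify against the output of vtune -help collect-with on your system.

```shell
# Print (dry run) a "vtune -collect-with" command line.
# Usage: build_collect_with_cmd <collection_type> <target_system-or-empty> <target> [knob...]
# build_collect_with_cmd is a hypothetical helper, not part of VTune Profiler.
build_collect_with_cmd() {
    collection_type="$1"
    target_system="$2"
    target="$3"
    shift 3
    cmd="vtune -collect-with $collection_type"
    # Add -target-system only for remote analysis.
    if [ -n "$target_system" ]; then
        cmd="$cmd -target-system=$target_system"
    fi
    # Each remaining argument becomes its own -knob option.
    for knob in "$@"; do
        cmd="$cmd -knob $knob"
    done
    printf '%s -- %s\n' "$cmd" "$target"
}

# Illustrative calls; collection type, knobs, host, and target are assumptions.
build_collect_with_cmd runsa "" ./my_app event-config=CPU_CLK_UNHALTED.THREAD
build_collect_with_cmd runsa ssh:user@remote-host ./my_app sampling-interval=1
```

Passing knobs as separate arguments keeps each knobName=knobValue pair intact and makes it easy to add or drop one when experimenting.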
Intel® VTune™ Profiler supports the following collection types:
Collector
Description
Profile your application using the counter overflow feature of the Performance Monitoring Unit (PMU).
Profile the application execution and take snapshots of how that application utilizes the processors in the system. The collector interrupts a process, collects the value of all active instruction addresses and captures a calling sequence for each of these samples.

Next Steps

When the collection is complete, VTune Profiler saves the data as an analysis result in the default or specified result directory. You can either view the result in the GUI or generate a formatted analysis report.
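
As a rough illustration of the report step, the sketch below checks that a result directory exists and prints a report command for it. It assumes the -report action with a summary report type and the -result-dir option; build_report_cmd is a hypothetical helper and ./r000hs is a placeholder directory name, so substitute the directory your collection actually produced.

```shell
# Print (dry run) a "vtune -report" command line for an existing result.
# Usage: build_report_cmd <report_type> <result_dir>
# build_report_cmd is a hypothetical helper, not part of VTune Profiler.
build_report_cmd() {
    report_type="$1"
    result_dir="$2"
    # Refuse to build a command for a result directory that does not exist.
    if [ ! -d "$result_dir" ]; then
        echo "result directory not found: $result_dir" >&2
        return 1
    fi
    printf 'vtune -report %s -result-dir %s\n' "$report_type" "$result_dir"
}

# Example call (the directory name is an assumption):
#   build_report_cmd summary ./r000hs
```

The directory check catches a mistyped result path before the profiler is started.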

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804