User Guide

Introduction

Intel® VTune™ Profiler is a performance analysis tool for users who develop serial and multithreaded applications. VTune Profiler helps you analyze the algorithm choices and identify where and how your application can benefit from available hardware resources.
Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of VTune Profiler, or the versions integrated into Intel Parallel Studio XE or Intel System Studio.
Use VTune Profiler to locate or determine:
  • The most time-consuming (hot) functions in your application and/or on the whole system
  • Sections of code that do not effectively utilize available processor time
  • The best sections of code to optimize for sequential performance and for threaded performance
  • Synchronization objects that affect the application performance
  • Whether, where, and why your application spends time on input/output operations
  • Whether your application is CPU or GPU bound and how effectively it offloads code to the GPU
  • The performance impact of different synchronization methods, different numbers of threads, or different algorithms
  • Thread activity and transitions
  • Hardware-related issues in your code such as data sharing, cache misses, branch misprediction, and others
You can install VTune Profiler on Windows*, macOS*, and Linux* platforms and use the application for analysis of local and remote target systems. You can also use VTune Profiler as a web server, which is an optimal solution for multi-user environments.
VTune Profiler can be integrated into IDEs, such as Microsoft Visual Studio* or Eclipse*, or used as a standalone GUI client.
On macOS, you can set up your project, run remote analysis, and view the data collection result on the host. Local macOS analysis is not supported. On all supported systems, you can use the command line interface (vtune) for collecting data and performing regression testing.
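As a rough illustration of the command line workflow mentioned above, the following Python sketch builds argument lists for a collection run and a follow-up report, such as a regression-testing script might issue. It assumes the vtune options -collect, -report, and -result-dir; the helper names, the target ./myapp, and the result directory r000hs are hypothetical. Check the option names against the command line reference for your VTune Profiler version.

```python
def collect_cmd(analysis, result_dir, target):
    """Build a vtune argument list that collects `analysis` data for `target`."""
    # Everything after "--" is the target application and its own arguments.
    return ["vtune", "-collect", analysis, "-result-dir", result_dir, "--", *target]


def report_cmd(report, result_dir):
    """Build a vtune argument list that prints `report` for an existing result."""
    return ["vtune", "-report", report, "-result-dir", result_dir]


if __name__ == "__main__":
    # Hypothetical target and result directory, for illustration only.
    for argv in (
        collect_cmd("hotspots", "r000hs", ["./myapp"]),
        report_cmd("summary", "r000hs"),
    ):
        # Only print the command; pass argv to subprocess.run() to execute it.
        print(" ".join(argv))
```

Keeping each command as a list rather than a single string avoids shell quoting problems when the target path or its arguments contain spaces.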

Key Features

This table summarizes the availability of important analysis types per host and remote target platform using VTune Profiler:
¹Preview only; ²Intel HD Graphics and Intel Iris® Graphics only; ³EBS analysis only
VTune Profiler provides features that facilitate the analysis and interpretation of the results:
  • Top-down tree analysis: Use to understand which execution flow in your application is more performance-critical.
  • Timeline analysis: Analyze thread activity and the transitions between threads.
  • ITT API analysis: Use the ITT API to mark significant transition points in your code and analyze performance per frame, task, and so on.
  • Architecture diagram: Analyze GPU OpenCL™ applications by exploring the GPU hardware metrics per GPU architecture blocks.
  • Source analysis: View source with performance data attributed per source line to explore possible causes of an issue.
  • Comparison analysis: Compare performance analysis results for several application runs to pinpoint how performance changed after optimization.
  • Start data collection paused mode: Click the Start Paused button on the command bar to start the application without collecting performance data and click the Resume button to enable the collection at the right moment.
  • Grouping: Group your data at different levels of granularity in the grid view to analyze the problem from different angles.
  • Viewpoints: Choose among preset configurations of windows and panes available for the analysis result. This helps focus on particular performance problems.
  • Hot keys to start and stop the analysis: Use a batch file to create hot keys to start and stop a particular analysis.
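Some of the features listed above have command line counterparts. The Python sketch below builds argument lists for a paused start, a later resume, and a comparison of two results. It assumes the vtune options -start-paused and -command resume, and assumes that passing two -result-dir options to -report produces a comparison. The helper names, result directories, and target are hypothetical; verify the options against the command line reference before relying on them.

```python
def paused_collect_cmd(analysis, result_dir, target):
    """Start `target` under vtune without collecting data until resumed."""
    return ["vtune", "-collect", analysis, "-start-paused",
            "-result-dir", result_dir, "--", *target]


def resume_cmd(result_dir):
    """Resume data collection for the running analysis that writes to `result_dir`."""
    return ["vtune", "-result-dir", result_dir, "-command", "resume"]


def compare_cmd(report, baseline_dir, candidate_dir):
    """Report the difference between two results of the same analysis type."""
    return ["vtune", "-report", report,
            "-result-dir", baseline_dir, "-result-dir", candidate_dir]


if __name__ == "__main__":
    # Hypothetical result directories and target, for illustration only.
    for argv in (
        paused_collect_cmd("hotspots", "r001hs", ["./myapp"]),
        resume_cmd("r001hs"),
        compare_cmd("hotspots", "r000hs", "r001hs"),
    ):
        print(" ".join(argv))
```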
Because VTune Profiler requires specific knowledge of assembly-level instructions, its analysis may not operate correctly if a program (target) is compiled to generate non-Intel architecture instructions. In this case, run the analysis with a target executable compiled to generate only Intel instructions. After you finish using VTune Profiler, you can use optimizing compiler options that generate non-Intel architecture instructions.

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804