Intel® VTune™ Amplifier 2019 User Guide
Introduction
What's New in Intel® VTune™ Amplifier
Tuning Methodology
Tutorials and Samples
Notational Conventions
Getting Help
Product Website and Support
Related Information
Install Intel® VTune™ Amplifier
Sampling Drivers
Driverless Event-Based Sampling Collection
Verify Intel® VTune™ Amplifier Installation on a Linux* System
Platform Profiler Setup (Preview)
Launch Intel® VTune™ Amplifier
Get Started with Intel® VTune™ Amplifier
Standalone VTune Amplifier Graphical Interface
Microsoft Visual Studio* Integration
Eclipse* and Intel System Studio IDE Integration
macOS* Support
Set Up Project
WHERE: Analysis System
Analysis System Options
WHAT: Analysis Target
Analysis Target Options
HOW: Analysis Types
Search Directories
Search Order
Set Up Analysis Target
Windows* Targets
Install the Sampling Drivers for Windows* Targets
Debug Information for Windows* Application Binaries
Compiler Switches for Performance Analysis on Windows* Targets
Debug Information for Windows* System Libraries
Set Up Remote Windows* Target
Add Administrative Privileges
Linux* Targets
Build and Install the Sampling Drivers for Linux* Targets
Compiler Switches for Performance Analysis on Linux* Targets
Debug Information for Linux* Application Binaries
Enable Linux* Kernel Analysis
Analyze Statically Linked Binaries on Linux* Targets
Profile Container Targets
Set Up Remote Linux* Target
Set Up Linux* System for Remote Analysis
Configure SSH Access for Remote Collection
Search Directories for Remote Linux* Targets
Temporary Directory for Performance Results on Linux* Targets
Embedded Linux* Targets
Configure Yocto Project* and Intel® VTune™ Amplifier with the VTune Amplifier Integration Layer
Configure Yocto Project*/Wind River* Linux* and Intel® VTune™ Amplifier with the Intel System Studio Integration Layer
Configure Yocto Project* and Intel® VTune™ Amplifier with the Linux* Target Package
FreeBSD* Targets
Set Up FreeBSD* System
QNX* Targets
Managed Code Targets
.NET* Targets
Windows Store Application Targets
Go* Application Targets
Android* Targets
Build and Install Sampling Drivers for Android* Targets
Set Up Android* System
Enable Java* Analysis on Android* System
Prepare an Android* Application for Analysis
Analyze Unplugged Devices
Search Directories for Android* Targets
Intel® Xeon Phi™ Processor Targets
Targets in Virtualized Environments
Profile Targets on a VMware* Guest System
Profile Targets on a Parallels* Guest System
Profile Targets on a KVM* Guest System
Profile KVM Kernel Modules from the Host
Profile KVM Kernel and User Space on the KVM System
Profile KVM Kernel and User Space from the Host
Profile Targets on a Xen* Virtualization Platform
Profile Targets in the Hyper-V* Environment
Targets in a Cloud Environment
Arbitrary Targets
Analyze Performance
User-Mode Sampling and Tracing Collection
Hardware Event-based Sampling Collection
Allow Multiple Runs or Multiplex Events
Hardware Event-based Sampling Collection with Stacks
Hotspots Analysis Group
Hotspots Analysis for CPU Usage Issues
Hotspots View
Memory Consumption Analysis
Memory Consumption and Allocations View
Parallelism Analysis Group
Threading Analysis
Threading Efficiency View
HPC Performance Characterization Analysis
HPC Performance Characterization View
Microarchitecture Analysis Group
Microarchitecture Exploration Analysis for Hardware Issues
Microarchitecture Exploration View
Microarchitecture Pipe
Memory Access Analysis for Cache Misses and High Bandwidth Issues
Memory Usage View
Platform Analysis Group
CPU/GPU Concurrency Analysis
GPU Compute/Media Hotspots Analysis
Rebuild and Install the Kernel for GPU Analysis
GPU In-kernel Profiling
GPU In-kernel Profiling View
CPU/FPGA Interaction Analysis (Preview)
CPU/FPGA Interaction View
GPU Rendering Analysis (Preview)
Input and Output Analysis
System Disk IO Data View
SPDK IO Data View
System Overview Analysis
Analyze Interrupts
Platform Profiler Analysis (Preview)
Platform Profiler Results (Preview)
Source Code Analysis
Custom Analysis
Custom Analysis Options
Highly Accurate CPU Time Data Collection
Hardware Event List
Hardware Event Skid
Instructions Retired Event
Precise Events
Linux* and Android* Kernel Analysis
Sampling Interval
Sample After Value
Code Profiling Scenarios
Java* Code Analysis
Python* Code Analysis
Intel® Threading Building Blocks Code Analysis
MPI Code Analysis
GPU Application Analysis on Intel® HD Graphics and Intel® Iris® Graphics
GPU OpenCL™ Application Analysis
Intel® Media SDK Program Analysis
Frame Data Analysis
Task Analysis
Analyze Energy
Running Energy Analysis with Intel® VTune™ Amplifier
Viewing Intel Energy Profiler Data Collected Remotely
Interpreting Energy Analysis Data
Control Data Collection
Finalization
Pause Data Collection
Limit Data Collection
Generate Command Line Configuration from GUI
Minimize Collection Overhead
Import External Data
Use a Custom Collector
Create a CSV File with External Data
Import Linux Perf* Trace with VTune Amplifier Metrics
Examples of CSV Format and Imported Data
Manage Data Views
Switch Viewpoints
Control Window Synchronization
View Stacks
Call Stack Mode
Metrics Distribution Over Call Stacks
Manage Grid Views
Manage Timeline View
Change Threshold Values
Choose Data Format
Group and Filter Data
View Data on Inline Functions
Analyze Loops
Stitch Stacks for Intel® Threading Building Blocks or OpenMP* Analysis
Search for Data
Import Results and Traces into VTune Amplifier GUI
Manage Result Files
VTune Amplifier Filenames and Locations
Compare Results
Compare Source Code
View Comparison Data
Comparison Summary
Bottom-up Comparison
Top-down Tree Comparison
Command Line Interface
amplxe-cl Command Syntax
amplxe-cl Actions
Run Command Line Analysis
hotspots
advanced-hotspots
threading
concurrency
locksandwaits
memory-consumption
hpc-performance
uarch-exploration
memory-access
tsx-exploration
tsx-hotspots
sgx-hotspots
cpugpu-concurrency
gpu-hotspots
gpu-profiling
graphics-rendering
fpga-interaction
io
system-overview
runsa/runss Custom Analysis
Configure Analysis Options
Collecting System-Wide Data
Collect Data on Remote Linux Systems
Configure GPU Analysis from Command Line
Specify Search Directories
Specify the Result Directory
Pause Collection
Manage Analysis Duration
Limit Data Collection
Work with Results
View Command Line Results in the GUI
Import Results
Re-finalize Results
Generate Command Line Reports
Summary Report
Hotspots Report
Hardware Events Report
Callstacks Report
Top-down Report
gprof-cc Report
GPU Analysis Report
Difference Report
Profiling Guided Optimization Analysis
Viewing Source Objects
Saving and Formatting Reports
Filtering and Grouping Reports
Command Line Usage Scenarios
Android* Target Analysis from Command Line
OpenMP* Analysis from Command Line
Java* Code Analysis from Command Line
Command Line Interface Reference
Options Descriptions and General Rules
allow-multiple-runs
analyze-kvm-guest
analyze-system
app-working-dir
call-stack-mode
collect
collect-with
column
command
cpu-mask
csv-delimiter
cumulative-threshold-percent
custom-collector
data-limit
discard-raw-data
duration
filter
finalization-mode
finalize
format
group-by
help
import
inline-mode
knob
kvm-guest-kallsyms
kvm-guest-modules
limit
loop-mode
mrte-mode
no-follow-child
no-summary
no-unplugged-mode
quiet
report
report-knob
report-output
report-width
result-dir
resume-after
return-app-exitcode
ring-buffer
search-dir
show-as
sort-asc
sort-desc
source-object
source-search-dir
stack-size
start-paused
strategy
target-install-dir
target-system
target-tmp-dir
target-duration-type
target-pid
target-process
time-filter
trace-mpi
user-data-dir
verbose
version
Reporting Problems from Command Line
API Support
Instrumentation and Tracing Technology APIs
Basic Usage and Configuration
Configuring Your Build System
Attaching ITT APIs to a Launched Application
Instrumenting Your Application
Minimizing ITT API Overhead
Viewing Instrumentation and Tracing Technology (ITT) API Task Data in Intel® VTune™ Amplifier
Instrumentation and Tracing Technology API Reference
Domain API
String Handle API
Collection Control API
Thread Naming API
Task API
Frame API
User-Defined Synchronization API
Event API
Counters API
Load Module API
Memory Allocation APIs
JIT Profiling API
Using JIT Profiling API
JIT Profiling API Reference
iJIT_NotifyEvent
iJIT_IsProfilingActive
iJIT_GetNewMethodID
System APIs Supported by Intel® VTune™ Amplifier
Troubleshooting
Error Message: Application Sets Its Own Handler for Signal
Error Message: Cannot Enable Event-Based Sampling Collection
Error Message: Cannot Collect GPU Hardware Metrics
Error Message: Cannot Load Data File
Error Message: Cannot Locate Debugging Information
Error Message: Cannot Open Data
Error Message: Client Is Not Authorized to Connect to Server
Error Message: Intel® Graphics Driver Is Obsolete
Error Message: Root Privileges Required for Processor Graphics Events
Error Message: No Pre-built Driver Exists for This System
Error Message: Problem Accessing the Sampling Driver
Error Message: Required Key Not Available
Error Message: Stack Size Is Too Small
Error Message: Symbol File Is Not Found
Problem: Analysis of the .NET* Application Fails
Problem: Cannot Access VTune Amplifier Documentation
Problem: CPU time for Hotspots or Threading Analysis is Too Low
Problem: 'Events = Sample After Value (SAV) * Samples' Is Not True If Multiple Runs Are Disabled
Problem: Guessed Stack Frames
Problem: Inaccurate Sum in the Grid
Problem: Information Collected via ITT API Is Not Available When Attaching to a Process
Problem: No GPU Usage Data Is Collected
Problem: Same Functions Are Compared As Different Instances
Problem: Skipped Stack Frames
Problem: Stack in the Top-Down Tree Window Is Incorrect
Problem: Stacks in Call Stack and Bottom-Up Panes Are Different
Problem: System Functions Appear in the User Functions Only Mode
Problem: VTune Amplifier is Slow to Respond When Collecting or Displaying Data
Problem: VTune Amplifier is Slow on X-Servers with SSH Connection
Problem: Unexpected Paused Time
Problem: {Unknown Timer} in the Platform Power Analysis Viewpoint
Problem: Unknown Critical Error Due to Disabled Loopback Interface
Problem: Unknown Frames
Problem: Unreadable Text in Intel VTune Amplifier on macOS*
Warnings about Accurate CPU Time Collection
Reference
User Interface
Context Menu: Grid
Context Menu: Call Stack Pane
Context Menu: Project Navigator
Context Menu: Source/Assembly Window
Dialog Box: Binary/Symbol Search
Dialog Box: Source Search
Hot Keys
Menu: Customize Grouping
Menu: Intel VTune Amplifier
Pane: Call Stack
Pane: Options - General
Pane: Options - Result Location
Pane: Options - Source/Assembly
Pane: Project Navigator
Pane: Timeline
Toolbar: Command
Toolbar: Filter
Toolbar: Source/Assembly
Toolbar: VTune Amplifier
Window: Bandwidth - Platform Power Analysis
Window: Bottom-up
Window: Caller/Callee
Window: Cannot Find <file type> File
Window: Collection Log
Window: Compare Results
Window: Configure Analysis
Window: Core Wake-ups - Platform Power Analysis
Window: Correlate Metrics - Platform Power Analysis
Window: CPU C/P States - Platform Power Analysis
Window: Debug
Window: Event Count
Window: Graphics - GPU Hotspots
Window: Graphics - Hotspots
Window: Graphics C/P States - Platform Power Analysis
Window: NC Device States - Platform Power Analysis
Window: Platform
Window: Platform Power Analysis
Window: Sample Count
Window: SC Device States - Platform Power Analysis
Window: Summary
Summary - Disk Input and Output
Summary - Microarchitecture Exploration
Summary - GPU Hotspots
Summary - Hardware Events
Summary - Hotspots
Summary - Hotspots by CPU Usage
Summary - Hotspots by Thread Concurrency
Summary - HPC Performance Characterization
Summary - Locks and Waits
Summary - Memory Consumption
Summary - Memory Usage
Summary - Platform Power Analysis
Window: System Sleep States - Platform Power Analysis
Window: Temperature - Platform Power Analysis
Window: Timer Resolution - Platform Power Analysis
Window: Top-down Tree
Window: Uncore Event Count
Window: Wakelocks - Platform Power Analysis
CPU Metrics
Assists
Available Core Time
Average Bandwidth
Average CPU Frequency
Average CPU Usage
Average Latency (cycles)
Average Logical Core Utilization
Average Physical Core Utilization
Back-End Bound
Memory Bandwidth
Contested Accesses (Intra-Tile)
LLC Miss
UTLB Overhead
Port Utilization
Port 0
Port 1
Port 2
Port 3
Port 4
Port 5
Port 6
Port 7
BACLEARS
Bad Speculation (Cancelled Pipeline Slots)
Bad Speculation (Back-End Bound Pipeline Slots)
FP Arithmetic
FP Assists
FP Scalar
FP Vector
FP x87
MS Assists
Branch Mispredict
Bus Lock
Cache Bound
Clears Resteers
Clockticks per Instructions Retired (CPI)
Clockticks Vs. Pipeline Slots-Based Metrics
CPI Rate
CPI Rate (Intel Atom® processor)
CPU Time
Core Bound
CPU Frequency
CPU Time
CPU Utilization
CPU Utilization (OpenMP)
Cycles of 0 Ports Utilized
Cycles of 1 Port Utilized
Cycles of 2 Ports Utilized
Cycles of 3+ Ports Utilized
DIV Active
DRAM Bandwidth Bound
DRAM Bound
DSB Coverage
DTLB Store Overhead
Effective CPU Utilization
Effective Physical Core Utilization
Effective Time
Elapsed Time
Elapsed Time (Global)
Elapsed Time (Total)
Estimated BB Execution Count
Estimated Ideal Time
Execution Stalls
False Sharing
Far Branch
Flags Merge Stalls
FPU Utilization
% of Packed FP Instructions
% of 128-bit Packed Floating Point Instructions
% of 256-bit Packed Floating Point Instructions
% of Packed SIMD Instructions
% of Scalar FP Instructions
% of Scalar SIMD Instructions
FP Arithmetic/Memory Read Instructions
FP Arithmetic/Memory Write Instructions
Loop Type
SP FLOPs per Cycle
Vector Capacity Usage
Vector Instruction Set
Front-End Bandwidth
Front-End Bandwidth DSB
Front-End Bandwidth LSD
Front-End Bandwidth MITE
Front-End Bound
Branch Resteers
DSB Switches
ICache Misses
ITLB Overhead
Length Changing Prefixes
MS Switches
Front-End Latency
General Retirement
Hardware Event Count
Hardware Event Sample Count
ICache Line Fetch
Ideal Time
Imbalance or Serial Spinning
Inactive Sync Wait Count
Inactive Sync Wait Time
Inactive Time
Inactive Wait Count
Inactive Wait Time
Inactive Wait Time with Poor CPU Utilization
Incoming Bandwidth Bound
Incoming Packet Rate Bound
Instruction Starvation
Interrupt Time
I/O Wait Time
IPC
L1 Bound
4k Aliasing
DTLB Overhead
FB Full
Loads Blocked by Store Forwarding
Lock Latency
Split Loads
L1 Hit Rate
L1D Replacement Percentage
L1D Replacements
L1I Stall Cycles
L2 Bound
L2 Hit Bound
L2 Hit Rate
L2 HW Prefetcher Allocations
L2 Input Requests
L2 Miss Bound
L2 Miss Count
L2 Replacement Percentage
L2 Replacements
L3 Bound
Contested Accesses
Data Sharing
L3 Latency
LLC Hit
SQ Full
LLC Load Misses Serviced By Remote DRAM
LLC Miss Count
LLC Replacement Percentage
LLC Replacements
Local DRAM
Local DRAM Access Count
Local Persistent Memory
Logical Core Utilization
Loop Entry Count
LSD Coverage
Machine Clears
Max DRAM Single-Package Bandwidth
Max DRAM System Bandwidth
MCDRAM Bandwidth Bound
MCDRAM Cache Bandwidth Bound
MCDRAM Flat Bandwidth Bound
Memory Bandwidth
Memory Bound
Memory Efficiency
Memory Latency
Microarchitecture Usage
Microcode Sequencer
Mispredicts Resteers
MO Machine Clear Overhead
MPI Imbalance
MPI Rank on the Critical Path
MS Entry
MUX Reliability
NUMA: % of Remote Accesses
Collection Time
Page Walk
Potential Gain
Imbalance
Lock Contention
OpenMP Region Time
Other
Outgoing Bandwidth Bound
Outgoing Packet Rate Bound
Overhead Time
Parallel Region Time
Paused Time
Pipeline Slots
Pre-Decode Wrong
Remote Cache
Remote Cache Access Count
Remote DRAM
Remote DRAM Access Count
Remote/Local DRAM Ratio
Retire Stalls
Retiring
Self Time and Total Time
Serial CPU Time
MPI Busy Wait Time
Other
Serial Time (Outside Any Parallel Region)
SIMD Assists
SIMD Compute-to-L1 Access Ratio
SIMD Compute-to-L2 Access Ratio
SIMD Instructions per Cycle
Slow LEA Stalls
SMC Machine Clear
SP FLOPs per Cycle
SP GFLOPS
Spin Time
Communication (MPI)
Imbalance or Serial Spinning
(OpenMP) Lock Contention
Other
Overhead Time
(OpenMP) Atomics
(OpenMP) Creation
Other Time
(OpenMP) Reduction
(OpenMP) Scheduling
Tasking
Split Stores
Store Bound
Store Latency
Task Time
Thread Concurrency
Thread Oversubscription
Total Iteration Count
uOps
VPU Utilization
Wait Count
Wait Rate
Wait Time
GPU Metrics
Average Time
Computing Threads Started
Computing Threads Started Threads/sec
CPU Time
EU 2 FPU Pipelines Active
EU Array Active
EU Array Idle
EU Array Stalled Idle
EU Array Stalled
EU IPC Rate
EU Send Pipeline Active
EU Threads Occupancy
Global Size
GPU EU Array Usage
GPU L3 Bound
GPU L3 Miss Ratio
GPU L3 Misses
GPU L3 Misses, Transactions/sec
Memory Read Bandwidth
GPU Memory Texture Read Bandwidth
Memory Write Bandwidth
GPU Texel Quads Count
GPU Utilization
GPU Utilization (Max)
Instance Count
L3 Sampler Bandwidth
L3 Shader Bandwidth
LLC Miss Rate due to GPU Lookups
LLC Miss Ratio due to GPU Lookups
Local Size
Occupancy
PS EU Active %
PS EU Stall %
Ratio to Max Bandwidth
Ratio to Max Bandwidth % (Write)
Ratio to Max Bandwidth % (Read)
Render GPGPU Command Streamer Loaded
Samples Blended
Samples Killed in PS, pixels
Samples Written
Sampler Busy
Sampler Is Bottleneck
Shared Local Memory Read Bandwidth
Shared Local Memory Write Bandwidth
SIMD Width
Data Transferred Size
Data Transferred Bandwidth
Total Time
Typed Memory Read Transactions
Typed Memory Write Transactions
Typed Reads Coalescence
Typed Writes Coalescence
Untyped Memory Read Bandwidth
Untyped Memory Write Bandwidth
Untyped Reads Coalescence
Untyped Writes Coalescence
VS EU Active
VS EU Stall
OpenCL Kernel Analysis Metrics
Computing Task Total Time
Instance Count
SIMD Width
Work Size
Energy Analysis Metrics
Available Core Time
C-State
D0ix States
DRAM Self Refresh
Energy Consumed
Idle Wake-ups
P-State
S0ix States
Temperature
Timer Resolution
Total Time in C0 State
Total Time in Non-C0 States
Total Time in Non-S0 States
Total Wake-up Count
Wake-ups
Wake-ups/sec per Core
Intel Processor Events
Legal Information