- Type of performance analysis. The following analysis types and configurable knobs are supported:
- Identify performance anomalies in frequently recurring intervals of code like loop iterations. Perform fine-grained analysis at the microsecond level.
Collection type: user-mode sampling and tracing collection or hardware event-based sampling.
- -knob ipt-regions-to-load to specify the maximum number (10-5000) of code regions to load for detailed analysis. To load details efficiently, keep this number at or below 1000.
- -knob max-region-duration to specify the maximum duration (0.001-1000 ms) of analysis per code region.
- Identify your most time-consuming source code using one of the available collection modes:
Collection type: user-mode sampling and tracing collection or hardware event-based sampling. Knobs: enable-characterization-insights, enable-stack-collection, sampling-interval, sampling-mode.
- -knob sampling-mode=sw (formerly Basic Hotspots) to collect hotspots and stack information based on user-mode sampling and tracing, which does not require sampling drivers but incurs higher collection overhead. This mode cannot be used to profile a whole system; it must either launch an application/process or attach to one.
- -knob sampling-mode=hw (formerly Advanced Hotspots) to sample all processes on the system and identify hotspots.
- Analyze how your application is using available logical CPU cores, discover where parallelism is incurring synchronization overhead, find how waits affect your application's performance, and identify potential candidates for parallelization. Collection type: user-mode sampling and tracing collection. Knobs: sampling-interval.
- Analyze memory consumption by your Linux application, its distinct memory objects and their allocation stacks. Collection type: user-mode sampling and tracing collection. Knobs: mem-object-size-min-thres.
- Identify opportunities to optimize CPU, memory, and FPU utilization for compute-intensive or throughput applications. Collection type: hardware event-based sampling collection. Knobs: enable-stack-collection, collect-memory-bandwidth, sampling-interval, dram-bandwidth-limits.
- uarch-exploration (formerly known as general-exploration)
- Identify and locate the most significant hardware issues that affect the performance of your application. Use this analysis type as a starting point for microarchitecture analysis. Collection type: hardware event-based sampling collection. Knobs: enable-stack-collection, collect-memory-bandwidth, enable-user-tasks.
- Measure a set of metrics to identify memory access related issues (for example, specific for NUMA architectures). Collection type: hardware event-based sampling collection. Knobs: sampling-interval, dram-bandwidth-limits, analyze-openmp; Linux only: analyze-mem-objects, mem-object-size-min-thres.
- Analyze hotspots inside security enclaves for systems with the Intel Software Guard Extensions (Intel SGX) feature enabled. Collection type: hardware event-based sampling collection. Knobs: enable-stack-collection, enable-user-tasks.
- Analyze Intel Transactional Synchronization Extensions (Intel TSX) usage. Collection type: hardware event-based sampling collection. Knobs: enable-user-tasks, analysis-step.
- Analyze hotspots inside transactions. Knobs: enable-user-tasks, enable-stack-collection.
- Enable the CPU/GPU Concurrency analysis and explore code execution on the various CPU and GPU cores in your system, correlate CPU and GPU activity, and identify whether your application is GPU or CPU bound. Knobs: sampling-interval, enable-user-tasks, enable-user-sync, enable-gpu-usage, gpu-counters-mode, enable-gpu-runtimes.
- Identify GPU tasks with high GPU utilization and estimate the effectiveness of this utilization. Collection type: hardware event-based sampling collection. Knobs: gpu-sampling-interval, enable-gpu-usage, gpu-counters-mode, enable-gpu-runtimes, enable-stack-collection.
- Analyze GPU kernel execution per code line and identify performance issues caused by memory latency or inefficient kernel algorithms. Collection type: hardware event-based sampling collection. Knobs: gpu-profiling-mode, kernels-to-profile.
- Analyze the CPU/GPU utilization of your code running on the Xen virtualization platform. Explore GPU usage per GPU engine and GPU hardware metrics that help understand where performance improvements are possible. If applicable, this analysis also detects OpenGL-ES API calls and displays them on the timeline. Collection type: hardware event-based sampling collection. Knobs: gpu-sampling-interval, gpu-counters-mode.
- Analyze CPU/FPGA interaction issues by exploring OpenCL kernels running on the FPGA, and identify the most time-consuming FPGA kernels. Collection type: hardware event-based sampling collection. Knobs: sampling-interval, enable-stack-collection.
- Monitor utilization of the IO subsystems, CPU and processor buses. Collection type: hardware event-based sampling collection. Knobs: kernel-stack, collect-memory-bandwidth, dram-bandwidth-limits; Linux only: dpdk, spdk.
- Evaluate general behavior of Linux* or Android* target systems and correlate power and performance metrics with IRQ handling. Collection type: hardware event-based sampling collection. Knobs: collection-detail.
- The vtune command runs no data collection unless the collect action is specified.
- The collect-with action performs the same basic functions as the collect action, but provides additional knob settings for custom configuration.
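The documented range for ipt-regions-to-load (10-5000, with 1000 or fewer recommended for efficient loading) can be checked before a collection is launched. The following is a minimal sketch, not part of VTune: the function name ipt_regions_knob is hypothetical, and it only prints the knob argument in the same -knob name=value form used by the examples on this page, without running vtune.

```shell
# Hypothetical helper (not a VTune feature): validate a value for the
# ipt-regions-to-load knob and print the matching -knob argument.
ipt_regions_knob() {
    n=${1:-}
    # Reject empty or non-numeric input.
    case "$n" in
        ''|*[!0-9]*)
            echo "error: expected a whole number, got '$n'" >&2
            return 1
            ;;
    esac
    # Enforce the documented range of 10-5000.
    if [ "$n" -lt 10 ] || [ "$n" -gt 5000 ]; then
        echo "error: ipt-regions-to-load must be between 10 and 5000" >&2
        return 1
    fi
    # Values above 1000 are allowed but may load details less efficiently.
    if [ "$n" -gt 1000 ]; then
        echo "note: values above 1000 may slow down loading of details" >&2
    fi
    printf '%s\n' "-knob ipt-regions-to-load=$n"
}

ipt_regions_knob 1000
```

The printed fragment can then be appended to a vtune -collect command line for the analysis type that accepts this knob.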
vtune -collect hotspots -knob sampling-mode=hw -- /home/test/sample
vtune -collect hs -search-dir /home/import/system_modules -- /home/test/sample
vtune -collect hotspots -target-pid 1234
vtune -collect threading -no-auto-finalize -- /home/test/sample
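
When scripting these commands, it can help to assemble and review the command line before running it. The sketch below assumes a hypothetical wrapper function, hotspots_cmd, that accepts only the two sampling-mode values described above (sw and hw) and prints the resulting command rather than executing it. The application path is the sample path from the examples above.

```shell
# Hypothetical helper (not part of VTune): print a hotspots command line
# for a given sampling mode and application, without running it.
hotspots_cmd() {
    mode=${1:-}
    app=${2:-}
    # Only the two documented sampling-mode values are accepted.
    case "$mode" in
        sw|hw) ;;
        *)
            echo "error: sampling-mode must be sw or hw" >&2
            return 1
            ;;
    esac
    # This sketch covers launching an application only, not attaching.
    if [ -z "$app" ]; then
        echo "error: an application path is required" >&2
        return 1
    fi
    printf 'vtune -collect hotspots -knob sampling-mode=%s -- %s\n' "$mode" "$app"
}

hotspots_cmd hw /home/test/sample
```

Printing the command first keeps the wrapper free of side effects; the output can be inspected and then run by hand or passed to the shell once it looks right.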