knob

Set configuration options for the specified analysis type or collector type.

GUI Equivalent

Configure Analysis window > HOW pane

Syntax

-knob | -k <knob-name>=<knob-value>

Arguments

knob-name

An analysis type or collector type may have one or more configuration options (knobs) that provide additional instructions for performing the specified type of analysis. To use a knob, you must specify the knob name and knob value.

Multiple knob options are allowed and can be followed by additional action-options, as well as global-options, if needed.

knob-value

Each knob accepts a defined set of values. Most knobs are Boolean: specify <knob-name>=true to enable the knob or <knob-name>=false to disable it.

Note

Knob behavior may vary depending on the analysis type or collector type.
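
As a rough sketch of the syntax above, the following shell snippet assembles a collection command with two knobs and prints it instead of running it, so it works without VTune Amplifier installed. The target path ./myapp and the choice of knobs are illustrative assumptions; both knobs are listed below as supported by the hotspots analysis.

```sh
#!/bin/sh
# Assemble (but do not run) an amplxe-cl command line with several knobs.
# ./myapp is a hypothetical target used only for illustration.
analysis="hotspots"
target="./myapp"

cmd="amplxe-cl -collect $analysis"
# Each knob is passed as its own -knob <knob-name>=<knob-value> pair.
for knob in sampling-mode=hw enable-stack-collection=true; do
    cmd="$cmd -knob $knob"
done
cmd="$cmd -- $target"

echo "$cmd"
```

Printing the command first makes it easy to review the knob list before starting a real collection.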

<knob-name>

Description

accurate-cpu-time-detection=true | false (Windows only)

Default: true

Collect more accurate CPU time data. This option requires additional disk space and post-processing time. Administrator privileges are required.

Supported analysis: runss

analyze-loops=true | false

Default: false

Extend loop analysis to collect advanced loops information such as instruction set usage and display analysis results by loops and functions.

Supported analysis: runss, runsa

analyze-mem-objects=true | false

Default: false

Enable the instrumentation of memory allocation/de-allocation and map hardware events to memory objects. This option is supported for Linux targets only running on the Intel microarchitecture code name Sandy Bridge (or later).

Supported analysis: memory-access

analyze-openmp=true | false

Default: true for the HPC Performance Characterization analysis; false for other analysis types.

Instrument the OpenMP* runtimes in your application to group performance data by regions/work-sharing constructs and detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size.

Supported analysis: hotspots, threading, hpc-performance, memory-access, uarch-exploration, runsa

chipset-event-config="event1,event2,..."

Specify a comma-separated list of Android chipset events (up to 5 events) to monitor with the hardware event-based sampling collector.

Supported analysis: runsa

collect-bad-speculation=true | false

Default: true

Collect the minimum set of data required to compute top-level metrics and all Bad Speculation sub-metrics.

Supported analysis: uarch-exploration, runsa

collect-core-bound=true | false

Default: false

Collect the minimum set of data required to compute top-level metrics and all Core Bound sub-metrics.

Supported analysis: uarch-exploration, runsa

collect-frontend-bound=true | false

Default: true

Collect the minimum set of data required to compute top-level metrics and all Front-End Bound sub-metrics.

Supported analysis: uarch-exploration, runsa

collect-io-waits=true | false

Default: false

Analyze the percentage of time each thread and CPU spends in I/O wait state.

Supported analysis: runsa

collect-memory-bandwidth=true | false

Default: depends on analysis type

Collect data to identify where your application is generating significant bandwidth to DRAM. To view collected data in GUI, enable the Analyze memory bandwidth option.

Supported analysis: uarch-exploration, hpc-performance, runsa

collect-memory-bound=true | false

Default: true

Collect the minimum set of data required to compute top-level metrics and all Memory Bound sub-metrics.

Supported analysis: uarch-exploration, hpc-performance

collect-retiring=true | false

Default: true

Collect the minimum set of data required to compute top-level metrics and all Retiring sub-metrics.

Supported analysis: uarch-exploration, runsa
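
The collect-bad-speculation, collect-core-bound, collect-frontend-bound, collect-memory-bound, and collect-retiring knobs can be combined to focus a Microarchitecture Exploration run on one category. The sketch below builds such a knob list and prints the resulting command; the focus category and the ./myapp target are assumptions for illustration, and this reference does not state how much overhead is saved by disabling the other categories.

```sh
#!/bin/sh
# Enable sub-metric collection for one top-level category and disable the rest.
# The focus value and ./myapp are hypothetical.
focus="memory-bound"
knobs=""
for cat in bad-speculation core-bound frontend-bound memory-bound retiring; do
    if [ "$cat" = "$focus" ]; then val=true; else val=false; fi
    knobs="$knobs -knob collect-$cat=$val"
done

# Print the command rather than running it.
echo "amplxe-cl -collect uarch-exploration$knobs -- ./myapp"
```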

counting-mode=true | false

Default: false

Choose between collecting detailed context data for each PMU event (such as code or hardware context) or the counts of events. Counting mode introduces less overhead but gives less information.

Supported analysis: runsa

cpu-samples-mode=off | stack | nostack

Default: off

Enable to periodically sample the application. Samples can be collected with or without stacks.

Supported analysis: runss

dpdk=true | false

Default: false

Profile DPDK IO API.

Supported analysis: io

dram-bandwidth-limits=true | false

Default: true for the HPC Performance Characterization and Microarchitecture Exploration analysis with collect-memory-bandwidth knob enabled; true for the Memory Access and Microarchitecture Exploration analysis.

Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.

Supported analysis: memory-access, uarch-exploration, hpc-performance, runsa

enable-context-switches=true | false

Default: false

Analyze detailed scheduling layout for all threads in your application, explore time spent on a context switch and identify the nature of context switches for a thread (preemption or synchronization).

Supported analysis: runsa

enable-driverless-collection=true | false

Default: false

Enable driverless Linux Perf collection when possible.

Supported analysis: runsa

enable-gpu-runtimes=true | false

Default for gpu-hotspots: true, for runss: false.

Analyze execution of OpenCL™ kernels and Intel® Media SDK programs on Intel HD Graphics and Intel® Iris® Graphics. This option may affect the performance of your application on the CPU side.

Supported analysis: gpu-hotspots, runss, runsa

Note

OpenCL kernels analysis is currently supported for Windows and Linux target systems with Intel HD Graphics and Intel Iris Graphics. Intel Media SDK program analysis is supported for Linux targets only.

enable-gpu-usage=true | false

Default: false

Analyze frame rate and usage of Intel HD Graphics and Intel® Iris® Graphics engines and identify whether your application is GPU or CPU bound.

Supported analysis: runss, runsa

enable-parallel-fs-collection=true | false

Default: false

Analyze Lustre* file system performance statistics, including Bandwidth, Package Rate, Average Packet Size, and others.

Supported analysis: runsa

enable-stack-collection=true | false

Default: false

Enable Hardware Event-based Sampling Collection with Stacks.

Supported analysis: hotspots, hpc-performance, gpu-hotspots, runsa

enable-system-cswitch=true | false

Default: false

Analyze detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization).

Supported analysis: runsa

enable-thread-affinity=true | false

Default: false

Analyze thread pinning to sockets, physical cores, and logical cores. Identify incorrect affinity that utilizes logical cores instead of physical cores and contributes to poor physical CPU utilization.

Note

Affinity information is collected at the end of the thread lifetime, so the resulting data may not show the whole issue for dynamic affinity that is changed during the thread lifetime.

enable-user-sync=true | false

Default: false

Collect synchronization data via the User-Defined Synchronization API.

Supported analysis: threading, runss

enable-user-tasks=true | false

Default: false

Analyze tasks, events and counters specified in your application via the Task API. This option causes higher overhead and increases result size.

Supported analysis: hotspots, threading, uarch-exploration, runss, runsa

event-config=<event_name1>,<event_name2>,...

Configure PMU events to collect with the hardware event-based sampling collector. Multiple events can be specified as a comma-separated list (no spaces).

Note

To display a list of events available on the target PMU, enter:

$ amplxe-cl -collect-with runsa -knob event-config=? <target>

The command returns names and short descriptions of available events. For more information on the events, use Intel Processor Events Reference.

Supported analysis: runsa
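
Because event-config expects a comma-separated list with no spaces, it can help to build the value programmatically. The helper below is a minimal sketch; join_events is a hypothetical function name, and the event names and ./myapp target are illustrative and may not exist on every PMU.

```sh
#!/bin/sh
# Join PMU event names into the comma-separated, space-free list
# that event-config expects. join_events is a hypothetical helper.
join_events() {
    list=""
    for ev in "$@"; do
        if [ -z "$list" ]; then list="$ev"; else list="$list,$ev"; fi
    done
    printf '%s\n' "$list"
}

events=$(join_events CPU_CLK_UNHALTED.REF_TSC INST_RETIRED.ANY)

# Print the command rather than running it; ./myapp is a placeholder target.
echo "amplxe-cl -collect-with runsa -knob event-config=$events -- ./myapp"
```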

event-mode=all | user | os

Default: all

Limit event-based sampling collection to OS or USER mode.

Supported analysis: hotspots, runsa

ftrace-config=<event_name>

Available events are freq, idle, sched, disk, filesystem, irq, kvm, workq, softirq, sync.

Default for Linux targets: sched,freq,idle,workq,irq,softirq

Default for Android targets: sched,freq,idle,workq,filesystem,irq,softirq,sync,disk

Collect Linux Ftrace* framework events.

Supported analysis: runsa, runss
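
A typo in an event name is easy to miss in a long command line. The following sketch checks a requested ftrace-config value against the event names listed above before printing the knob; validate_ftrace_events is a hypothetical helper, not part of amplxe-cl.

```sh
#!/bin/sh
# Check that every requested Ftrace event is one of the documented names.
available="freq idle sched disk filesystem irq kvm workq softirq sync"

validate_ftrace_events() {
    # $1: comma-separated event list, e.g. "sched,freq"
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the list on commas
    IFS=$old_ifs
    for ev in "$@"; do
        case " $available " in
            *" $ev "*) ;;    # known event
            *) echo "unknown ftrace event: $ev" >&2; return 1 ;;
        esac
    done
    return 0
}

requested="sched,freq,idle"
if validate_ftrace_events "$requested"; then
    echo "-knob ftrace-config=$requested"
fi
```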

gpu-profiling-mode=bblatency (default), memlatency

Select a profiling mode to identify basic blocks latency due to algorithm inefficiencies, or memory latency due to memory access issues.

Supported analysis: gpu-profiling, runsa

gpu-sampling-interval=<number> between 0.1 and 1000ms

Default: 1.

Specify an interval between GPU samples (in milliseconds).

Supported analysis: gpu-hotspots, graphics-rendering, runss, runsa

gpu-counters-mode=none (default for runss), overview (default for gpu-hotspots), global-local-accesses, compute-extended, full-compute, render-basic (default for graphics-rendering)

Analyze performance data from Intel HD Graphics and Intel Iris Graphics based on the preset counter sets.

Supported analysis: gpu-hotspots, graphics-rendering, gpu-profiling, runss, runsa

io-mode=off | stack | nostack

Default: off

Enable to identify where threads are waiting or compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.

Supported analysis: runss, runsa

kernel-stack=true | false

Default: true

Profile system disk IO API.

Supported analysis: io

kernels-to-profile=kernel:1:1:4294967293

Specify a comma-separated list of GPU kernel names and invocations in the following format:

kernel_name[:start_idx:step:stop_idx]

where kernel_name is the name of GPU kernel; start_idx is the number of the first invocation; and stop_idx is the number of the last invocation to be profiled.

Supported analysis: gpu-profiling, runsa
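
The kernel_name[:start_idx:step:stop_idx] format can be sketched as a small shell helper that formats each entry and joins several entries with commas. The kernel names matmul and reduce and the index values are hypothetical, and kernel_spec is not an amplxe-cl feature.

```sh
#!/bin/sh
# Format one kernels-to-profile entry as kernel_name:start_idx:step:stop_idx.
kernel_spec() {
    printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

# Two hypothetical kernels, joined with a comma as the knob expects.
spec="$(kernel_spec matmul 1 1 100),$(kernel_spec reduce 5 1 5)"
echo "-knob kernels-to-profile=$spec"
```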

mem-object-size-min-thres=<number>

Default: 1024 bytes

Specify a minimal size of memory allocations to analyze. This option helps reduce runtime overhead of the instrumentation.

This option is supported for Linux targets only running on the Intel microarchitecture code name Sandy Bridge (or later).

Supported analysis: memory-access

mrte-type=java,dotnet | java,dotnet,python | python

Default: java,dotnet

Specify a type of managed runtime to analyze. Available values: combined .NET* and Java* analysis, combined Java, .NET and Python* analysis, and Python only.

Supported analysis: runss, runsa

no-altstack=true | false

Default: false

Disable using alternative stacks for signal handlers. Consider this option for profiling standard Python 3 code on Linux.

Supported analysis: runss

pmu-collection-mode=detailed | summary

Default: detailed

Choose the detailed sampling-based collection mode to view data breakdown per function and other hotspots. Use the summary counting-based mode for an overview of the whole profiling run. The summary mode has lower collection overhead and faster post-processing.

Supported analysis: uarch-exploration

sampling-interval=<number>

For user-mode sampling and tracing types: a number (in milliseconds) between 1 and 1000. Default: 10

For hardware event-based sampling types: a number (in milliseconds) between 0.01 and 1000. Default: 1.

Specify a sampling interval (in milliseconds) between CPU samples.

Supported analysis: hotspots, threading, system-overview, memory-access, hpc-performance, runss, runsa
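
Since the accepted sampling-interval range depends on the collection type, a pre-flight check can catch out-of-range values early. This is a minimal sketch assuming the ranges stated above; valid_sampling_interval is a hypothetical helper, and the labels sw and hw are shorthand for user-mode sampling and hardware event-based sampling.

```sh
#!/bin/sh
# Return success if the interval (in ms) lies within the documented range
# for the given collection type: "sw" (user-mode sampling and tracing,
# 1 to 1000) or "hw" (hardware event-based sampling, 0.01 to 1000).
valid_sampling_interval() {
    case "$1" in
        sw) min=1 ;;
        hw) min=0.01 ;;
        *) return 2 ;;       # unknown collection type
    esac
    # awk handles the fractional comparison that plain sh cannot.
    awk -v v="$2" -v lo="$min" 'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= 1000) }'
}

if valid_sampling_interval hw 0.5; then
    echo "-knob sampling-interval=0.5"
fi
```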

sampling-mode=sw | hw

Default: sw

Specify a profiling mode.

Use sw to identify CPU hotspots and explore a call flow of your program. This mode does not require sampling drivers to be installed but incurs more collection overhead.

Use hw to identify application hotspots based on such basic hardware events as Clockticks and Instructions Retired. This is a low-overhead collection mode but it requires the sampling driver to be installed on your system.

Supported analysis: hotspots, threading

signals-mode=off | objects | stack | nostack

Default: off

Enable to view synchronization transitions in the timeline and signalling call stacks for associated waits. The collector instruments signalling APIs, which causes higher overhead and increases result size.

Supported analysis: runss

spdk=true | false

Default: false

Profile SPDK IO API.

Supported analysis: io

stack-size=<number>

A number between 0 and 2147483647. Default is 0 (unlimited stack size).

Reduce the collection overhead and limit the stack size (in bytes) processed by the VTune Amplifier.

Supported analysis: runsa

stack-stitching=true | false

Default: true

For Intel TBB-based applications, restructure the call flow to attach stacks to a point introducing a parallel workload.

Supported analysis: runss

stack-type=software | lbr

Default: software

Choose between software stack and hardware LBR-based stack types. Software stacks have no depth limitations and provide more data while hardware stacks introduce less overhead. Typically, software stack type is recommended unless the collection overhead becomes significant. Note that hardware LBR stack type may not be available on all platforms.

Supported analysis: runsa

stackwalk-mode=online | offline

Default: offline

Choose between online (during collection) and offline (after collection) modes to analyze stacks. Offline mode reduces analysis overhead and is typically recommended.

Supported analysis: runss

waits-mode=off | stack | nostack

Default: off

Enable to identify where threads are waiting or compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.

Supported analysis: runss

atrace-config=<event>

Available events are gfx, input, view, webview, wm, am, audio, video, camera, hal, res, dalvik.

Collect Android framework events from Systrace*.

Supported analysis: runsa

uncore-sampling-interval=<number>

For hardware event-based sampling types: a number (in milliseconds) between 1 and 1000. Default: 10.

Specify an interval (in milliseconds) between uncore event samples.

Supported analysis: runsa

Actions Modified

collect, collect-with

Description

Use the knob action-option to configure knob settings for a collect (predefined analysis types) or collect-with (custom analysis types) action where the analysis type supports one or more knobs. Each analysis type or collector type supports a specific set of knobs, and each knob requires a value. In most cases the knob value is Boolean, so you specify <knob-name>=true to enable the knob.

To see all knobs available for a predefined analysis type:

amplxe-cl -help collect <analysis_type>

To see knobs for a custom analysis type:

amplxe-cl -help collect-with <analysis_type>
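
The choice between the two help forms can be sketched as a small helper that picks collect-with for the runsa and runss collector types used elsewhere in this topic and collect for everything else. knob_help_cmd is a hypothetical function that only prints the command.

```sh
#!/bin/sh
# Print the help command that lists knobs for a given analysis or collector type.
knob_help_cmd() {
    case "$1" in
        runsa|runss) echo "amplxe-cl -help collect-with $1" ;;  # custom collector types
        *)           echo "amplxe-cl -help collect $1" ;;       # predefined analysis types
    esac
}

for type in threading runsa; do
    knob_help_cmd "$type"
done
```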

Example

This example returns a list of knobs for the Threading analysis type:

amplxe-cl -help collect threading

This example runs a custom event-based sampling data collection on an Android system enabling collection of Android framework and chipset events.

amplxe-cl -collect-with runss -target-system=android -knob sampling-interval=2 -knob cpu-samples-mode=stack -knob ftrace-config=gfx,dalvik -knob chipset-event-config="GMCH_PARTIAL_WR_DRAM.ANY,GMCH_CORE_CLKS" --target-process com.intel.tbb.example.tachyon

This example configures and runs a custom event-based sampling data collection with the stack size limited to 8192 bytes:

amplxe-cl -collect-with runsa -knob enable-stack-collection=true -knob stack-size=8192 -knob event-config=CPU_CLK_UNHALTED.REF_TSC:sa=1800000,CPU_CLK_UNHALTED