Custom Analysis Options

When you create a copy of a predefined analysis type, the new custom configuration inherits all options available for the original analysis and makes them editable.

This is a list of all available custom configuration options (knobs) in alphabetical order:


A

Analyze GPU usage check box (available only for Linux* targets with Intel HD Graphics and Intel Iris® Graphics)

Analyze GPU usage and identify whether your application is GPU or CPU bound.

Analyze I/O waits check box

Analyze the percentage of time each thread and CPU spends in I/O wait state.

Analyze loops check box

Extend loop analysis to collect advanced loop information, such as instruction set usage, and display analysis results by loops and functions.

Analyze memory bandwidth check box

Collect events required to compute memory bandwidth.

Analyze memory consumption check box (for Linux targets only)

Collect and analyze information about memory objects with the highest memory consumption.

Analyze memory objects check box (for Linux* targets only)

Enable the instrumentation of memory allocation/de-allocation and map hardware events to memory objects.

Analyze OpenMP regions check box

Instrument the OpenMP* regions in your application to group performance data by regions/work-sharing constructs and detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size.

Analyze PCIe bandwidth check box

Collect the events required to compute PCIe bandwidth. As a result, you will be able to analyze the distribution of the read/write operations on the timeline and identify where your application could be stalled due to approaching the bandwidth limits of the PCIe bus.

In the Device class drop-down menu, choose the device class for which to analyze PCIe bandwidth: processing accelerators, mass storage controller, network controller, or all device classes (default).

Note

This analysis is possible only on Intel microarchitecture code name Sandy Bridge EP and later.

Analyze Processor Graphics hardware events drop-down menu

Analyze performance data from Intel HD Graphics and Intel Iris Graphics (further: Intel Graphics) based on the predefined groups of GPU metrics.

Analyze system-wide context switches check box

Analyze detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization).

Analyze user synchronization check box

Enable User synchronization API profiling to analyze thread synchronization. This option causes higher overhead and increases result size.

Analyze user tasks, events, and counters check box

Analyze tasks, events, and counters specified in your code via the ITT API. This option causes higher overhead and increases result size.

C

Chipset events field

Specify a comma-separated list of chipset events (up to 5 events) to monitor with the hardware event-based sampling collector.
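The format and limit described above can be checked before a value is pasted into the field. The following is a minimal sketch, not part of the product: the function name and the event names are placeholders chosen for illustration, and only the comma-separated format and the 5-event limit come from the description above.

```python
MAX_CHIPSET_EVENTS = 5  # limit stated for the Chipset events field


def parse_chipset_events(field_value: str) -> list[str]:
    """Split a comma-separated event list and enforce the 5-event limit."""
    # Ignore surrounding whitespace and empty entries such as a trailing comma.
    events = [name.strip() for name in field_value.split(",") if name.strip()]
    if len(events) > MAX_CHIPSET_EVENTS:
        raise ValueError(
            f"{len(events)} chipset events given; "
            f"at most {MAX_CHIPSET_EVENTS} are allowed"
        )
    return events


# Placeholder event names, used for illustration only.
print(parse_chipset_events("EVENT_A, EVENT_B,EVENT_C"))
```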

Collect context switches check box

Analyze detailed scheduling layout for all threads in your application, explore time spent on a context switch, and identify the nature of context switches for a thread (preemption or synchronization).

Note

The types of context switches (preemption or synchronization) cannot be identified if the analysis uses Perf*-based driverless collection.

Collect CPU sampling data menu

Choose whether to collect information about CPU samples and related call stacks.

Collect highly accurate CPU time check box (for Windows targets only)

Obtain more accurate CPU time data. This option causes more runtime overhead and increases result size. Administrator privileges are required.

Collect I/O API data menu

Choose whether to collect information about I/O calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.

Collect stacks check box

Enable advanced collection of call stacks and thread context switches to analyze performance, parallelism, and power consumption per execution path.

CPU Events table

  • Specify hardware events to collect using the check boxes in the first column. By default, the table lists all events available for the target platform with events used for the original analysis configuration pre-selected. You may use the Search functionality to find events of interest. To get more details on an event, select it in the table and click the Explain button.

  • Modify the Sample After value for an event to control the number of events after which the VTune Amplifier interrupts the event data collection. The Sample After value depends on the target duration. Based on the duration value, the VTune Amplifier adjusts the Sample After value with a multiplier.
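To make the role of the Sample After value concrete: because one sample is taken each time the event has occurred that many times, the total event count can be estimated as the number of samples multiplied by the Sample After value. The sketch below illustrates this arithmetic and the effect of a duration-based multiplier. The function names and the multiplier value are hypothetical; the actual multipliers applied for each duration setting are not given here.

```python
def estimate_event_count(samples: int, sample_after: int) -> int:
    """Each collected sample stands for `sample_after` occurrences of the event."""
    return samples * sample_after


def scaled_sample_after(base_sample_after: int, multiplier: float) -> int:
    """Apply a duration-based multiplier to a base Sample After value."""
    # Never return less than 1: a sample needs at least one event.
    return max(1, round(base_sample_after * multiplier))


# 1,500 samples collected with a Sample After value of 2,000,000.
print(estimate_event_count(1_500, 2_000_000))

# Hypothetical multiplier of 0.5 for a shorter target duration.
print(scaled_sample_after(2_000_000, 0.5))
```

A smaller Sample After value yields more samples for the same workload, which improves statistical resolution at the cost of higher collection overhead.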

CPU sampling interval, ms field

Specify an interval between collected CPU samples in milliseconds.
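As a rough illustration of how the interval affects data volume, the sketch below estimates an upper bound on the number of samples for a single thread. It assumes the thread is running for the whole collection and that one sample is taken per interval; the function name is hypothetical, and real sample counts depend on the collector and on how much CPU time the thread receives.

```python
def max_samples_per_thread(duration_s: float, interval_ms: float) -> int:
    """Upper bound on samples for one thread that is on a CPU the whole time."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return int(duration_s * 1000 // interval_ms)


# A 30-second run sampled every 10 ms versus every 1 ms.
print(max_samples_per_thread(30, 10))
print(max_samples_per_thread(30, 1))
```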

Collect signalling API data menu

Choose whether to collect information about synchronization objects and call stacks for signaling calls. This analysis option helps identify synchronization transitions in the timeline and signaling call stacks for associated waits. The collector instruments signaling APIs, which causes higher overhead and increases result size.

Collect synchronization API data menu

Choose whether to collect information about synchronization wait calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.

D

Disable alternative stacks for signal handlers check box (available for Linux targets)

Disable using alternative stacks for signal handlers. Consider this option for profiling standard Python 3 code on Linux.

E

Estimate call counts check box

Obtain statistical estimation of call counts based on the hardware events.

Estimate trip counts check box

Obtain statistical estimation of loop trip counts based on the hardware events.

Evaluate max DRAM bandwidth check box

Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.
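One way to picture how a measured maximum can be used for scaling is to express observed bandwidth as a fraction of that maximum and compare the fraction against thresholds. The sketch below is an assumption-laden illustration: the function names, the bandwidth figures, and the 0.3 and 0.7 cut-offs are hypothetical and are not the thresholds the tool computes.

```python
def bandwidth_utilization(observed_gb_s: float, max_gb_s: float) -> float:
    """Observed bandwidth as a fraction of the measured maximum."""
    if max_gb_s <= 0:
        raise ValueError("max_gb_s must be positive")
    return observed_gb_s / max_gb_s


def classify(utilization: float, medium: float = 0.3, high: float = 0.7) -> str:
    """Bucket utilization using illustrative (not tool-defined) cut-offs."""
    if utilization >= high:
        return "high"
    if utilization >= medium:
        return "medium"
    return "low"


# Hypothetical figures: 45 GB/s observed against a 60 GB/s measured maximum.
u = bandwidth_utilization(45.0, 60.0)
print(f"{u:.0%} of measured maximum -> {classify(u)}")
```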

Event mode drop-down list

Limit event-based sampling collection to USER (user events) or OS (system events) mode. By default, all event types are collected.

G

GPU Profiling mode drop-down menu

Select a profiling mode to identify basic block latency due to algorithm inefficiencies, or memory latency due to memory access issues. This option is typically used for GPU in-kernel profiling.

Use the table to specify the kernels of interest and narrow down the GPU in-kernel analysis to specific kernels, which minimizes the collection overhead. If required, modify the instance step for each kernel, which is a sampling interval (in the number of kernels).
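The instance step can be read as "profile every N-th instance of this kernel". The sketch below shows which instances would be selected under that reading. It assumes selection starts with the first instance, which is not stated above, and the function name is hypothetical.

```python
def profiled_instances(total_instances: int, instance_step: int) -> list[int]:
    """Zero-based indices of kernel instances picked with the given step."""
    if instance_step < 1:
        raise ValueError("instance_step must be at least 1")
    return list(range(0, total_instances, instance_step))


# With a step of 1 every instance is profiled; a step of 4 keeps one in four.
print(profiled_instances(10, 1))
print(profiled_instances(10, 4))
```

A larger step reduces overhead for kernels that are launched many times, at the cost of fewer profiled instances.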

GPU sampling interval, ms field

Specify an interval (in milliseconds) between GPU samples.

L

Limit PMU collection to counting check box

Enable this option to collect event counts instead of the default detailed context data for each PMU event (such as code or hardware context). Counting mode introduces less overhead but provides less information.

Linux Ftrace events / Android framework events field

Use the kernel events library to select Linux Ftrace* and Android* framework events to monitor with the collector. The collected data show up as tasks in the Timeline pane. You can also apply the task grouping level to view performance statistics in the grid.

M

Managed runtime type to analyze menu

Choose a type of the managed runtime to analyze. Available options are:

  • for Windows targets: combined Java* and .NET* analysis; combined Java, .NET and Python* analysis; Python only analysis

  • for Linux targets: Java only analysis; combined Java and Python analysis; Python only analysis

Minimal memory object size to track, in bytes spin box (for Linux targets only)

Specify a minimal size of memory allocations to analyze. This option helps reduce runtime overhead of the instrumentation.

P

Profiling mode drop-down menu

Select a profiling mode to identify basic block latency due to algorithm inefficiencies, or memory latency due to memory access issues. This option is typically used for GPU in-kernel profiling.

Use the table to specify the kernels of interest and narrow down the GPU in-kernel analysis to specific kernels, which minimizes the collection overhead. If required, modify the instance step for each kernel, which is a sampling interval (in the number of kernels).

S

Stack size, in bytes field

Specify the size of a raw stack (in bytes) to process. Zero value means unlimited size. Possible values are numbers between 0 and 2147483647.
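The accepted range corresponds to a non-negative signed 32-bit integer (2147483647 is 2**31 - 1). A minimal sketch of validating a value against this range follows; the function name and the use of None to represent "unlimited" are choices made for this illustration only.

```python
MAX_STACK_SIZE = 2**31 - 1  # 2147483647, the upper bound stated for the field


def normalize_stack_size(value: int):
    """Return None for 'unlimited' (0), otherwise the validated size in bytes."""
    if not 0 <= value <= MAX_STACK_SIZE:
        raise ValueError(f"stack size must be between 0 and {MAX_STACK_SIZE}")
    return None if value == 0 else value


print(normalize_stack_size(0))     # zero means unlimited
print(normalize_stack_size(4096))  # a 4 KB raw stack
```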

Stack type

Choose between software stack and hardware LBR-based stack types. Software stacks have no depth limitations and provide more data while hardware stacks introduce less overhead. Typically, software stack type is recommended unless the collection overhead becomes significant. Note that hardware LBR stack type may not be available on all platforms.

Stack unwinding mode menu

Choose whether collection requires online (during collection) or offline (after collection) stack unwinding. Offline mode reduces analysis overhead and is typically recommended.

Stitch stacks check box

For applications using Intel Threading Building Blocks (Intel TBB) or OpenMP* with Intel runtime libraries, restructure the call flow to attach stacks to a point introducing a parallel workload.

T

Trace OpenCL and Intel Media SDK Processor Graphics (Intel Graphics Driver only) check box

Capture the execution time of OpenCL™ kernels and Intel Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze the performance per GPU hardware metrics.

Note

Intel Media SDK programs analysis is supported for Linux targets only.

U

Uncore sampling interval, ms field

Specify an interval (in milliseconds) between uncore event samples.

Use precise multiplexing check box

Enable a fine-grain event multiplexing mode that switches event groups on each sample. This mode provides more reliable statistics for applications with a short execution time. You can also consider applying the precise multiplexing algorithm if the MUX Reliability metric value for your results is low.
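The idea of switching event groups on each sample can be sketched as a rotation over the groups. This is a conceptual illustration only: the round-robin order, the group names, and the function name are assumptions, not a description of the collector's internals.

```python
from itertools import cycle, islice


def group_per_sample(event_groups, num_samples):
    """Return which event group is active for each consecutive sample."""
    # cycle() repeats the groups endlessly; islice() takes one per sample.
    return list(islice(cycle(event_groups), num_samples))


groups = ["group_0", "group_1", "group_2"]  # placeholder group names
print(group_per_sample(groups, 7))
```

Because every group is revisited after only a few samples, even a short run gives each group a share of the samples, which is why this mode suits applications with a short execution time.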

Note

You may generate the command line for this configuration using the Command Line... button at the bottom.
