Custom Analysis Options
Analyze GPU usage check box (for Linux* targets with Intel® HD Graphics and Intel® Iris® Graphics only)
Analyze GPU usage and identify whether your application is GPU or CPU bound.
Analyze I/O waits check box
Analyze the percentage of time each thread and CPU spends in I/O wait state.
Analyze interrupts check box
Collect interrupt events that alter the normal execution flow of a program. Such events can be generated by hardware devices or by CPUs. Use this data to identify slow interrupts that affect your code performance.
Analyze loops check box
Analyze memory bandwidth check box
Collect events required to compute memory bandwidth.
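For reference, memory bandwidth is derived from uncore memory-controller read/write event counts. A minimal sketch of the arithmetic, assuming the conventional IMC CAS events (UNC_M_CAS_COUNT.RD/WR) where each CAS transfers one 64-byte cache line; verify the event names and line size against your platform's event reference:

```python
# Sketch: convert uncore IMC CAS counts to DRAM bandwidth.
# Event names and the 64-byte line size are the conventional
# values, not read from any specific platform.
CACHE_LINE_BYTES = 64

def dram_bandwidth_gbps(cas_reads: int, cas_writes: int, elapsed_s: float) -> float:
    """Each CAS transfers one cache line, so bytes = (RD + WR) * 64."""
    total_bytes = (cas_reads + cas_writes) * CACHE_LINE_BYTES
    return total_bytes / elapsed_s / 1e9

# Example: 500M reads + 250M writes over 2 s -> 24 GB/s
print(dram_bandwidth_gbps(500_000_000, 250_000_000, 2.0))
```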
Analyze memory consumption check box (for Linux* targets only)
Analyze memory objects check box (for Linux* targets only)
Enable the instrumentation of memory allocation/deallocation and map hardware events to memory objects.
Analyze OpenMP regions check box
Instrument the OpenMP* regions in your application to group performance data by regions and work-sharing constructs and to detect inefficiencies such as imbalance, lock contention, or the overhead of scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size.
Analyze PCIe bandwidth check box
Collect the events required to compute PCIe bandwidth. As a result, you will be able to analyze the distribution of the read/write operations on the timeline and identify where your application could be stalled due to approaching the bandwidth limits of the PCIe bus.
Device class drop-down menu
Choose a device class for which to analyze PCIe bandwidth: processing accelerators, mass storage controllers, network controllers, or all device classes (default).
This analysis is available only on Intel microarchitectures code named Sandy Bridge EP and later.
Analyze power usage check box
Track power consumption by processor over time to see whether it can cause CPU throttling.
Analyze Processor Graphics hardware events drop-down menu
Analyze system-wide context switches check box
Analyze detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization).
Analyze user synchronization check box
Analyze user tasks, events, and counters check box
Analyze tasks, events, and counters specified in your code with the ITT API.
Chipset events to collect field
Specify a comma-separated list of chipset events (up to 5 events) to monitor with the hardware event-based sampling collector.
Collect context switches check box
Analyze detailed scheduling layout for all threads in your application, explore time spent on a context switch and identify the nature of context switches for a thread (preemption or synchronization).
Collect CPU sampling data menu
Choose whether to collect information about CPU samples and related call stacks.
Collect highly accurate CPU time check box (for Windows* targets only)
Obtain more accurate CPU time data. This option causes more runtime overhead and increases result size. Administrator privileges are required.
Collect I/O API data menu
Choose whether to collect information about I/O calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.
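The extra overhead comes from wrapping every API call with per-call bookkeeping. A toy Python sketch of what instrumentation does in principle (this is illustrative, not the collector's actual mechanism; `instrument` and `read_data` are made-up names):

```python
import functools
import time

def instrument(fn, log):
    """Wrap an API so each call records its duration.
    This per-call bookkeeping is why instrumenting APIs adds
    runtime overhead and grows the result size."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            log.append((fn.__name__, time.perf_counter() - start))
    return wrapper

def read_data():
    time.sleep(0.01)  # stand-in for a blocking I/O call

calls = []
read_data = instrument(read_data, calls)
read_data()
print(calls)  # one (name, duration) record per call
```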
Collect Parallel File System counters check box
Enable collection of the Parallel File System counters to analyze Lustre* file system performance statistics, including Bandwidth, Package Rate, Average Packet Size, and others.
Collect signalling API data menu
Choose whether to collect information about synchronization objects and call stacks for signaling calls. This analysis option helps identify synchronization transitions in the timeline and signalling call stacks for associated waits. The collector instruments signalling APIs, which causes higher overhead and increases result size.
Collect stacks check box
Collect synchronization API data menu
Choose whether to collect information about synchronization wait calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size.
Collect thread affinity check box
Analyze thread pinning to sockets, physical cores, and logical cores. Identify incorrect affinity that utilizes logical cores instead of physical cores and contributes to poor physical CPU utilization.
Affinity information is collected at the end of the thread lifetime, so the resulting data may not show the whole issue for dynamic affinity that is changed during the thread lifetime.
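On Linux, you can inspect a process's affinity mask directly from the standard library, which shows the kind of data this analysis is built on (a sketch; `os.sched_getaffinity` is Linux-only):

```python
import os

# Linux-only: which logical CPUs is the current process allowed
# to run on? A mask pinned to logical cores that share a physical
# core is the kind of misconfiguration this option helps detect.
allowed_cpus = sorted(os.sched_getaffinity(0))
print(allowed_cpus)
```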
CPU sampling interval, ms field
Disable alternative stacks for signal handlers check box (for Linux* targets only)
Disable using alternative stacks for signal handlers. Consider this option for profiling standard Python 3 code on Linux.
Enable driverless collection check box
Evaluate max DRAM bandwidth check box
Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.
Event mode drop-down list
Limit event-based sampling collection to USER (user events) or OS (system events) mode. By default, all event types are collected.
GPU Profiling mode drop-down menu
Select a profiling mode to either characterize GPU performance issues based on GPU hardware metric presets, or enable source analysis to identify basic block latency caused by algorithm inefficiencies or memory latency caused by memory access issues.
Computing task of interest table
Specify the kernels of interest to narrow the GPU analysis down to specific kernels and minimize the collection overhead. If required, modify the instance step for each kernel, which is a sampling interval (in number of kernel invocations).
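The instance step simply skips kernel invocations between samples. A toy sketch of the resulting sampling pattern (the function name is illustrative, not a VTune API):

```python
def sampled_invocations(total_invocations: int, instance_step: int) -> list:
    """With instance step N, every Nth kernel invocation is profiled;
    the rest are skipped, which reduces collection overhead."""
    return [i for i in range(total_invocations) if i % instance_step == 0]

print(sampled_invocations(10, 3))  # [0, 3, 6, 9]
```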
GPU sampling interval, ms field
Specify an interval between GPU samples.
Limit PMU collection to counting check box
Collect event counts instead of the default detailed context data for each PMU event (such as code or hardware context). Counting mode introduces less overhead but provides less information.
Linux* Ftrace events / Android* framework events field
Managed runtime type to analyze menu
Choose the type of managed runtime to analyze.
Minimal memory object size to track, in bytes spin box (for Linux* targets only)
Specify the minimal size of memory allocations to analyze. This option helps reduce the runtime overhead of instrumentation.
Profile with Hardware Tracing check box
Enable driverless hardware tracing collection to explore CPU activity in your code at the microsecond level and triage latency issues.
Stack size, in bytes field
Specify the size of a raw stack (in bytes) to process. The Unlimited size value in the GUI corresponds to 0 on the command line. Possible values are integers between 0 and 2147483647.
Stack type drop-down menu
Choose between software and hardware LBR-based stack types. Software stacks have no depth limitations and provide more data, while hardware stacks introduce less overhead. Typically, the software stack type is recommended unless the collection overhead becomes significant. Note that the hardware LBR stack type may not be available on all platforms.
Stack unwinding mode menu
Choose whether collection requires online (during collection) or offline (after collection) stack unwinding. Offline mode reduces analysis overhead and is typically recommended.
Stitch stacks check box
Trace GPU Programming APIs check box
Capture the execution time of OpenCL™ kernels, DPC++ tasks, and Intel® Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze performance by GPU hardware metrics.
Uncore sampling interval, ms field
Specify an interval (in milliseconds) between uncore event samples.
Use precise multiplexing check box
Enable a fine-grain event multiplexing mode that switches event groups on each sample. This mode provides more reliable statistics for applications with a short execution time. Also consider applying the precise multiplexing algorithm if the MUX Reliability metric value for your results is low.
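With multiplexing, each event group is live on the PMU for only a fraction of the run, and the raw counts are extrapolated to the full run. A sketch of the standard linear scaling (this is the scheme Linux perf uses for multiplexed counters; the function name is illustrative):

```python
def scale_multiplexed(raw_count: int, time_enabled: float, time_running: float) -> float:
    """Estimate the full-run count for an event whose group was
    scheduled on the PMU for only time_running out of time_enabled
    seconds. Short runs give small time_running values, making this
    extrapolation noisy, which is what precise multiplexing and the
    MUX Reliability metric are about."""
    if time_running == 0:
        return 0.0  # event never scheduled; no estimate possible
    return raw_count * time_enabled / time_running

# Group live for 1 s out of a 4 s run: counts are scaled by 4x.
print(scale_multiplexed(1_000_000, 4.0, 1.0))
```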