Hardware Event-based Sampling Collection

During hardware event-based sampling (EBS), also known as Performance Monitoring Counter (PMC) analysis in sampling mode, the Intel® VTune™ Amplifier profiles your application using the counter overflow feature of the Performance Monitoring Unit (PMU).

The data collector interrupts a process and captures the instruction pointer (IP) of the interrupted process at the time of the interrupt. The statistically collected IPs of active processes enable the viewer to show the statistically important code regions that affect software performance.

CAUTION

Statistical sampling does not provide 100% accurate data. When the VTune Amplifier collects an event, it attributes not only that event but the entire sampling interval prior to it (often 10,000 to 2,000,000 events) to the current code context. For a large number of samples, this sampling error does not seriously affect the accuracy of performance analysis, and the final statistical picture is still valid. But if something ran for a very short time, very few samples exist for it. This may yield seemingly impossible results, such as two million instructions retiring in 0 cycles for a rarely seen driver. In this case, you may either ignore hotspots that show an insignificant number of samples or switch to a coarser granularity (for example, function).
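The attribution rule described above can be illustrated with a small Python sketch. This is not VTune Amplifier code: the function names and the sampling interval value are hypothetical, and the 1/sqrt(samples) error estimate is a general statistical rule of thumb (assuming samples behave like Poisson counts), not a figure taken from the product.

```python
import math

def estimate_events(samples: int, sample_after_value: int) -> int:
    """Credit each sample with one full sampling interval of events."""
    return samples * sample_after_value

def rough_relative_error(samples: int) -> float:
    """Rule-of-thumb relative error, assuming Poisson-like sample counts."""
    return float("inf") if samples == 0 else 1.0 / math.sqrt(samples)

SAV = 2_000_000  # hypothetical sampling interval (events per sample)

# A rarely seen driver that received one instructions-retired sample and no
# clock-cycle samples appears to retire 2,000,000 instructions in 0 cycles.
driver_instructions = estimate_events(1, SAV)
driver_cycles = estimate_events(0, SAV)

# A hot function with many samples has a much smaller relative error.
hot_error = rough_relative_error(10_000)

print(driver_instructions, driver_cycles, hot_error)
```

Under these assumptions, a code region with a single sample has an error comparable to the estimate itself, which is why such hotspots are better ignored or aggregated.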

The average overhead of event-based sampling is about 2% at a 1 ms sampling interval.

The number of hardware events (Performance Monitoring Counters) that can be collected simultaneously is limited by CPU capabilities. Usually, it is no more than four events. To overcome this limitation, the VTune Amplifier splits the event list into several event groups. Each group consists of events that can be collected simultaneously. VTune Amplifier uses one of the following techniques:

  • Runs an application several times collecting one event group during each run.

  • Runs an application only once and multiplexes the event groups in a round robin fashion during the run. This technique may not work on some OS/hardware combinations.
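As a rough illustration of the two ideas above (splitting events into groups, then rotating the groups), consider the following Python sketch. It is a simplification under stated assumptions: the event names are placeholders, the limit of four counters is taken from the text, and the helper names `split_into_groups` and `multiplex_schedule` are hypothetical. Real grouping would also have to respect per-event counter constraints, which this plain chunking ignores.

```python
from itertools import cycle, islice

def split_into_groups(events, max_counters=4):
    """Chunk the event list so that each group fits the available counters."""
    return [events[i:i + max_counters] for i in range(0, len(events), max_counters)]

def multiplex_schedule(groups, time_slices):
    """Round-robin: assign one event group to each consecutive time slice."""
    return list(islice(cycle(groups), time_slices))

# Placeholder event names, not real PMU events.
events = ["EVENT_1", "EVENT_2", "EVENT_3", "EVENT_4", "EVENT_5", "EVENT_6"]

groups = split_into_groups(events)
schedule = multiplex_schedule(groups, time_slices=5)

for slice_index, group in enumerate(schedule):
    print(slice_index, group)
```

With six events and four counters, the sketch produces two groups that alternate across time slices. The multiple-run technique corresponds to collecting one entry of `groups` per run instead.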

During product installation on Linux*, you have an option to install the sampling driver with the per-user filtering enabled. When the filtering is on, the collector gathers data only for the processes spawned by the user who started the collection. When it is off (default), samples from all processes on the system are collected. Consider using the filtering to isolate the collection from other users on a cluster for security reasons. The administrator/root can change the filtering mode by rebuilding/restarting the driver at any time. A regular user cannot change the mode after the product is installed.

By default, the VTune Amplifier collector samples your target and does not analyze execution paths. However, you can enable the Collect stacks option during analysis configuration to make the collector take exact measurements of any hardware performance events or timestamps, as well as collect a call stack to the point where a thread becomes active or inactive. On Linux* systems, by default, VTune Amplifier uses the driverless Perf collection mode for the hardware event-based stack analysis.

VTune Amplifier uses the hardware event-based sampling collector to collect data for the following analysis types:

Note

This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

You can also create a custom analysis type based on the hardware event-based sampling collection.

CAUTION

Analysis types that use the hardware event-based sampling collector are limited to only one collection allowed at a time on a system.

Prerequisites:

It is recommended to install the sampling driver for hardware event-based sampling collection types. For Linux* and Android* targets, if the sampling driver is not installed, VTune Amplifier can enable the Perf* driverless collection. Be aware of the following configuration settings for Linux target systems:

  • To enable system-wide and uncore event collection, use root or sudo to set /proc/sys/kernel/perf_event_paranoid to 0.

    echo 0 > /proc/sys/kernel/perf_event_paranoid

  • To enable collection with the Microarchitecture Exploration analysis type, increase the default limit of open file descriptors. Use root or sudo to increase the default value in /etc/security/limits.conf to 100*<number_of_logical_CPU_cores>.

    <user> hard nofile <100 * number_of_logical_CPU_cores>

    <user> soft nofile <100 * number_of_logical_CPU_cores>
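The two settings above can be checked and computed with a short Python sketch. This is a convenience illustration, not part of the product: the helper names are hypothetical, and the script only reads the current perf_event_paranoid value and prints suggested limits.conf lines without modifying any system file.

```python
import os
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

def read_paranoid_level(path=PARANOID_PATH):
    """Return the current perf_event_paranoid value, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

def nofile_limit(logical_cpus):
    """Apply the 100 * <number of logical CPU cores> rule from the text."""
    return 100 * logical_cpus

def limits_conf_lines(user, logical_cpus):
    """Build the hard and soft nofile lines for /etc/security/limits.conf."""
    limit = nofile_limit(logical_cpus)
    return [f"{user} hard nofile {limit}", f"{user} soft nofile {limit}"]

if __name__ == "__main__":
    level = read_paranoid_level()
    print("perf_event_paranoid:", level if level is not None else "not available")
    cpus = os.cpu_count() or 1  # os.cpu_count() reports logical CPUs
    for line in limits_conf_lines("<user>", cpus):
        print(line)
```

Replace `<user>` with the account that runs the collection before copying the printed lines into /etc/security/limits.conf.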
