Hardware Event-based Sampling Collection

The hardware event-based sampling collector of the Intel® VTune™ Amplifier profiles your application using the counter overflow feature of the Performance Monitoring Unit (PMU). The data collector interrupts a process and captures the instruction pointer (IP) of the interrupted process at the time of the interrupt. The statistically collected IPs of active processes enable the viewer to show the statistically important code regions that affect software performance.
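The counter overflow mechanism can be pictured with a small simulation. The sketch below is an illustration only, not VTune Amplifier code: the function name sample_by_overflow, the parameter sample_after, and the synthetic trace are all assumptions made for the example. It counts events, "interrupts" whenever the counter overflows, and records the IP that was executing at that moment.

```python
from collections import Counter


def sample_by_overflow(event_ips, sample_after):
    """Record the IP seen each time a simulated event counter overflows.

    event_ips:    iterable of instruction pointers, one per counted event.
    sample_after: number of events between two overflow interrupts
                  (hypothetical parameter name).
    """
    samples = Counter()
    remaining = sample_after          # events left until the counter overflows
    for ip in event_ips:
        remaining -= 1
        if remaining == 0:            # overflow: the simulated interrupt fires
            samples[ip] += 1          # capture the IP of the interrupted code
            remaining = sample_after  # re-arm the counter
    return samples


# Synthetic trace: 9,000 events in a hot loop at 0x400, 1,000 events at 0x800.
trace = [0x400] * 9000 + [0x800] * 1000
histogram = sample_by_overflow(trace, sample_after=100)
```

With this trace, the sample histogram reflects the 90/10 split of the underlying events, which is the statistical picture that the viewer presents.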

Caution

Statistical sampling does not provide 100% accurate data. When the VTune Amplifier collects an event, it attributes not only that event but the entire sampling interval prior to it (often 10,000 to 2,000,000 events) to the current code context. For a large number of samples, this sampling error does not seriously affect the accuracy of performance analysis, and the final statistical picture is still valid. But if some code ran for only a very short time, very few samples exist for it. This may yield seemingly impossible results, such as two million instructions retiring in 0 cycles for a rarely seen driver. In this case, you may either ignore hotspots showing an insignificant number of samples or switch to a higher-level (coarser) granularity (for example, function).
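The arithmetic behind such a result can be sketched as follows. This is a minimal illustration under stated assumptions: the region names, the sample counts, the shared sampling interval, and the MIN_SAMPLES threshold are all hypothetical, and the code does not reproduce how VTune Amplifier computes its metrics.

```python
def estimate_events(sample_count, sample_after):
    """Estimated event count: each sample stands for a whole sampling interval."""
    return sample_count * sample_after


SAMPLE_AFTER = 2_000_000  # illustrative sampling interval, used for both events

# Hypothetical sample counts per code region: (instruction samples, cycle samples)
regions = {
    "hot_loop": (1500, 1000),
    "rare_driver": (1, 0),
}

# Estimated (instructions, cycles) attributed to each region.
estimates = {
    name: (estimate_events(instr, SAMPLE_AFTER),
           estimate_events(cycles, SAMPLE_AFTER))
    for name, (instr, cycles) in regions.items()
}

# Arbitrary cutoff below which a region is treated as statistically insignificant.
MIN_SAMPLES = 10
trusted = [name for name, (instr, cycles) in regions.items()
           if min(instr, cycles) >= MIN_SAMPLES]
```

Here the single instruction sample for rare_driver is extrapolated to two million instructions, while its cycle estimate stays at zero because no cycle sample happened to land in it. Filtering by sample count leaves only hot_loop as a region whose numbers can be trusted.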

The average overhead of event-based sampling is about 2% at a 1 ms sampling interval.

The number of hardware events that can be collected simultaneously is limited by CPU capabilities. Usually, it is no more than four events. To overcome this limitation, the VTune Amplifier splits the event list into several event groups. Each group consists of events that can be collected simultaneously. VTune Amplifier uses one of the following techniques:

  • Runs an application several times, collecting one event group during each run.

  • Runs an application only once and multiplexes the event groups in a round-robin fashion during the run. This technique may not work on some OS/hardware combinations.
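The grouping and round-robin multiplexing described above can be sketched as follows. This is a simplified model, not the VTune Amplifier algorithm: the event names are placeholders, the grouping looks only at the number of counters (real hardware also restricts which events can share a counter), and scaling a count by the fraction of time its group was active is a common extrapolation technique that is assumed here rather than taken from the product.

```python
from itertools import cycle, islice


def split_into_groups(events, max_counters=4):
    """Split the event list into groups that fit the available counters."""
    return [events[i:i + max_counters]
            for i in range(0, len(events), max_counters)]


def round_robin_schedule(groups, time_slices):
    """Return the index of the group active in each time slice, in rotation."""
    return list(islice(cycle(range(len(groups))), time_slices))


def scale_count(raw_count, slices_active, slices_total):
    """Extrapolate a count observed only while its group was active."""
    return raw_count * slices_total / slices_active


# Six placeholder events on a CPU assumed to have four counters.
events = ["EVENT_A", "EVENT_B", "EVENT_C", "EVENT_D", "EVENT_E", "EVENT_F"]
groups = split_into_groups(events)
schedule = round_robin_schedule(groups, time_slices=6)

# The second group was active in 3 of the 6 slices and counted 300 events.
estimated = scale_count(300, schedule.count(1), len(schedule))
```

Running the application once per group avoids the extrapolation step at the cost of longer total collection time, while multiplexing trades some accuracy for a single run.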

By default, the VTune Amplifier collector samples your target and does not analyze execution paths. However, you can enable the Collect stacks option during analysis configuration so that the collector takes exact measurements of any hardware performance events or timestamps and also collects a call stack to the point where a thread becomes active or inactive.

VTune Amplifier uses the hardware event-based sampling collector to collect data for the following analysis types:

  • Advanced Hotspots

  • Intel Core 2 Processor Analysis targeted for the Intel® Core™ 2 processor family

  • Nehalem / Westmere Analysis targeted for Intel microarchitectures code name Nehalem and Westmere

  • Sandy Bridge Analysis targeted for Intel microarchitecture code name Sandy Bridge

  • Knights Corner Platform Analysis targeted for Intel® Xeon Phi™ coprocessors (code name: Knights Corner)

You can also create a custom analysis type based on the hardware event-based sampling collection.

Caution

Only one collection that uses the hardware event-based sampling collector can run on a system at a time.
