User-Mode Sampling and Tracing Collection

When profiling application execution, the Intel® VTune™ Amplifier takes snapshots of how that application utilizes the processors in the system. A thread is considered active at a specific moment if it is either executing or ready to execute (that is, not blocked). The number of running threads in each snapshot indicates the degree of parallelism of the application and how well it utilizes processor resources. VTune Amplifier classifies utilization into the following ranges: Idle, Poor, Ok, and Ideal.
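The classification idea can be illustrated with a minimal Python sketch. The threshold ratios, the function names, and the sample data below are assumptions made for illustration only; they are not the boundaries that VTune Amplifier applies.

```python
from collections import Counter

def classify_utilization(active_threads, logical_cpus):
    """Map one snapshot's active-thread count to a utilization range.

    The 0.5 and 0.9 thresholds are hypothetical, chosen only to
    illustrate the idea of bucketing snapshots into ranges.
    """
    if active_threads == 0:
        return "Idle"
    ratio = active_threads / logical_cpus
    if ratio < 0.5:
        return "Poor"
    if ratio < 0.9:
        return "Ok"
    return "Ideal"

def utilization_histogram(snapshots, logical_cpus):
    """Count how many snapshots fall into each utilization range."""
    return Counter(classify_utilization(n, logical_cpus) for n in snapshots)

# Hypothetical active-thread counts observed in eight snapshots on 4 CPUs.
snapshots = [0, 1, 1, 2, 4, 4, 3, 0]
histogram = utilization_histogram(snapshots, logical_cpus=4)
```

With this made-up input, each of the four ranges receives two snapshots, which would suggest an application that spends a noticeable share of its time under-utilizing the available processors.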

The user-mode sampling and tracing collector interrupts a process, collects the values of all active instruction pointers (IPs), and captures a calling sequence for each of these samples. Sampled instruction pointers, along with their calling sequences (stacks), are stored in data collection files. Statistically collected IP samples with calling sequences enable the viewer to display a call graph and/or the most time-consuming paths. Use this data to understand the control flow for statistically important code sections.
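As a rough sketch of how sampled stacks can be turned into hotspot and call-path data, the following Python example aggregates a handful of samples. The stack representation (tuples of function names, outermost caller first), the function names, and the sample data are all hypothetical; the actual data collection file format is not described here.

```python
from collections import Counter

def aggregate_samples(samples):
    """Aggregate sampled call stacks into simple profile counters.

    Each sample is a tuple of function names, outermost caller first.
    Returns (self_counts, total_counts, path_counts).
    """
    self_counts = Counter()   # samples where the function was executing
    total_counts = Counter()  # samples where the function was on the stack
    path_counts = Counter()   # samples per complete calling sequence
    for stack in samples:
        if not stack:
            continue  # a sample without a stack contributes nothing here
        self_counts[stack[-1]] += 1
        for func in set(stack):  # set() avoids double counting recursion
            total_counts[func] += 1
        path_counts[stack] += 1
    return self_counts, total_counts, path_counts

# Hypothetical samples, one per sampling interval.
samples = [
    ("main", "parse", "read_token"),
    ("main", "parse", "read_token"),
    ("main", "compute", "multiply"),
    ("main", "compute"),
    ("main", "parse"),
]
self_counts, total_counts, path_counts = aggregate_samples(samples)
hottest_path, hits = path_counts.most_common(1)[0]
```

Here the self counts correspond to a bottom-up (hotspot) view, the total counts to a top-down view, and the path counts to the most time-consuming paths.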

The average overhead of the user-mode sampling and tracing collector is about 5% when sampling uses the default interval of 10 ms.
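The arithmetic behind these figures can be sketched as follows. This is a back-of-the-envelope calculation, not a description of the collector's internals: it assumes that each sample stands for one full interval of execution and that the overhead is spread evenly across samples. All names are hypothetical.

```python
SAMPLING_INTERVAL_MS = 10.0  # default interval stated in the text
AVERAGE_OVERHEAD = 0.05      # about 5%, as stated in the text

def samples_per_second(interval_ms=SAMPLING_INTERVAL_MS):
    """Samples taken per second of thread execution."""
    return 1000.0 / interval_ms

def estimated_time_ms(sample_count, interval_ms=SAMPLING_INTERVAL_MS):
    """Statistical time estimate: each sample represents one interval."""
    return sample_count * interval_ms

def overhead_per_sample_ms(interval_ms=SAMPLING_INTERVAL_MS,
                           overhead=AVERAGE_OVERHEAD):
    """Average collection cost per sample implied by the overhead figure."""
    return interval_ms * overhead

rate = samples_per_second()           # samples per second at 10 ms
hot_time = estimated_time_ms(250)     # estimate for a function with 250 samples
cost = overhead_per_sample_ms()       # implied average cost of one sample
```

Under these assumptions, a function that appears in 250 samples accounts for roughly 2.5 seconds of execution, and the collector spends about 0.5 ms on average per 10 ms interval.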

VTune Amplifier uses the user-mode sampling and tracing collector to collect data for the following analysis types:

  • Hotspots

  • Concurrency

  • Locks and Waits

You can also create a custom analysis type based on the user-mode sampling and tracing collection.

Collecting Stack Data

When collecting data, the VTune Amplifier analyzes no more than one stack per configured interval. It unwinds stacks every 10 milliseconds of thread execution. However, the VTune Amplifier may skip or emulate stack unwinding for performance reasons. In this case, when processing the collected data during finalization, the VTune Amplifier tries to find matching stacks in the history for events without stacks.

This approach reduces stack unwinding overhead but may produce incorrect stacks when a match is wrong. In such cases, the VTune Amplifier displays pseudo nodes in the bottom-up/top-down trees marked as [Guessed frame(s)] and [Skipped frame(s)]. See Troubleshooting to learn how to overcome these problems.
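To illustrate why history-based matching can save work and also go wrong, consider the following minimal Python sketch. The matching rule used here (reuse the most recent stack recorded for the same instruction pointer) is an assumption chosen for simplicity; the actual matching heuristics are not documented here, and the event layout and the guessed flag are hypothetical.

```python
def fill_missing_stacks(events):
    """Assign stacks to events that were recorded without one.

    events: time-ordered dicts with keys 'ip' (int) and 'stack'
    (tuple of function names, or None if unwinding was skipped).
    Hypothetical rule: reuse the latest stack seen for the same IP.
    """
    last_stack_by_ip = {}
    resolved = []
    for event in events:
        ip, stack = event["ip"], event["stack"]
        if stack is not None:
            last_stack_by_ip[ip] = stack
            resolved.append({"ip": ip, "stack": stack, "guessed": False})
        elif ip in last_stack_by_ip:
            # No unwound stack: borrow one from the history and flag it.
            resolved.append({"ip": ip, "stack": last_stack_by_ip[ip],
                             "guessed": True})
        else:
            # Nothing in the history matches, so the stack stays unknown.
            resolved.append({"ip": ip, "stack": None, "guessed": False})
    return resolved

# Hypothetical events: the same IP is reached through two call paths.
events = [
    {"ip": 0x401000, "stack": ("main", "parse")},
    {"ip": 0x401000, "stack": None},
    {"ip": 0x402000, "stack": None},
    {"ip": 0x401000, "stack": ("main", "retry", "parse")},
    {"ip": 0x401000, "stack": None},
]
resolved = fill_missing_stacks(events)
```

In this example the second and fifth events receive borrowed stacks. If the fifth event was in fact reached through the first call path, the borrowed stack would be a wrong match, which is the kind of inaccuracy the guessed markers are meant to signal.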

VTune Amplifier may also display [Unknown frame(s)] nodes if it cannot locate symbol files for system or application modules when unwinding the stack. See Resolving Unknown Frame(s) for more details.

When analyzing applications that use recursive algorithms with long and unique call chains, the VTune Amplifier collapses recursive chains into a single node. It preserves one level of recursion so that high-level properties of the recursive solution remain available. This approach helps save space and minimize the impact on database performance.
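One possible reading of this collapsing step can be sketched in Python as follows. The sketch assumes that "one level of recursion" means keeping a single caller-to-self edge, and it handles only direct recursion (a function calling itself); mutual recursion is left untouched. The function name and the keep parameter are hypothetical.

```python
def collapse_recursion(stack, keep=2):
    """Collapse runs of the same frame in a call stack.

    stack: tuple of function names, outermost caller first.
    keep:  maximum consecutive occurrences to retain; the assumed
           value 2 preserves one self-call, i.e. one recursion level.
    """
    collapsed = []
    run_length = 0
    for frame in stack:
        if collapsed and frame == collapsed[-1]:
            run_length += 1
        else:
            run_length = 1
        if run_length <= keep:
            collapsed.append(frame)
    return tuple(collapsed)

# A hypothetical deep recursive chain and its collapsed form.
deep_stack = ("main", "fib", "fib", "fib", "fib", "leaf")
short_stack = collapse_recursion(deep_stack)
```

Because stacks that differ only in recursion depth collapse to the same sequence, many long and unique call chains map to one entry, which is how such a scheme would save space.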
