One common question developers ask is how their parallel workload is distributed or scheduled across the available cores/processors.
The Intel® VTune™ Performance Analyzer makes this analysis easy. Its event-based sampling (EBS) technology identifies system-wide software performance problems by sampling processor events, such as clockticks and cache misses (Figure 1). From the EBS data, you can determine which process, thread, module, function, and source line in a given application generated particular events. By leveraging this technology, you can see how many events were sampled on each core as well as which thread generated them.
The Show/Hide CPU Information button in the sampling toolbar displays collected samples and events per processor in the Process, Thread, Module, and Hotspot sampling views (Figure 2).
We now know that this particular program (sort_mt1.exe) was executed on 2 cores, and we can see the number of samples collected on each core. What we don't know yet is how many threads this application created and how those threads executed on these cores. Selecting the Thread view while the CPU button is also selected shows us this information. Figure 3 tells us that sort_mt1.exe created 2 threads (thread18 and thread13) and that each thread was executed on both cores during the analysis (the OS scheduled these threads to run on each core). If you look at the clockticks (CPU_CLK_UNHALTED.CORE) for thread18, it becomes clear that this thread was executed on both cores but ran most of the time on Processor 0.
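To make the per-thread, per-core breakdown concrete, here is a minimal Python sketch of the kind of aggregation the Thread view with CPU information performs. It is not VTune code and does not read VTune data: the sample records, their counts, and the function names are hypothetical, chosen only to mirror the thread names mentioned above.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (thread name, core index), one per sampled event.
# The counts are invented for illustration and are not taken from the figures.
samples = (
    [("thread18", 0)] * 70 + [("thread18", 1)] * 30
    + [("thread13", 0)] * 40 + [("thread13", 1)] * 60
)

def samples_per_thread_and_core(records):
    """Count samples for each (thread, core) pair."""
    return Counter(records)

def core_share(records):
    """For each thread, return the fraction of its samples seen on each core."""
    counts = samples_per_thread_and_core(records)
    totals = Counter()
    for (thread, _core), n in counts.items():
        totals[thread] += n
    shares = defaultdict(dict)
    for (thread, core), n in counts.items():
        shares[thread][core] = n / totals[thread]
    return {thread: dict(by_core) for thread, by_core in shares.items()}

if __name__ == "__main__":
    for thread, by_core in core_share(samples).items():
        for core, share in sorted(by_core.items()):
            print(f"{thread} on Processor {core}: {share:.0%} of samples")
```

A thread whose samples are split across cores, as in this made-up data, was migrated between cores by the scheduler at some point during the run.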
If you are still curious and would like to see how these samples are distributed over time per thread and per core, the sampling over time (SOT) view can help. Selecting the SOT view in the Thread view (or in any other view) displays the collected samples per thread and/or core (Figure 4). The view seen in Figure 4 is useful for many reasons. The SOT view can help you:
• see how the OS scheduled the threads to run,
• identify scheduling problems (Figure 5),
• identify load balancing issues among threads (Figure 6),
• and correlate micro-architectural problems.
Figure 5: Manually setting thread affinity can create problems. Here, every thread is pinned to Core/Processor 0.
Figure 6: SOT showing a load imbalance issue.
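The two problems illustrated in Figures 5 and 6 can also be expressed as simple checks over timestamped samples. The sketch below is an illustration only, not how VTune implements the SOT view: the record layout (timestamp in milliseconds, thread, core), the function names, and the trace itself are all assumptions made for this example.

```python
from collections import defaultdict

def bucket_samples(records, bucket_ms):
    """Group (timestamp_ms, thread, core) samples into fixed-width time buckets.

    Returns {bucket_index: {(thread, core): sample_count}}.
    """
    timeline = defaultdict(lambda: defaultdict(int))
    for timestamp_ms, thread, core in records:
        timeline[timestamp_ms // bucket_ms][(thread, core)] += 1
    return {bucket: dict(cells) for bucket, cells in timeline.items()}

def cores_used(records):
    """Return the set of cores on which any sample was taken."""
    return {core for _ts, _thread, core in records}

def imbalance_ratio(records):
    """Ratio of the busiest thread's sample count to the least busy thread's."""
    per_thread = defaultdict(int)
    for _ts, thread, _core in records:
        per_thread[thread] += 1
    return max(per_thread.values()) / min(per_thread.values())

# Hypothetical trace: threads "A" and "B" are both pinned to core 0,
# and "A" accounts for three times as many samples as "B".
pinned = (
    [(t, "A", 0) for t in range(0, 300, 10)]
    + [(t, "B", 0) for t in range(300, 400, 10)]
)

if __name__ == "__main__":
    print("cores used:", sorted(cores_used(pinned)))
    print("imbalance ratio:", imbalance_ratio(pinned))
    for bucket, cells in sorted(bucket_samples(pinned, 100).items()):
        print(f"{bucket * 100:>4} ms:", cells)
```

In this made-up trace, a single entry in the set of used cores on a multi-core machine points to an affinity problem like the one in Figure 5, and a ratio well above 1 points to a load imbalance like the one in Figure 6.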
by Levent Akyil