OPTIMIZE MULTITHREADED PERFORMANCE
In modern multicore systems, threaded performance is critical for exploiting the full potential of the processor. Intel® VTune™ Amplifier helps you tune your software to make effective use of all cores.
Find Common Causes of Slow Threaded Code
The Locks and Waits analysis helps you focus your tuning efforts and envision potential improvements. Use it to identify synchronization objects (locks) that prevent effective processor utilization and to estimate each lock's wait time and its impact on application performance.
See a prioritized list of synchronization objects that negatively impact performance (see Fig. 1).
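To make the idea of per-lock wait time concrete, here is a minimal sketch of how one might measure it by hand. It only illustrates the metric and does not show how Intel® VTune™ Amplifier collects its data. The names TimedLock, worker, and run_demo are hypothetical, and time.sleep stands in for real work done while holding the lock.

```python
import threading
import time


class TimedLock:
    """Hypothetical lock wrapper that records how long callers wait to acquire it."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.wait_count = 0      # number of acquisitions
        self.total_wait = 0.0    # seconds spent blocked in acquire()

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        waited = time.perf_counter() - start
        # The counters are updated while the lock is held, so no extra
        # synchronization is needed to protect them.
        self.wait_count += 1
        self.total_wait += waited
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def worker(lock, hold_seconds):
    with lock:
        time.sleep(hold_seconds)  # stand-in for work done under the lock


def run_demo(num_threads=4, hold_seconds=0.01):
    """Start several threads that contend for one lock and return the lock."""
    lock = TimedLock("shared_state")
    threads = [
        threading.Thread(target=worker, args=(lock, hold_seconds))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock


if __name__ == "__main__":
    lock = run_demo()
    print(f"{lock.name}: {lock.wait_count} acquisitions, "
          f"{lock.total_wait:.3f} s total wait")
```

Because every thread holds the same lock for its entire unit of work, each later thread waits for all earlier ones to finish. The total wait therefore grows much faster than the hold time, which is the pattern a prioritized list of synchronization objects is meant to surface.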
Tune Parallel Performance
Intel® VTune™ Amplifier recognizes parallel programming models (including OpenMP* 4.0 and Intel® Threading Building Blocks), making it easy to visualize and understand multithreading events such as tasks beginning and ending, synchronizing, and waiting. Get the data you need to tune performance and see which parallel regions are inefficient and why (for example, imbalance, lock contention, and communication).
Detailed data for each OpenMP region highlights tuning opportunities (see Fig. 2).
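Load imbalance in a parallel region can be illustrated with simple arithmetic. The sketch below assumes you already have the busy time of each thread in one region and that all threads wait at a barrier until the slowest one finishes. The function imbalance_report and its field names are hypothetical and are not the metric definitions used by the tool.

```python
def imbalance_report(busy_seconds):
    """Summarize load imbalance for one parallel region.

    busy_seconds: per-thread busy time in the region, in seconds.
    Assumes every thread waits at the end of the region for the slowest one.
    """
    if not busy_seconds:
        raise ValueError("busy_seconds must contain at least one thread")

    elapsed = max(busy_seconds)                   # region lasts as long as the slowest thread
    idle = [elapsed - t for t in busy_seconds]    # time each thread spends waiting
    capacity = elapsed * len(busy_seconds)        # thread-seconds available in the region
    wasted_fraction = sum(idle) / capacity if capacity > 0 else 0.0

    return {
        "elapsed": elapsed,
        "idle_per_thread": idle,
        "wasted_fraction": wasted_fraction,
    }


if __name__ == "__main__":
    report = imbalance_report([4.0, 1.0, 1.0, 2.0])
    print(report)
```

With busy times of 4, 1, 1, and 2 seconds, the region takes 4 seconds and the threads idle for a combined 8 of the 16 available thread-seconds, so half of the region's capacity is lost to imbalance.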
Visually Spot Inefficient Threading
Use the timeline to spot patterns of inefficient threading, such as coarse-grained locks. Figure 3 shows four threads, but only one thread (dark green) runs at any given time. Because of data sharing issues, no work is done in parallel, so thread concurrency is very low.
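The notion of thread concurrency in a timeline can be sketched as a small calculation. The example below assumes a simplified definition: the average number of threads running during the time when at least one thread is running. The function average_concurrency and the interval data are hypothetical, and the tool's own concurrency metric may be defined differently.

```python
def average_concurrency(intervals):
    """Average number of running threads while at least one thread runs.

    intervals: iterable of (start, end) pairs, one per stretch of time in
    which some thread was running. Pairs from all threads go in one list.
    """
    events = []
    for start, end in intervals:
        if end > start:
            events.append((start, 1))   # a thread starts running
            events.append((end, -1))    # a thread stops running
    events.sort()  # at equal timestamps, stops (-1) sort before starts (+1)

    busy_time = 0.0     # wall time with at least one thread running
    thread_time = 0.0   # running time summed over all threads
    running = 0
    previous = None
    for timestamp, delta in events:
        if previous is not None and running > 0:
            span = timestamp - previous
            busy_time += span
            thread_time += span * running
        running += delta
        previous = timestamp

    return thread_time / busy_time if busy_time > 0 else 0.0


if __name__ == "__main__":
    # Four threads that take turns, as in a timeline serialized by one lock.
    serialized = [(0, 1), (1, 2), (2, 3), (3, 4)]
    # Four threads that all run for the whole period.
    parallel = [(0, 4), (0, 4), (0, 4), (0, 4)]
    print(average_concurrency(serialized))
    print(average_concurrency(parallel))
```

For the serialized timeline the result is 1.0, meaning the four threads deliver no more throughput than one. For the fully parallel timeline it is 4.0.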