When the sample application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Locks and Waits viewpoint where each window or pane is configured to display CPU time utilization of the synchronization objects during a wait. To interpret the data on the sample code performance, do the following:
The screenshots and execution time data provided in this tutorial were created on a system with 12 CPU cores. Your data may vary depending on the number and type of CPU cores on your system.
Analyze the Basic Locks and Waits Metrics
Start by exploring the data in the Summary window, which describes the performance of the whole application. To interpret the data, hover over the question mark icons to read the pop-up help and better understand what each performance metric means.
2) Wait Time occurs when software threads are waiting due to APIs that block or cause synchronization. Wait Time is calculated per thread, so the total Wait Time may exceed the application Elapsed time. Expand the Wait Time metric to view its distribution by processor utilization level. In the sample application, most of the Wait Time is characterized by ineffective processor usage.
5) CPU Time is the sum of CPU time for all threads.
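Because both metrics above are summed over threads, a small worked example can make the arithmetic concrete. The sketch below uses hypothetical per-thread timings (the thread names and numbers are illustrative assumptions, not values from the sample run) to show how the total Wait Time can exceed the Elapsed time.

```python
# Hypothetical per-thread timings in seconds; not taken from the sample run.
elapsed_time = 4.0
thread_timings = {
    "thread_0": {"cpu": 3.0, "wait": 1.0},
    "thread_1": {"cpu": 0.5, "wait": 3.5},
    "thread_2": {"cpu": 0.5, "wait": 3.5},
}


def total_metric(timings, key):
    """Sum one per-thread metric ("cpu" or "wait") across all threads."""
    return sum(per_thread[key] for per_thread in timings.values())


total_wait = total_metric(thread_timings, "wait")
total_cpu = total_metric(thread_timings, "cpu")

print(f"Elapsed: {elapsed_time:.1f}s  Wait: {total_wait:.1f}s  CPU: {total_cpu:.1f}s")
```

In this made-up case, three threads each exist for the full 4 seconds, so the summed Wait Time (8 seconds) is twice the Elapsed time even though no single thread waits longer than the run itself.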
The Thread Concurrency Histogram represents the Elapsed time and concurrency level for the specified number of running threads. Ideally, the highest bar of your chart should be within the Ok or Ideal utilization range.
For the sample code, the chart shows that analyze_locks is a multithreaded application running a maximum of 12 threads simultaneously on a machine with 12 cores, but it is not using the available cores effectively. Hovering over the bars shows how long the analyze_locks application was either idle or ran on one logical CPU. If you hover over the second bar, you see that it spent 3.547 seconds using only one core, which the VTune Amplifier classifies as Poor utilization. To understand what prevented the application from using all available logical CPUs effectively, explore the Bottom-up pane.
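To see where the bars of a concurrency histogram come from, it can help to compute one by hand. The following is a simplified sketch, not the VTune Amplifier's actual implementation: it assumes each thread's running periods are known as hypothetical (start, end) intervals and adds up how much elapsed time was spent at each concurrency level.

```python
from collections import defaultdict


def concurrency_histogram(intervals):
    """Return {concurrency level: seconds of elapsed time at that level}.

    intervals: list of (start, end) times during which some thread was running.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a thread starts running
        events.append((end, -1))    # a thread stops running
    if not events:
        return {}
    events.sort()

    histogram = defaultdict(float)
    running = 0
    previous_time = events[0][0]
    for time, delta in events:
        # Time since the previous event was spent at the current level.
        histogram[running] += time - previous_time
        running += delta
        previous_time = time
    # Drop levels that were only passed through instantaneously.
    return {level: t for level, t in sorted(histogram.items()) if t > 0}


# Hypothetical running intervals (seconds); not measured from analyze_locks.
intervals = [(0.0, 4.0), (1.0, 2.0), (1.0, 2.0), (5.0, 6.0)]
hist = concurrency_histogram(intervals)
for level, seconds in hist.items():
    print(f"{level} thread(s) running: {seconds:.1f}s")
```

In this made-up run, one second is idle (level 0), four seconds are spent with a single running thread, and only one second reaches three threads. A histogram of this shape, with its tallest bar at one thread, is the pattern the tutorial describes as Poor utilization.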
For the analyzed sample code, you see that the second object caused the longest Wait Time with Poor thread concurrency. The red bar in the Wait Time by Thread Concurrency column indicates that processor cores were underutilized for most of the time spent waiting on this object. It is a Critical Section that shows much serial time and is causing a wait. Click the arrow next to the object name to expand the node and see the draw_task wait function that contains this critical section, along with its call stack. Double-click this wait function to see the source code.
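The effect of a long-held critical section can be reproduced in a few lines. The sketch below is a Python analog and is not the sample's draw_task code: the worker function, the thread count, and the hold time are all illustrative assumptions. Each thread holds one shared lock while it "works", so the threads run one after another and accumulate Wait Time.

```python
import threading
import time

lock = threading.Lock()
wait_times = []  # seconds each thread spent blocked on the lock


def worker(hold_seconds):
    requested = time.perf_counter()
    with lock:                      # threads queue up here, one at a time
        acquired = time.perf_counter()
        time.sleep(hold_seconds)    # stand-in for work done inside the lock
    wait_times.append(acquired - requested)


def run(num_threads=4, hold_seconds=0.05):
    """Start the workers and return (elapsed seconds, summed wait seconds)."""
    wait_times.clear()
    threads = [
        threading.Thread(target=worker, args=(hold_seconds,))
        for _ in range(num_threads)
    ]
    started = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - started
    return elapsed, sum(wait_times)


elapsed, total_wait = run()
print(f"Elapsed: {elapsed:.2f}s  total wait on lock: {total_wait:.2f}s")
```

Because all the work happens while the lock is held, adding threads does not shorten the run; it only adds waiting. Narrowing the locked region to the part that truly needs protection is the usual way to reduce this kind of serial time.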