Interpret Concurrency Results

When the sample application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Hotspots by CPU Usage viewpoint, where each window and pane is configured to display data on application parallelism and processor core usage. To interpret the performance data for the sample code, do the following:

  1. Explore application-level concurrency

  2. Identify the most time-consuming function

Note

The screenshots and execution time data provided in this tutorial are created on a system with 4 CPU cores. Your data may vary depending on the number and type of CPU cores on your system.

Explore Application-level Concurrency

Start analysis with the Summary window that opens by default when data collection completes. To interpret the data, hover over the question mark icons to read the pop-up help and better understand what each performance metric means.

The Summary window shows that, after optimization, the Elapsed time decreased from 32.440 seconds to 18.351 seconds.
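To put the two Elapsed time values in perspective, the speedup and the relative time reduction can be computed directly. The snippet below is a minimal sketch that uses only the two values quoted above; the variable names are illustrative and are not part of the VTune Amplifier.

```python
# Elapsed times in seconds, as quoted from the Summary window in this tutorial.
elapsed_before = 32.440
elapsed_after = 18.351

# Speedup is the ratio of the old time to the new time.
speedup = elapsed_before / elapsed_after

# Relative reduction expresses the saved time as a share of the old time.
reduction_pct = (elapsed_before - elapsed_after) / elapsed_before * 100

print(f"Speedup: {speedup:.2f}x")                      # Speedup: 1.77x
print(f"Elapsed time reduction: {reduction_pct:.1f}%")  # Elapsed time reduction: 43.4%
```

Because the analysis itself adds overhead, these figures describe the profiled runs and may not match timings taken outside the tool.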

Note

The Concurrency analysis adds overhead to the application execution. The overhead often depends on the number of threads and synchronization objects used in the application. As a result, the Elapsed time shown in the Summary window may differ from the time measured when the application runs outside the VTune Amplifier.

The CPU Usage Histogram shows that the average concurrency level of the sample application is about 3.2, while the target concurrency level for this application on the 4-core system is 4. If you hover over the highest bar, you see that the application ran 4 threads simultaneously for almost 11 seconds, which the VTune Amplifier categorizes as Ideal processor utilization. The application ran only one or two threads simultaneously for more than 4 seconds, which is classified as Poor parallelization.
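An average concurrency level can be understood as a time-weighted average of the number of simultaneously running threads. The sketch below illustrates that arithmetic. The per-level durations are hypothetical values chosen only to be roughly consistent with the figures quoted above (almost 11 seconds at 4 threads, a little more than 4 seconds at one or two threads); they are not read from the actual result, and the function name is an assumption for illustration.

```python
def average_concurrency(seconds_at_level):
    """Return the time-weighted average number of running threads.

    seconds_at_level maps a thread count to the seconds spent at that count.
    """
    total_time = sum(seconds_at_level.values())
    weighted = sum(level * secs for level, secs in seconds_at_level.items())
    return weighted / total_time


# HYPOTHETICAL histogram data (seconds per concurrency level), for illustration only.
histogram = {1: 3.0, 2: 1.2, 3: 3.25, 4: 10.9}

target_concurrency = 4  # number of CPU cores on the tutorial system

avg = average_concurrency(histogram)
print(f"Average concurrency: {avg:.1f}")
print(f"Share of target: {avg / target_concurrency:.0%}")
```

With these assumed durations the average comes out near 3.2, which is about 80% of the target level of 4.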

Identify the Most Time-consuming Function

Click the Bottom-up tab to switch to the Bottom-up window and analyze application performance by function. By default, the grid is sorted by the CPU Time by Utilization metric in descending order. Select the Process/Function/Thread/Call Stack grouping level from the Grouping menu. This granularity lets you see the threads in which the hotspot functions were executed.

After the initial optimization, the NQUEENS_ip_SETQUEEN function is still a bottleneck. Click the plus sign next to the NQUEENS_ip_SETQUEEN function. You see that the execution of this function was parallelized among four threads.

Select these threads in the grid, right-click and choose the Filter In by Selection context menu option. The Timeline pane below is updated to display data for the selected threads only.

  1. Timeline area. When you hover over a graph element, the timeline tooltip displays the time elapsed since the application was launched.

  2. Threads area. Shows the distribution of CPU time utilization per thread. Hover over a bar to see the CPU time utilization, in percent, for this thread at each moment of time. Dark-green zones show when threads were active. Light-green zones show when threads were waiting.

  3. Transitions. Show the execution flow between threads, where one thread signals another thread that is waiting to receive that signal. You can zoom in on a time region to get a more detailed view of the transitions. To do this, drag to select the region, then right-click and choose the Zoom In on Selection option from the context menu.

The Timeline pane for the sample application shows a large number of transitions between threads, which means that the threads spent noticeable time transferring execution to each other. If you uncheck the Transitions display option on the right, you see that the workload balance is also poor, since three of the four threads were waiting for OMP Worker Thread #1 to complete execution.
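One simple way to put a number on workload balance is to compare the busiest thread with the average thread. The sketch below is an illustrative metric, not one reported by the VTune Amplifier. The per-thread CPU times are hypothetical, and only the name OMP Worker Thread #1 comes from the text above; the other thread names and the function name are assumptions.

```python
def load_imbalance_pct(cpu_seconds_by_thread):
    """Return (max - mean) / max * 100 over the per-thread CPU times.

    0% means every thread did the same amount of work; larger values mean
    the other threads sat idle for longer while the busiest thread finished.
    """
    times = list(cpu_seconds_by_thread.values())
    busiest = max(times)
    mean = sum(times) / len(times)
    return (busiest - mean) / busiest * 100


# HYPOTHETICAL per-thread CPU times in seconds, for illustration only.
cpu_seconds = {
    "OMP Worker Thread #1": 16.0,
    "OMP Worker Thread #2": 10.0,  # assumed name
    "OMP Worker Thread #3": 9.0,   # assumed name
    "OMP Master Thread": 9.0,      # assumed name
}

imbalance = load_imbalance_pct(cpu_seconds)
print(f"Load imbalance: {imbalance:.2f}%")
```

A perfectly balanced run would give 0% with this metric, so a large value is a hint that the work distribution among threads deserves a closer look.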

Run the Locks and Waits analysis to understand what prevents the sample code from effective thread concurrency and processor utilization.

For more complete information about performance and optimization in Intel software products, see the Optimization Notice page.