When the sample application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Locks and Waits viewpoint that is configured to display synchronization objects sorted by Wait time. To interpret the data on the sample code performance, do the following:
Click the Bottom-up tab to open the Bottom-up pane.
The table below explains the type of data provided in the Bottom-up pane:
For the nqueens_parallel sample code, there are two critical wait objects, OMP Critical_NQUEENS_ip_SETQUEEN and OMP Join Barrier_NQUEENS_ip_SOLVE, that caused redundant synchronization and show the longest Wait time and the highest Wait count. The bar indicators in the Wait Time column show that processor cores were underutilized for most of the time spent waiting on these objects.
Analyze Source Code
Explore the source of the critical synchronization objects that caused significant Wait time and poor processor utilization. Double-click the NQUEENS_ip_SETQUEEN object to analyze the source of the NQUEENS_ip_SETQUEEN wait function. Click the button on the Source pane toolbar to go to the biggest hotspot code line in the function. VTune Amplifier highlights line 142, which is protected by the OpenMP* critical section.
The NQUEENS_ip_SETQUEEN function waited for 31.991 seconds while this code line was executing. During this time, the operation was contended 48,345 times.
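As a rough worked figure, assuming the 48,345 contentions correspond to the Wait count reported for this line, the average wait per contention is:

```latex
\bar{t}_{\text{wait}} = \frac{31.991\ \text{s}}{48{,}345} \approx 0.66\ \text{ms}
```

Each individual wait is short. The cost comes from how often the threads contend for the same section.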
Hover over any transition line in the Timeline pane below to explore the infotip and make sure that all the transitions are caused by the OMP Critical_NQUEENS_ip_SETQUEEN critical section.
The OMP Critical_NQUEENS_ip_SETQUEEN section is where the application serializes. Only one thread can be in the critical section at a time, so each thread has to wait for the section to become available before it can proceed.
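The serialization pattern described above can be sketched in a few lines. The sample itself is Fortran; the C fragment below is only an illustrative analog, and the names N_ITER, count_with_critical, and count_with_reduction are hypothetical and do not come from the nqueens_parallel source. The first function sends every update through one critical section, so threads queue up. The second shows one common way to avoid that contention with an OpenMP reduction, which is not necessarily the fix the tutorial applies.

```c
/* Hypothetical stand-in for a shared counter update; not the tutorial's code. */
#define N_ITER 100000L

/* Every iteration enters the same critical section, so threads
 * execute the update one at a time and wait on each other. */
static long count_with_critical(void)
{
    long total = 0;
    long i;
    #pragma omp parallel for
    for (i = 0; i < N_ITER; ++i) {
        #pragma omp critical
        {
            total += 1;   /* only one thread at a time may run this */
        }
    }
    return total;
}

/* Each thread accumulates into a private copy of total;
 * OpenMP combines the partial sums once, at the end of the loop. */
static long count_with_reduction(void)
{
    long total = 0;
    long i;
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < N_ITER; ++i) {
        total += 1;
    }
    return total;
}
```

Both functions return the same count. When built with OpenMP enabled, the difference is that the first version forces a wait on every iteration, while the second synchronizes once per thread. Without OpenMP support the pragmas are ignored and both run serially.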