When the sample application exits, the Intel® VTune™ Amplifier finalizes the results and opens the Locks and Waits viewpoint that is configured to display synchronization objects sorted by Wait time. To interpret the data on the sample code performance, do the following:
Click the Bottom-up tab to open the Bottom-up pane.
The table below explains the type of data provided in the Bottom-up pane:
In the nqueens_parallel sample code, there are two critical wait objects, OMP Critical_NQUEENS_ip_SETQUEEN and OMP Join Barrier_NQUEENS_ip_SOLVE, that caused redundant synchronization and account for the longest Wait time and the highest Wait count. The bar indicators in the Wait Time column show that processor cores were underutilized for most of the time that threads spent waiting on these objects.
Analyze Source Code
Explore the source of the critical synchronization objects that caused significant Wait time and poor processor utilization. Double-click the NQUEENS_ip_SETQUEEN object to analyze the source of the NQUEENS_ip_SETQUEEN wait function. Click the button on the Source pane toolbar to go to the biggest hotspot code line in the function. VTune Amplifier highlights line 142, which is protected by the OpenMP* critical section. The NQUEENS_ip_SETQUEEN function waited for 0.979 seconds while this code line was executing, and the operation was contended 4,757 times during that time.
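As a rough reading of these two figures, dividing the Wait time by the Wait count gives the average time lost per contended entry. The short calculation below is only a sketch: it assumes that each counted wait corresponds to one blocked attempt to enter the critical section, and the variable names are illustrative.

```python
# Figures reported for the highlighted line in the Source pane.
wait_time_s = 0.979   # total Wait time, in seconds
wait_count = 4757     # number of times the operation was contended

# Average time a thread was blocked per contended entry, in microseconds.
avg_wait_us = wait_time_s / wait_count * 1e6
print(f"{avg_wait_us:.1f} microseconds per contended wait")  # prints 205.8
```

Each individual wait is short, so the cost comes from how often the threads collide on the same line rather than from any single long stall.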
Hover over any transition line in the Timeline pane below to explore the infotip and make sure that all the transitions are caused by the OMP Critical_NQUEENS_ip_SETQUEEN critical section.
The OMP Critical_NQUEENS_ip_SETQUEEN section is where the application serializes: only one thread can be in the critical section at a time, so each thread has to wait for the section to become available before it can proceed.
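The serialization pattern can be sketched outside the sample. The Python code below is an analogy, not the Fortran source of nqueens_parallel. It assumes that the protected line updates a shared solution counter, and it uses a `threading.Lock` as a stand-in for the OpenMP critical section. All function names are hypothetical.

```python
import threading

def search(row, cols, diag1, diag2, n, on_solution):
    """Place one queen per row; call on_solution() for each complete board."""
    if row == n:
        on_solution()
        return
    for col in range(n):
        d1, d2 = row + col, row - col
        if col in cols or d1 in diag1 or d2 in diag2:
            continue  # square is attacked by an earlier queen
        search(row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2}, n, on_solution)

def solve_with_critical(n):
    """Count N-Queens solutions with every update going through one lock."""
    total = 0
    entries = 0
    lock = threading.Lock()  # stand-in for the OpenMP critical section

    def on_solution():
        nonlocal total, entries
        with lock:           # only one thread at a time may pass this point
            total += 1
            entries += 1

    def worker(first_col):
        # Each thread explores the boards whose row-0 queen is in first_col.
        search(1, {first_col}, {first_col}, {-first_col}, n, on_solution)

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, entries

print(solve_with_critical(8))  # prints (92, 92)
```

The lock is entered once per solution found, so the number of entries grows with the problem size, and every entry is a point where another thread may have to wait.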
To explore the next issue, double-click the OMP Join Barrier_NQUEENS_ip_SOLVE synchronization object to open the source function and go to the hottest line.
The OMP Join Barrier_NQUEENS_ip_SOLVE object creates a barrier for thread synchronization: each thread must wait until the other threads complete execution. The Timeline pane illustrates the thread imbalance by displaying light-green wait regions for each thread. If you hover over a wait region, the infotip shows that the wait happened on the OMP Join Barrier_NQUEENS_ip_SOLVE synchronization object.
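The effect of an imbalanced workload on a join barrier can be reproduced with a small experiment. The sketch below is illustrative only: it simulates uneven work with `time.sleep`, uses `threading.Barrier` in place of the OpenMP join barrier, and the function name and durations are made up for the example.

```python
import threading
import time

def run_with_join_barrier(work_seconds):
    """Run one thread per workload and return how long each waited at the barrier."""
    barrier = threading.Barrier(len(work_seconds))
    waits = [0.0] * len(work_seconds)

    def worker(i):
        time.sleep(work_seconds[i])      # simulated, deliberately uneven work
        arrived = time.perf_counter()
        barrier.wait()                   # blocks until every thread has arrived
        waits[i] = time.perf_counter() - arrived

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(work_seconds))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waits

waits = run_with_join_barrier([0.02, 0.06, 0.12])
for i, w in enumerate(waits):
    print(f"thread {i} waited {w:.3f} s at the barrier")
```

The thread with the least work waits the longest, which corresponds to the wait regions of different lengths that the Timeline pane shows for each thread.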
You need to optimize the code to make it more concurrent. Click the Source Editor button on the Source window toolbar to open the code editor and optimize the code.
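One common way to remove this kind of contention is to let each thread accumulate its own partial result and combine the partial results once, after all threads finish. The OpenMP reduction clause follows the same idea. The Python sketch below illustrates the approach under stated assumptions and is not the change made in the sample. It assumes that the contended operation is an update to a shared solution counter, and the function names are hypothetical. In standard CPython builds, threads do not run CPU-bound code in parallel, so the sketch shows the structure of the fix rather than its speedup.

```python
import threading

def count_solutions(row, cols, diag1, diag2, n):
    """Return the number of ways to complete the board from the given row."""
    if row == n:
        return 1
    total = 0
    for col in range(n):
        d1, d2 = row + col, row - col
        if col in cols or d1 in diag1 or d2 in diag2:
            continue  # square is attacked by an earlier queen
        total += count_solutions(row + 1, cols | {col},
                                 diag1 | {d1}, diag2 | {d2}, n)
    return total

def solve_with_reduction(n):
    """Count N-Queens solutions with one private counter per thread."""
    partial = [0] * n  # each thread writes only to its own slot

    def worker(first_col):
        # No lock needed: the result stays thread-private until the end.
        partial[first_col] = count_solutions(
            1, {first_col}, {first_col}, {-first_col}, n)

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)  # single combining step after all threads finish

print(solve_with_reduction(8))  # prints 92
```

With this structure, the number of synchronization points no longer grows with the number of solutions found. The workload per thread is still uneven, so the wait at the final join remains a separate issue to address.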