You identified that the most time-consuming function is
NQUEENS_ip_SETQUEEN. If you click the Source Editor button from the Source pane, the VTune Amplifier opens the source
nqueens_parallel.f90 file at the hotspot line in the default code editor. You see that the OpenMP* cycle is calling the recursive
setQueen function that initializes the
queens array. To avoid a data race, this array is copied in each thread (see line 127):
Depending on the sample code version, your source line numbers may slightly differ from the numbers provided in this tutorial.
This means that the number of local copies is equal to the number of threads. Since the function is recursive, the array is also copied in every function call, which is unnecessary and creates a big overhead.
To resolve this issue, you may enable OpenMP to make a copy of the array per thread. To do this:
Comment out lines 124 and 127.
Search and replace all
Edit line 159 to add the
This enables the OpenMP run-rime to create a private copy of the array for each thread.
Save the changes made in the source file.
From the Visual Studio Build menu, select Rebuild nqueens_parallel.
The project is rebuilt.
From Visual Studio Debug menu, select Start Without Debugging to run the application.
Visual Studio runs
nqueens_parallel.exe. Note that the execution time has reduced from 127336 ms to 46975 ms. This means that the proposed solution gives 80361 ms of CPU time reduction.
To identify other possible performance issues, you may run the Concurrency analysis and see how effectively your application is parallelized.