Intel® 20 Questions Contest: Question #7
Intel® Parallel Amplifier: Extract project and source files from <Intel Parallel Studio directory>\Composer\Samples\en_US\C++\openmp_samples.zip. Compile project with Intel® C++ Compiler and run the app with Intel Parallel Amplifier. When you run openmp_samples with Intel® Parallel Amplifier, what sync object causes poor CPU utilization by OpenMP threads?
OMP Join Barrier
Intel® Parallel Amplifier has three types of analysis:
- Hotspot: helps you understand where your program is spending time. Hotspot analysis can be useful for performance analysis of both serial and parallel applications.
- Concurrency: helps you understand whether your application uses the available cores efficiently, and helps you find the most serial code, which can be optimized by adding more parallelism.
- Locks and Waits: helps you find synchronization objects (e.g. locks) that cause poor CPU utilization by making threads wait on them for too long. You should focus your tuning efforts on the sync objects at the top of the list.
To find the correct answer to Question #7, run the Intel® Parallel Amplifier Locks and Waits analysis. After Intel Parallel Amplifier finishes collecting and processing the statistics, it shows the list of sync objects used in your application, sorted by the CPU time spent on them. Right next to the CPU time, Intel Parallel Amplifier displays a horizontal bar that shows how efficiently your application utilized the CPU cores while waiting on the sync object. In addition, it displays the Wait Count and Sync Object Type for every sync object in the list.
For openmp_samples (see Figure 1), the most wait time and the poorest CPU utilization were caused by "OMP Join Barrier".
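To make the idea of waiting at a join barrier more concrete, here is a minimal sketch. It is not the code from openmp_samples; the functions busy_work, run_static and run_dynamic are hypothetical names invented for illustration. It assumes one common cause of long join-barrier waits: an imbalanced loop, where threads that finish their share early sit idle at the implicit barrier that ends the parallel region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical workload (illustration only): the cost grows with `units`,
// so later loop iterations are far more expensive than earlier ones.
double busy_work(int units) {
    double acc = 0.0;
    for (int k = 0; k < units * 1000; ++k) {
        acc += k * 1e-9;
    }
    return acc;
}

// Static schedule with no chunk size: each thread receives one contiguous
// block of iterations. The thread that owns the last block has the most
// work, so the other threads finish early and wait at the implicit barrier
// that closes the parallel region.
double run_static(int n) {
    assert(n >= 0);
    std::vector<double> out(static_cast<std::size_t>(n), 0.0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        out[i] = busy_work(i);
    }
    // Implicit barrier: every thread must arrive before execution continues.
    double total = 0.0;
    for (double v : out) total += v;
    return total;
}

// Dynamic schedule: iterations are handed out one at a time on demand, so
// a thread that finishes early picks up more work instead of waiting.
double run_dynamic(int n) {
    assert(n >= 0);
    std::vector<double> out(static_cast<std::size_t>(n), 0.0);
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        out[i] = busy_work(i);
    }
    double total = 0.0;
    for (double v : out) total += v;
    return total;
}
```

Both functions compute the same result; only the distribution of iterations across threads differs. The sketch needs OpenMP enabled at compile time (for example /Qopenmp with the Intel C++ Compiler on Windows, or -fopenmp with GCC); without it the pragmas are ignored and the loops run serially. Whether switching the schedule reduces the wait in a real application depends on where the imbalance comes from, so it is worth re-running the Locks and Waits analysis after any such change.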
If you'd like to learn more about basic concepts of Intel® Parallel Amplifier, please read this Getting Started Guide.