Intel® Parallel Amplifier provides three analysis methods, one of which is concurrency analysis. In short, concurrency analysis measures how well the application utilizes the available processors on a given system. It helps the user identify hotspot functions where processor utilization is poor or less than ideal.
The easiest way to find load imbalance issues is to select the Thread-Function-Call Stack grouping as the granularity in the grid view after running a concurrency analysis. In this view, CPU Time by utilization is grouped by thread.
In the example below, the OMP Worker Thread runs for 9.335 seconds while the main thread runs for only 7.752 seconds, a difference of about 1.6 seconds. The time that contributes to the serial time and to the load imbalance comes from the OMP Worker Thread. Each thread can be expanded further to see which functions it executes and how they contribute to the concurrency.
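The code behind this example is not shown, so the following is only a minimal sketch of how such an imbalance can arise and how it might be reduced. It assumes a hypothetical loop whose iteration cost grows with the index, and it models two ways of distributing iterations among threads: contiguous equal-sized blocks (as a static schedule would do) and handing the next iteration to whichever thread becomes free first (as OpenMP's `schedule(dynamic)` clause would do). The function names and the workload are illustrative assumptions, and dynamic scheduling is one common remedy, not necessarily the fix applied here.

```python
import heapq


def static_times(costs, num_threads):
    """Busy time per thread when iterations are split into contiguous,
    near-equal blocks (a model of a static schedule)."""
    base, extra = divmod(len(costs), num_threads)
    times = []
    start = 0
    for t in range(num_threads):
        # The first `extra` threads take one additional iteration.
        size = base + (1 if t < extra else 0)
        times.append(sum(costs[start:start + size]))
        start += size
    return times


def dynamic_times(costs, num_threads):
    """Busy time per thread when each iteration goes to the thread that
    becomes free first (a model of a dynamic schedule, chunk size 1)."""
    # Min-heap of (accumulated busy time, thread id).
    heap = [(0, t) for t in range(num_threads)]
    heapq.heapify(heap)
    for cost in costs:
        busy, tid = heapq.heappop(heap)
        heapq.heappush(heap, (busy + cost, tid))
    times = [0] * num_threads
    for busy, tid in heap:
        times[tid] = busy
    return times


# Hypothetical workload: iteration i costs i + 1 time units.
costs = [i + 1 for i in range(8)]

print(static_times(costs, 2))   # [10, 26]: the second thread does most of the work
print(dynamic_times(costs, 2))  # [16, 20]: the busy times are much closer
```

In this model the total work is the same in both cases (36 units), but the gap between the busiest and the least busy thread drops from 16 units to 4. That gap is what appears in the per-thread grouping as one thread running noticeably longer than the other.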
After fixing the load imbalance issue, the analysis result looks like the following:
By Levent Akyil