Use the Intel® VTune™ Amplifier for performance analysis of application targets using Intel Threading Building Blocks (Intel TBB).
If your application uses the Intel® runtime libraries, you can run:
Hotspots and Threading analysis to explore the application parallelization efficiency based on Intel TBB parallel or synchronization constructs.
Threading analysis to get detailed information on Intel TBB synchronization objects that limited the parallel performance of your multithreaded application.
Using the Intel C++ compiler is recommended to get more comprehensive diagnostics from the VTune Amplifier.
Start exploring Intel TBB parallelization efficiency with the Hotspots analysis. Look at the Effective CPU Utilization Histogram to see the parallelization level of your application. Note that the histogram reflects parallelization levels based on effective time, that is, CPU time with the time spent in threading runtimes subtracted.
If you see a significant portion of your elapsed time spent with Idle or Poor CPU utilization, explore the Top Hotspots table. Flagged Intel TBB functions might mean that the application spends CPU time in the Intel TBB runtime because of parallel inefficiencies like scheduling overhead or imbalance. To discover the reason, hover over the flag.
The Bottom-up tab can give you more details about synchronization or overhead in particular Intel TBB constructs. Expand the Spin Time and Overhead Time columns in the grid to determine why a particular Intel TBB runtime function had a higher than usual execution time. Intel TBB runtime functions are flagged when they consume more than 5% of the CPU time.
For example, an Intel TBB runtime function with a high Scheduling value may indicate that your application divides its threading work into small pieces, which leads to excessive scheduling overhead as the application repeatedly calls into the runtime. You can resolve this issue by increasing the threading chunk size.
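To illustrate why a larger chunk size reduces scheduling overhead, the following minimal sketch counts how many chunks result when a range is halved recursively until each piece is no larger than a given grain. It is a standard-library-only model, not Intel TBB code, and the function name count_chunks is hypothetical. In Intel TBB itself, the corresponding knob is the optional grainsize argument of tbb::blocked_range.

```cpp
#include <cassert>
#include <cstddef>

// Model only (not Intel TBB code): counts the leaf chunks produced when the
// half-open range [begin, end) is split in half recursively until a piece
// holds at most `grain` iterations. Each leaf stands for one scheduled task.
std::size_t count_chunks(std::size_t begin, std::size_t end, std::size_t grain) {
    assert(grain > 0 && begin < end);
    const std::size_t size = end - begin;
    if (size <= grain) {
        return 1;  // small enough: this piece would run as a single task
    }
    const std::size_t mid = begin + size / 2;  // split at the midpoint
    return count_chunks(begin, mid, grain) + count_chunks(mid, end, grain);
}
```

Under this model, 1024 iterations with a grain of 1 produce 1024 chunks, while a grain of 256 produces only 4, so the runtime has far fewer tasks to schedule for the same amount of work.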
If there is idle wait time during which the Intel TBB runtime does not burn CPU on synchronization, run the Threading analysis to explore synchronization bottlenecks that can prevent effective CPU utilization. VTune Amplifier recognizes all types of Intel TBB synchronization objects. If you assign a meaningful name to an object you create in the source code, the VTune Amplifier recognizes it and displays that name in the Result tab. For performance reasons, this functionality is not enabled by default in Intel TBB headers. To make the user-defined objects visible to the VTune Amplifier, recompile your application with TBB_USE_THREADING_TOOLS set to 1.
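As a rough sketch of the recompilation step, the macro can be defined before any Intel TBB header is included, or passed on the compiler command line as -DTBB_USE_THREADING_TOOLS=1. The fragment below assumes that your Intel TBB version provides tbb::profiling::set_name for naming synchronization objects. The object queue_lock, the name "QueueLock", and the function name_sync_objects are hypothetical.

```cpp
// Assumption: this definition precedes every Intel TBB include in the
// translation unit (or is supplied as -DTBB_USE_THREADING_TOOLS=1 instead).
#define TBB_USE_THREADING_TOOLS 1

#include <tbb/spin_mutex.h>

// Hypothetical user-defined synchronization object.
tbb::spin_mutex queue_lock;

void name_sync_objects() {
    // Assumption: tbb::profiling::set_name is available in your TBB version.
    // It attaches a readable name to the object for profiling tools.
    tbb::profiling::set_name(queue_lock, "QueueLock");
}
```

Call the naming function once, before the object is first contended, so that the name is already attached when the analysis starts collecting data.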
To display the overhead introduced by Intel TBB library internals, the VTune Amplifier creates a pseudo synchronization object, TBB Scheduler, that includes all waits from the Intel TBB runtime libraries.