If you developed your application with the Intel® runtime libraries, you can run:
Basic Hotspots, Advanced Hotspots, and Concurrency analyses to identify the impact of Intel® Threading Building Blocks (Intel TBB) function calls on your application performance;
Locks and Waits analysis to get detailed information on the Intel TBB synchronization objects that limit the parallel performance of your multithreaded application.
Intel® VTune™ Amplifier helps locate areas that show large amounts of parallelization overhead, which indicates inefficient parallelization.
Using the Intel® C++ Compiler is recommended to get more comprehensive diagnostics from the VTune Amplifier.
VTune Amplifier recognizes all types of Intel TBB synchronization objects. If you assign a meaningful name to an object you create in the source code, the VTune Amplifier recognizes and represents it in the Result tab. For performance reasons, this functionality is not enabled by default in Intel TBB headers. To make the user-defined objects visible to the VTune Amplifier, recompile your application with TBB_USE_THREADING_TOOLS set to 1.
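As a minimal sketch of naming a synchronization object, the following assumes classic Intel TBB, where tbb::profiling::set_name is available for mutex types through tbb/tbb_profiling.h once TBB_USE_THREADING_TOOLS is set to 1. The names Counter, CounterMutex, name_for_profiler, and run_demo are hypothetical and chosen only for illustration. The std::mutex fallback is not part of Intel TBB; it is there only so that the sketch also builds on a machine without the TBB headers.

```cpp
// Assumption: classic Intel TBB. The macro must be set before the first TBB
// header is included, or passed on the command line as
// -DTBB_USE_THREADING_TOOLS=1.
#ifndef TBB_USE_THREADING_TOOLS
#  define TBB_USE_THREADING_TOOLS 1
#endif

#include <mutex>  // std::lock_guard (and std::mutex for the fallback)

// Detect the classic TBB headers when the compiler supports __has_include.
#ifdef __has_include
#  if __has_include(<tbb/spin_mutex.h>) && __has_include(<tbb/tbb_profiling.h>)
#    define HAVE_CLASSIC_TBB 1
#  endif
#endif

#ifdef HAVE_CLASSIC_TBB
#  include <tbb/spin_mutex.h>
#  include <tbb/tbb_profiling.h>
typedef tbb::spin_mutex counter_mutex_t;
// Attach a user-defined name to the mutex for the profiler to display.
inline void name_for_profiler(counter_mutex_t& m, const char* name) {
    tbb::profiling::set_name(m, name);
}
#else
// Fallback without TBB: a standard mutex, and naming does nothing.
typedef std::mutex counter_mutex_t;
inline void name_for_profiler(counter_mutex_t&, const char*) {}
#endif

// Hypothetical shared counter protected by a named mutex.
struct Counter {
    counter_mutex_t mutex;
    long value;

    Counter() : value(0) {
        name_for_profiler(mutex, "CounterMutex");
    }

    void add(long n) {
        // Both tbb::spin_mutex and std::mutex provide lock() and unlock().
        std::lock_guard<counter_mutex_t> guard(mutex);
        value += n;
    }
};

// Increments the counter 1000 times and returns the final value.
long run_demo() {
    Counter c;
    for (int i = 0; i < 1000; ++i) {
        c.add(1);
    }
    return c.value;
}
```

The sketch is single-threaded, so the mutex is never contended. In a real multithreaded application built against Intel TBB, the name passed at construction is what you would look for among the synchronization objects in the Result tab.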
To display the overhead introduced by Intel TBB library internals, the VTune Amplifier creates a pseudo synchronization object, TBB scheduler, that includes all waits from the Intel TBB runtime libraries.