Intel® Threading Building Blocks Code Analysis

Use the Intel® VTune™ Profiler to analyze the performance of application targets that use Intel® Threading Building Blocks (Intel TBB).
If your application uses the Intel® runtime libraries, you can run:
  • Hotspots and Threading analysis to explore the application parallelization efficiency based on Intel TBB parallel or synchronization constructs.
  • Threading analysis to get detailed information on Intel TBB synchronization objects that limited the parallel performance of your multithreaded application.
Using the Intel® C++ Compiler is recommended to get more comprehensive diagnostics from the VTune Profiler.
Start exploring Intel TBB parallelization efficiency with the Hotspots analysis. Look at the Effective CPU Utilization Histogram to see the parallelization level of your application. Note that the histogram is based on effective time, that is, CPU time minus the time spent in threading runtimes.
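To make the idea of effective time concrete, here is a simplified C++ model of how such a histogram could be built. It is not VTune's actual algorithm: the Slice type, its fields, and both functions are hypothetical, and the model assumes you already know, for each wall-clock slice of the run, how many CPUs ran application code and how many were only spinning or paying overhead inside a threading runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical input: the run is cut into wall-clock slices. For each slice
// we assume we know how many CPUs did useful application work and how many
// were busy only with spin or overhead inside a threading runtime.
struct Slice {
    double seconds;           // wall-clock length of the slice
    unsigned effective_cpus;  // CPUs running application code
    unsigned runtime_cpus;    // CPUs busy only with runtime spin/overhead
};

// h[k] = elapsed seconds during which exactly k CPUs did effective work.
// runtime_cpus is deliberately ignored: those CPUs are busy, but their time
// is subtracted instead of being counted as parallel work.
std::vector<double> effective_histogram(const std::vector<Slice>& slices,
                                        unsigned num_cpus) {
    std::vector<double> h(num_cpus + 1, 0.0);
    for (const Slice& s : slices) {
        assert(s.seconds >= 0.0 && "slice durations must be non-negative");
        h[std::min(s.effective_cpus, num_cpus)] += s.seconds;
    }
    return h;
}

// In this model, average effective utilization is effective CPU time
// divided by elapsed time.
double average_effective_cpus(const std::vector<Slice>& slices) {
    double cpu = 0.0, elapsed = 0.0;
    for (const Slice& s : slices) {
        cpu += s.seconds * s.effective_cpus;
        elapsed += s.seconds;
    }
    return elapsed > 0.0 ? cpu / elapsed : 0.0;
}
```

For example, on 4 CPUs, a 1-second slice with 4 CPUs doing application work and a 2-second slice with 1 CPU doing work while 3 others spin in the runtime give h[4] = 1.0 and h[1] = 2.0, with an average of 2.0 effective CPUs. The spinning CPUs in the second slice are busy, yet they do not raise that slice's utilization level.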
If a significant portion of the elapsed time is spent at Idle or Poor CPU utilization, explore the Top Hotspots table. A flagged Intel TBB function may indicate that the application spends CPU time in the Intel TBB runtime because of parallel inefficiencies such as scheduling overhead or load imbalance. Hover over the flag to see the reason.
The Bottom-up tab gives more details about synchronization or overhead in particular Intel TBB constructs. Expand the Spin Time and Overhead Time columns in the grid to determine why a particular Intel TBB runtime function had a higher-than-usual execution time. Intel TBB runtime functions are flagged when they consume more than 5% of the CPU time.
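The 5% rule amounts to a short calculation. The sketch below is a hypothetical model: FunctionRow and flag_runtime_functions are illustrative names, not part of VTune or Intel TBB, and the model assumes the threshold is measured against the total CPU time of all functions in the result, where each function's CPU time is its effective, spin, and overhead time combined.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row of a Bottom-up style grid (not VTune's export format).
struct FunctionRow {
    std::string name;
    bool is_tbb_runtime;  // true for functions inside the Intel TBB runtime
    double effective_s;   // useful CPU time, seconds
    double spin_s;        // spin time, seconds
    double overhead_s;    // overhead time (e.g. scheduling), seconds
    double cpu_s() const { return effective_s + spin_s + overhead_s; }
};

// Returns runtime functions whose CPU time exceeds `threshold` (5% by
// default) of the total CPU time across all rows, application code included.
std::vector<std::string> flag_runtime_functions(
        const std::vector<FunctionRow>& rows, double threshold = 0.05) {
    double total = 0.0;
    for (const FunctionRow& r : rows) total += r.cpu_s();
    assert(total > 0.0 && "need some CPU time to compute shares");

    std::vector<std::string> flagged;
    for (const FunctionRow& r : rows)
        if (r.is_tbb_runtime && r.cpu_s() > threshold * total)
            flagged.push_back(r.name);
    return flagged;
}
```

For example, with 10 s of total CPU time, a runtime function that accumulates 0.8 s (8%) is flagged, while one with 0.2 s (2%) is not.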
For example, an Intel TBB runtime function with a high Scheduling value may indicate that your application divides its parallel work into pieces that are too small, so it calls into the runtime so often that scheduling overhead becomes significant. You can resolve this issue by increasing the chunk size (in Intel TBB terms, the grain size) of the parallel work.
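In Intel TBB, the usual way to control chunk size is the grainsize argument of tbb::blocked_range passed to tbb::parallel_for. The sketch below does not use Intel TBB; it is a standard-library model built around a hypothetical helper, chunked_parallel_for, in which worker threads claim chunks from a shared atomic counter instead of using Intel TBB's work-stealing scheduler. Each claim stands in for one call into a scheduler, so the returned count shows how the number of scheduling operations drops as the chunk size grows.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not an Intel TBB API). Worker threads repeatedly
// claim the next `chunk` iterations from a shared counter and run `body`
// on each index. Returns the number of chunks claimed, i.e. the number of
// "scheduling" operations the loop needed.
template <typename Body>
std::size_t chunked_parallel_for(std::size_t n, std::size_t chunk,
                                 unsigned num_threads, Body body) {
    assert(num_threads > 0 && "need at least one worker thread");
    if (chunk == 0) chunk = 1;                  // avoid an endless loop
    std::atomic<std::size_t> next{0};          // first unclaimed iteration
    std::atomic<std::size_t> chunks_taken{0};  // one per scheduling operation

    auto worker = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(chunk);
            if (begin >= n) break;             // no work left
            chunks_taken.fetch_add(1);
            std::size_t end = std::min(begin + chunk, n);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();
    return chunks_taken.load();
}
```

Processing 1000 iterations with a chunk size of 1 takes 1000 claims, while a chunk size of 100 takes only 10. Larger chunks reduce scheduling calls but leave fewer pieces to balance across threads, so the chunk size is a trade-off between overhead and load balance.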
If you see idle wait time, that is, time when the Intel TBB runtime waits without burning CPU on synchronization, run the Threading analysis to explore synchronization bottlenecks that may prevent effective CPU utilization.
The VTune Profiler recognizes all types of Intel TBB synchronization objects. If you assign a meaningful name to an object you create in the source code, the VTune Profiler recognizes it and displays it in the Result tab. For performance reasons, this functionality is not enabled by default in the Intel TBB headers. To make user-defined objects visible to the VTune Profiler, recompile your application with TBB_USE_THREADING_TOOLS set to 1.
To display the overhead introduced by Intel TBB library internals, the VTune Profiler creates a pseudo synchronization object, TBB Scheduler, that includes all waits from the Intel TBB runtime libraries.
