Celebrating a Decade of Parallel Programming with Intel® Threading Building Blocks (Intel® TBB)

Published: 06/10/2016   Last Updated: 06/10/2016

This year marks the tenth anniversary of Intel® Threading Building Blocks (Intel® TBB). It’s a good time to look back over the last 10 years to see where Intel TBB started, how far it’s come, and how successful it’s been in addressing the needs of developers.

The world of processors and computers has changed a great deal in the last 10 years. CPU development over the last decade has been dominated by a steadily increasing number of cores. When clock speed improvements stalled, the solution was to add more cores so that a single CPU can execute multiple tasks simultaneously, thus increasing performance. Unlike a clock speed increase, which speeds up existing code as is, additional cores help only if developers change the way their code is written so that it actually uses them.

There were multiple solutions available to help developers address this challenge of efficiently using multiple cores in a processor. Most of these solutions made parallel programming daunting and difficult to implement. Along came Intel TBB, which offered an efficient way to implement threading through task-based parallelism. Intel TBB has seen a steady stream of innovation since its inception in 2006 and has grown by leaps and bounds. Today, Intel TBB is one of the most popular threading libraries among C++ parallel programmers across multiple industries: academia, healthcare, finance, oil and gas, game development, life sciences, manufacturing, and cloud service providers.

Where We Started

Intel TBB started off as a simple C++ template library that could help with parallelizing tasks through loop templates, a task scheduler, and memory allocators. It was compatible with other threading packages and compiler agnostic, meaning it could be used with any compiler.

Intel TBB was designed to ensure a high degree of composability for solutions built with it. Composability of parallelized applications or components is the ability to retain the same level of efficiency when running side-by-side with, or inside of, other parallelized components. In particular, the Intel TBB scheduler based on the work stealing algorithm achieves close to optimal hardware utilization, even when the availability of hardware resources keeps changing during the computation. It also seamlessly supports nested parallelism.

How Far We Have Come

Today, Intel TBB balances load automatically, so programmers do not have to worry about how to distribute work across a system. Intel TBB lets you use C++11 lambda functions, so you can write parallel code in a functional style. Intel TBB does not stop at multithreading; it also extends to heterogeneous computing through its Flow Graph feature (check out this webinar to learn more). There is also a Python* API for Intel® TBB, which lets you do efficient thread scheduling in Python and speeds up threaded code that uses NumPy, SciPy, etc. Intel TBB is used under the hood of the Intel® Math Kernel Library (Intel® MKL). Performance of Intel MKL can be improved by telling Intel TBB to ensure thread affinity to processor cores.

Looking at the Future

Parallelism in commercial, general-purpose computing has returned to keep Moore’s law essentially intact. With the move towards big data, machine learning, and IoT, parallel processing will become more mainstream and essential. Intel TBB, still in the adolescence of its life, is well poised to take on the challenges posed by these changes in the computing landscape. Persistent and consistent evolution forms the core of Intel TBB, which will continue to grow and meet the changing demands of its customers, providing the same benefits and value to developers as we move forward.

We’re looking forward to celebrating the next 10 years of Intel TBB with you.

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804