Intel® Threading Building Blocks Release Notes and New Features

By Jennifer Dimatteo

Published: 08/14/2017   Last Updated: 07/16/2020

This page provides the current Release Notes for Threading Building Blocks (TBB). The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Each version entry summarizes the new features and changes in that version since the previous release. The detailed release notes for each version include important information such as prerequisites, software compatibility, installation instructions, and known issues.

To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

2020

Update 3

Release Notes

What's New in this release:

Changes affecting backward compatibility:

  • Changed body type concept of the flow::input_node. Set TBB_DEPRECATED_INPUT_NODE_BODY to 1 to compile with the previous concept of the body type.

Bugs fixed:

Open-source contributions integrated:

  • Fixed an issue in TBBBuild.cmake that causes the build with no arguments to fail (https://github.com/oneapi-src/oneTBB/pull/233) by tttapa.
  • Added cmake/{TBBConfig,TBBConfigVersion}.cmake to Git ignore list (https://github.com/oneapi-src/oneTBB/pull/239) by Eisuke Kawashima.

Downloads:

  • TBB 2020 U3 is available as a part of Intel(R) Parallel Studio XE 2020 Update 2.
  • In addition, you can download the latest TBB open source version from https://github.com/oneapi-src/oneTBB.

Update 2

Release Notes

What’s New in this release:

  • Cross-allocator copying constructor and copy assignment operator for concurrent_vector are deprecated.
  • Added input_node to the flow graph API. It acts like a source_node except for being inactive by default; source_node is deprecated.
  • Allocator template parameter for flow graph nodes is deprecated. Set TBB_DEPRECATED_FLOW_NODE_ALLOCATOR to 1 to avoid compilation errors.
  • Flow graph preview hetero-features are deprecated.

Bugs fixed:

  • Fixed the task affinity mechanism to prevent unlimited memory consumption in case the number of threads is explicitly decreased.
  • Fixed a memory leak related to NUMA support functionality in task_arena.

Downloads:

  • TBB 2020 U2 is available as a part of Intel(R) Parallel Studio XE 2020 Update 1.
  • In addition, you can download the latest TBB open source version from https://github.com/oneapi-src/oneTBB.

Update 1

Release Notes

What’s New in this release:

Preview features:

  • The NUMA support library (tbbbind) no longer depends on the main TBB library.

Bugs fixed:

Downloads:

Initial Release

Release Notes

What’s New in this release:

Deprecation:

  • Multiple APIs are deprecated. For details, see the Deprecated Features appendix in the TBB reference manual.

New functionality:

  • Added warning notifications when the deprecated functionality is used.
  • Added C++17 deduction guides for flow graph nodes.

Preview Features:

  • The option to create a task_arena tied to a specific NUMA node simplifies development of scalable NUMA-aware applications.
  • Added the ability to suspend task execution at a specific point and resume it later, which reduces code complexity when integrating I/O threads into compute-intensive applications.
  • Extended the flow graph API to simplify connecting nodes.
  • Added isolated_task_group class that allows multiple threads to add and execute tasks sharing the same isolation.
  • Added erase() by heterogeneous keys for concurrent ordered containers.

Bugs fixed:

  • Fixed the emplace() method of concurrent unordered containers to destroy a temporary element that was not inserted.
  • Fixed a bug in the merge() method of concurrent unordered containers.
  • Fixed behavior of a continue_node that follows buffering nodes.
  • Fixed compilation error caused by missed stdlib.h when CMake integration is used (https://github.com/intel/tbb/issues/195). Inspired by Andrew Penkrat.

Open-source contributions integrated:

2019

Update 9

Release Notes

What’s New in this release:

  • Improved async_node to never block a thread that sends a message through its gateway.
  • Added support of Windows* to the CMake module TBBInstallConfig.

Preview Features:

  • Added ordered associative containers: concurrent_{map,multimap,set,multiset} (requires C++11).

Update 8

Release Notes

Bug fixed:

Downloads

Update 7

Release Notes

What’s New in this release:

  • Added TBBMALLOC_SET_HUGE_SIZE_THRESHOLD parameter to set the lower bound for allocations that are not released back to OS unless a cleanup is explicitly requested.
  • Added zip_iterator::base() method to get the tuple of underlying iterators.
  • Improved async_node to never block a thread that sends a message through its gateway.
  • Extended decrement port of the tbb::flow::limiter_node to accept messages of integral types.
  • Added support of Windows* to the CMake module TBBInstallConfig.
  • Added packaging of CMake configuration files to TBB packages built using the build/build.py script (https://github.com/intel/tbb/issues/141).

Changes affecting backward compatibility:

  • Removed the number_of_decrement_predecessors parameter from the constructor of flow::limiter_node. To allow its usage, set TBB_DEPRECATED_LIMITER_NODE_CONSTRUCTOR macro to 1.

Preview Features:

  • Added ordered associative containers: concurrent_{map,multimap,set,multiset} (requires C++11).

Open-source contributions integrated:

Downloads

Update 6

Release Notes

What’s New in this release:

  • Added support for Microsoft* Visual Studio* 2019.
  • Added support for enqueuing tbb::task into tbb::task_arena (https://github.com/01org/tbb/issues/116).
  • Improved support for allocator propagation on concurrent_hash_map assigning and swapping.
  • Improved scalable_allocation_command cleanup operations to release more memory buffered by the calling thread.
  • Separated allocation of small and large objects into distinct memory regions, which helps to reduce excessive memory caching inside the TBB allocator.

Preview Features:

  • Removed template class gfx_factory from the flow graph API.

Downloads

  • TBB 2019 U6 is available as a part of Intel(R) Parallel Studio XE 2019 Update 4.
  • In addition, you can download the latest TBB open source version from https://github.com/01org/tbb/releases.

Update 5

Release Notes

What's New in this release:

  • Associating a task_scheduler_observer with an implicit or explicit task arena is now a fully supported feature.
  • Added a CMake module TBBInstallConfig that allows generating and installing CMake configuration files for TBB packages. Inspired by Hans Johnson (https://github.com/01org/tbb/pull/119).
  • Added node handles, methods merge() and unsafe_extract() to concurrent unordered containers.
  • Added constructors with Compare argument to concurrent_priority_queue (https://github.com/01org/tbb/issues/109).
  • Controlling the stack size of worker threads is now supported for Universal Windows Platform.
  • Improved tbb::zip_iterator to work with algorithms that swap values via iterators.
  • Improved support for user-specified allocators in concurrent_hash_map, including construction of allocator-aware data types.
  • For ReaderWriterMutex types, upgrades and downgrades now succeed if the mutex is already in the requested state. Inspired by Niadb (https://github.com/01org/tbb/pull/122).

Preview Features:

  • The task_scheduler_observer::may_sleep() method has been removed.

Bugs fixed:

  • Fixed the issue with a pipeline parallel filter executing serially if it follows a thread-bound filter.
  • Fixed a performance regression observed when multiple parallel algorithms start simultaneously.

Downloads

Update 4

Release Notes

What's New in this release:

  • global_control class is now a fully supported feature.
  • Added deduction guides for tbb containers: concurrent_hash_map, concurrent_unordered_map, concurrent_unordered_set.
  • Added tbb::scalable_memory_resource function returning std::pmr::memory_resource interface to the TBB memory allocator.
  • Added tbb::cache_aligned_resource class that implements std::pmr::memory_resource with cache alignment and no false sharing.
  • Added rml::pool_msize function returning the usable size of a memory block allocated from a given memory pool.
  • Added default and copy constructors for tbb::counting_iterator and tbb::zip_iterator.
  • Added TBB_malloc_replacement_log function to obtain the status of dynamic memory allocation replacement (Windows* only).
  • CMake configuration file now supports release-only and debug-only configurations (https://github.com/01org/tbb/issues/113).
  • TBBBuild CMake module takes the C++ version from CMAKE_CXX_STANDARD.

Bugs fixed:

  • Fixed compilation for tbb::concurrent_vector when used with std::pmr::polymorphic_allocator.

Open-source contributions integrated:

Downloads

  • Intel TBB 2019 U4 is available as a part of Intel(R) Parallel Studio XE 2019 Update 3.
  • In addition, you can download the latest Intel TBB open source version from https://github.com/01org/tbb/releases.

Update 3

Release Notes

What's New in this release:

  • Added tbb::transform_iterator.
  • Added new Makefile target 'profile' to flow graph examples enabling additional support for Intel® Parallel Studio XE tools.
  • Added TBB_MALLOC_DISABLE_REPLACEMENT environment variable to switch off dynamic memory allocation replacement on Windows*. Inspired by a contribution from Edward Lam.

Preview Features:

  • Extended flow graph API to support relative priorities for functional nodes, specified as an optional parameter to the node constructors.

Open-source contributions integrated:

Downloads

Update 2

Release Notes

What’s New in this release:

  • Threading Building Blocks 2019 Update 2 includes functional and security updates. Users should update to the latest version.
  • Added constructors with HashCompare argument to concurrent_hash_map (https://github.com/01org/tbb/pull/63).
  • Added overloads for parallel_reduce with default partitioner and user-supplied context.
  • Added deduction guides for tbb containers: concurrent_vector, concurrent_queue, concurrent_bounded_queue, concurrent_priority_queue.
  • Reallocation of memory objects >1MB now copies and frees memory if the size is decreased by a factor of two or more, trading performance off for reduced memory usage.
  • After a period of sleep, TBB worker threads now prefer returning to their last used task arena.

Bugs fixed:

Update 1

Release Notes

What's New in this release:

  • Doxygen documentation can now be built with the 'make doxygen' command.

Changes affecting backward compatibility:

  • Enforced 8-byte alignment for tbb::atomic<long long> and tbb::atomic<double>. On IA-32 architecture it may cause layout changes in structures that use these types.

Bugs fixed:

  • Fixed an issue with dynamic memory allocation replacement on Windows* that occurred for some versions of ucrtbase.dll.
  • Fixed possible deadlock in tbbmalloc cleanup procedure during process shutdown.
  • Fixed usage of std::uncaught_exception() deprecated in C++17 (https://github.com/01org/tbb/issues/67).
  • Fixed a crash when a local observer is activated after an arena observer.
  • Fixed compilation of task_group.h by Visual C++* 15.7 with /permissive- option (https://github.com/01org/tbb/issues/53).
  • Fixed tbb4py to avoid dependency on Intel(R) C++ Compiler shared libraries.
  • Fixed compilation for Anaconda environment with GCC 7.3 and higher.

Downloads

Initial Release

Release Notes

One of the best-known C++ threading libraries, Threading Building Blocks (TBB), was recently updated to a new release, 2019. The updated version contains several key new features compared to the previous 2018 Update 5 release.

What's New in this release:

  • Lightweight policy for functional nodes in the flow graph is now a fully supported feature.
  • Reservation support in flow::write_once_node and flow::overwrite_node is now a fully supported feature.
  • Support for Flow Graph Analyzer and improvements for Intel(R) VTune(TM) Amplifier are now a regular feature enabled by the TBB_USE_THREADING_TOOLS macro.
  • Added support for std::new_handler in the replacement functions for global operator new.
  • Added C++14 constructors to concurrent unordered containers.
  • Added tbb::counting_iterator and tbb::zip_iterator.
  • Fixed multiple -Wextra warnings in TBB source files.

Preview Features:

  • Extracting nodes from a flow graph is deprecated and disabled by default. To enable, use TBB_DEPRECATED_FLOW_NODE_EXTRACTION macro.

Changes affecting backward compatibility:

  • Due to internal changes in the flow graph classes, recompilation is recommended for all binaries that use the flow graph.

Open-source contributions integrated:

  • Added support for OpenBSD by Anthony J. Bentley.

2018

Update 6

Release Notes

What’s New in this release:

Bugs fixed:

  • Fixed an issue with dynamic memory allocation replacement on Windows* that occurred for some versions of ucrtbase.dll.

Update 5

Release Notes

Changes (w.r.t. Intel TBB 2018 Update 4):

Preview Features:

  • Added user event tracing API for Intel(R) VTune(TM) Amplifier and Flow Graph Analyzer.

Bugs fixed:

Open-source contributions integrated:

Downloads

Update 4

Release Notes

Changes (w.r.t. Intel TBB 2018 Update 3):

Preview Features:

  • Improved support for Flow Graph Analyzer and Intel(R) VTune(TM) Amplifier in the task scheduler and generic parallel algorithms.
  • Default device set for opencl_node now includes all the devices from the first available OpenCL* platform.
  • Added lightweight policy for functional nodes in the flow graph. It indicates that the node body has little work and should, if possible, be executed immediately upon receiving a message, avoiding task scheduling overhead.

Update 3

Release Notes

Changes (w.r.t. Intel TBB 2018 Update 2):

Preview Features:

  • Added template class blocked_rangeNd for a generic multi-dimensional range (requires C++11). Inspired by a contribution from Jeff Hammond.

Bugs fixed:

  • Fixed a crash with dynamic memory allocation replacement on Windows* for applications using system() function.
  • Fixed parallel_deterministic_reduce to split range correctly when used with static_partitioner.
  • Fixed a synchronization issue in task_group::run_and_wait() which caused a simultaneous call to task_group::wait() to return prematurely.

Downloads

Update 2

Release Notes

Changes (w.r.t. Intel TBB 2018 Update 1):

  • Added support for Android* NDK r16, macOS* 10.13, Fedora* 26.
  • Binaries for Universal Windows Driver (vc14_uwd) now link with static Microsoft* runtime libraries, and are only available in commercial releases.
  • Extended flow graph documentation with more code samples.

Preview Features:

  • Added a Python* module for multi-processing computations in numeric Python* libraries.

Bugs fixed:

  • Fixed constructors of concurrent_hash_map to be exception-safe.
  • Fixed auto-initialization in the main thread to be cleaned up at shutdown.
  • Fixed a crash when tbbmalloc_proxy is used together with dbghelp.
  • Fixed static_partitioner to assign tasks properly in case of nested parallelism.

Update 1

Release Notes

The updated version (Open Source release only) contains these additions:

  • lambda-friendly overloads for parallel_scan.
  • support of static and simple partitioners in parallel_deterministic_reduce.

We also introduced a few preview features:

  • initial support for Flow Graph Analyzer to do parallel_for.
  • reservation support in overwrite_node and write_once_node.

Bugs fixed

  • Fixed a potential deadlock scenario in the flow graph that affected Intel® TBB 2018 Initial Release.

Initial Release

Release Notes

One of the best-known C++ threading libraries, Intel® Threading Building Blocks (Intel® TBB), was recently updated to a new release, 2018. The updated version contains several key new features compared to the previous 2017 Update 7 release (https://software.intel.com/en-us/articles/whats-new-intel-threading-building-blocks-2017-update-7).

Licensing

The Intel® TBB outbound license for commercial support is the Intel Simplified Software License: https://software.intel.com/en-us/license/intel-simplified-software-license. The license for open source distribution has not changed.

Tasks

Intel® TBB now fully supports the this_task_arena::isolate() function. Also, the this_task_arena::isolate() function and the task_arena::execute() method were extended to pass on the value returned by the executed functor (this feature requires C++11). The task_arena::enqueue() and task_group::run() methods were extended to accept move-only functors.

Flow Graph

A flow graph now spawns all tasks into the same task arena and waiting for graph completion also happens in that arena.

There are some changes affecting backward compatibility:

  • Internal layout changes in some flow graph classes
  • Several undocumented methods are removed from class graph, including set_active() and is_active().
  • Due to incompatible changes, the namespace version is updated for the flow graph; recompilation is recommended for all binaries that use the flow graph classes.

We also introduced a few preview features:

  • opencl_node can be used with any graph object; class opencl_graph is removed.
  • graph::wait_for_all() now automatically waits for all not yet consumed async_msg objects.

Flow Graph Analyzer (FGA) is available as a technology preview in Intel® Parallel Studio XE 2018 and as a feature of Intel® Advisor (/content/www/us/en/develop/articles/getting-started-with-flow-graph-analyzer.html). Support for the FGA tool in async_node, opencl_node and composite_node has been improved.

Introduction of Parallel STL

Parallel STL, an implementation of the C++ standard library algorithms with support for execution policies, has been introduced. Parallel STL relies on Intel® TBB underneath. For more information, see Getting Started with Parallel STL (https://software.intel.com/en-us/get-started-with-pstl).

Additional support for Android*, UWP, macOS

  • Added support for Android* NDK r15, r15b.
  • Added support for Universal Windows Platform.
  • Increased minimally supported version of macOS* (MACOSX_DEPLOYMENT_TARGET) to 10.11.

Bugs fixed

  • Fixed a bug preventing use of streaming_node and opencl_node with Clang; inspired by a contribution from Francisco Facioni.
  • Fixed this_task_arena::isolate() function to work correctly with parallel_invoke and parallel_do algorithms.
  • Fixed a memory leak in composite_node.
  • Fixed an assertion failure in debug tbbmalloc binaries when TBBMALLOC_CLEAN_ALL_BUFFERS is used.

Downloads

You can download the latest Intel® TBB version from http://threadingbuildingblocks.org and https://software.intel.com/en-us/intel-tbb.

In addition, Intel® TBB can be installed using:

Improved insights in Intel® VTune™ Amplifier 2018

Intel® VTune™ Amplifier 2018 (https://software.intel.com/en-us/vtune-amplifier-help) provides improved insight into parallelism inefficiencies in applications that use Intel® Threading Building Blocks (Intel® TBB), with extended classification of high Overhead and Spin time: /content/www/us/en/develop/articles/overhead-and-spin-time-issue-in-intel-threading-building-blocks-applications-due-to.html

CMake support

CMake support in Intel® TBB (https://github.com/01org/tbb/tree/tbb_2018/cmake) has been introduced as well.

Samples

All examples for the commercial version of the library were moved online: https://software.intel.com/en-us/product-code-samples. Examples are available as a standalone package or as a part of Intel® Parallel Studio XE or Intel® System Studio Online Samples packages.

Documentation

The following documentation for Intel® TBB is available:

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804