What's New? Intel® Threading Building Blocks 4.3

Intel® Threading Building Blocks (Intel® TBB), one of the best-known C++ threading libraries, was recently updated to release 4.3. The new version adds several key features compared to the previous release, 4.2. Some of them already shipped in Intel TBB 4.2 updates.

The following features are now fully supported: flow::indexer_node, task_arena, speculative_spin_rw_mutex.

Task arenas

Class tbb::task_arena is now fully supported. It provides improved control over workload isolation and the degree of concurrency. You can limit the concurrency level of each arena (and of the algorithms running inside it). Tasks are not shared between arenas, which allows workload and resource isolation.

The following example runs two parallel_for loops concurrently; one that is scalable and one that is not. The non-scalable loop is limited to at most 2 threads so that the majority of the threads can be saved for the more scalable loop. It uses task_group to wait for a specific subset of tasks.

#include "tbb/task_scheduler_init.h"
#include "tbb/task_arena.h"
#include "tbb/task_group.h"
#include "tbb/parallel_for.h"

tbb::task_scheduler_init def_init; // Use the default number of threads
tbb::task_arena limited(2);        // No more than 2 threads in this arena
tbb::task_group tg;

// execute() returns only after tg.run() has registered the task,
// so the tg.wait() below cannot run before the task exists.
limited.execute([&]{ // use at most 2 threads
    tg.run([]{ // run in task group
        tbb::parallel_for(1, N, unscalable_work());
    });
});

// Run another job concurrently with the loop above
// It can use the default number of threads:
tbb::parallel_for(1, M, scalable_work());

// Put the wait for the task group inside execute()
// This will wait only for the tasks that are in
// this task group.
limited.execute([&]{ tg.wait(); });

Improved C++11 support

Compatibility with C++11 standard interfaces and semantics has been improved for tbb/compat/thread and tbb::mutex. You can still build them in C++03 compatibility mode.

NOTE! For compatibility with the C++11 standard, copy and move constructors and assignment operators are disabled for all mutex classes. To restore the old behavior, define the TBB_DEPRECATED_MUTEX_COPYING macro.

You can now avoid unnecessary object copying with C++11 move constructors, emplace() methods, and rvalue references:

  • C++11 move constructors and move assignment operators have been added to concurrent_vector, concurrent_hash_map, concurrent_priority_queue, and concurrent_unordered_{set,multiset,map,multimap}. concurrent_queue and concurrent_bounded_queue have gained move constructors only.
  • C++11 move-aware emplace/push/pop methods have been added to concurrent_vector, concurrent_queue, concurrent_bounded_queue, and concurrent_priority_queue.

Overloads that accept a C++11 initializer list have been added to concurrent_vector::grow_by(), concurrent_hash_map::insert(), and concurrent_unordered_{set,multiset,map,multimap}::insert().

Memory allocator

  • Improved tbbmalloc increases performance and scalability for threaded applications.
  • Dynamic replacement of standard memory allocation routines has been added for OS X*.

Build and Debug

  • Microsoft* Visual Studio* projects for Intel TBB examples updated to VS 2010.
  • For open-source packages, debugging information (line numbers) in precompiled binaries now matches the source code.
  • Debug information was added to release builds for OS X*, Solaris*, FreeBSD* operating systems and MinGW*.
  • Various improvements in documentation, debug diagnostics and examples.

Preview Features:

  • Additional actions on reset of graphs, and extraction of individual nodes from a graph (TBB_PREVIEW_FLOW_GRAPH_FEATURES).
  • Support for an arbitrary number of arguments in parallel_invoke (TBB_PREVIEW_VARIADIC_PARALLEL_INVOKE).

Changes affecting backward compatibility:

  • For compatibility with the C++11 standard, copy and move constructors and assignment operators are disabled for all mutex classes. To restore the old behavior, define the TBB_DEPRECATED_MUTEX_COPYING macro.
  • flow::sequencer_node rejects messages with repeating sequence numbers.
  • Changed internal interface between tbbmalloc and tbbmalloc_proxy.
  • The following deprecated functionality has been removed:
    • old debugging macros TBB_DO_ASSERT & TBB_DO_THREADING_TOOLS;
    • no-op depth-related methods in class task;
    • tbb::deprecated::concurrent_queue;
    • deprecated variants of concurrent_vector methods.
  • register_successor() and remove_successor() are deprecated as methods to add and remove edges in flow::graph; use make_edge() and remove_edge() instead.

Bugs fixed:

  • Fixed incorrect scalable_msize() implementation for aligned objects.
  • Flow graph buffering nodes now destroy their copy of forwarded items.
  • Multiple fixes in task_arena implementation, including for:
    • inconsistent task scheduler state inside executed functions;
    • incorrect floating-point settings and exception propagation;
    • possible stalls in concurrent invocations of execute().
  • Fixed floating-point settings propagation when the same instance of task_group_context is used in different arenas.
  • Fixed compilation error in pipeline.h with Intel Compiler on OS X*.
  • Added missing headers for individual components to tbb.h.

Open-source contributions integrated:

  • Range interface addition to parallel_do, parallel_for_each and parallel_sort by Stephan Dollberg.
  • Variadic template implementation of parallel_invoke by Kizza George Mbidde (see Preview Features).
  • Improvement in Seismic example for MacBook Pro* with Retina* display by Raf Schietekat.

Find the new Intel TBB 4.3 at commercial and open source sites. Download and enjoy the new functionality!
