Intel® Threading Building Blocks (Intel® TBB)

 Widely used C++ template library for task parallelism

  • Rich set of components to efficiently implement higher-level, task-based parallelism
  • Future-proof applications to tap multicore and many-core power
  • Compatible with multiple compilers and portable to various operating systems

From $499
Or Download a Free 30-Day Evaluation Version

Simplify Parallelism with a Scalable Parallel Model

Intel® Threading Building Blocks (Intel® TBB) 4.2 is a widely used, award-winning C and C++ library for creating high-performance, scalable parallel applications.

  • Enhance Productivity and Reliability - Rich set of components to efficiently implement higher-level, task-based parallelism
  • Gain Performance Advantage Today and Tomorrow - Future-proof applications to tap multicore and many-core power
  • Fits Within Your Environment - Advanced threading library, compatible with multiple compilers and portable to various operating systems

“Intel® TBB provided us with optimized code that we did not have to develop or maintain for critical system services. I could assign my developers to code what we bring to the software table: crowd simulation software.”
Michaël Rouillé, CTO, Golaem


Also available as open source

Flow Graph

The flow graph feature provides a flexible and convenient API for expressing static and dynamic dependencies between computations. It is customizable for a wide variety of problems. It also extends the applicability of Intel® TBB to event-driven/reactive programming models.

Intel® TBB delivers high performing and reliable code with less effort than hand-made threading. Pre-tested algorithms, concurrent containers, synchronization primitives, and a scalable memory allocator simplify parallel application development.


Dynamic Task Scheduler

When work is expressed as abstract tasks rather than threads, application performance can improve automatically as processor core counts increase. The Intel® TBB task scheduler dynamically maps tasks to threads to balance the load among available cores, preserve cache locality, and maximize parallel performance. The implementation supports C++ exceptions, task and task group priorities, and cancellation, which are essential for large and interactive parallel C++ applications.

The dynamic task scheduler and the parallel algorithms support nested and recursive parallelism as well as running parallel constructs side by side. This is useful for introducing parallelism gradually, and it lets different components of an application implement parallelism independently.


Interoperable
#pragma simd and Intel® TBB can be used together

Cross Platform Support and Composability

Organizations that require cross-platform support today or anticipate needing it in the future should consider Intel® TBB. It is validated and commercially supported on Windows*, Linux*, and OS X* platforms, using multiple compilers. It is also available on FreeBSD*, IA-based Solaris*, and PowerPC*-based systems via the open source community. Intel® TBB is optimized for multicore architectures and the Intel® Xeon Phi™ coprocessor.

Intel® TBB is designed to co-exist with other threading packages and technologies. Different components of Intel® TBB can be used independently and mixed with other threading technologies.


Portability

Organizations can expand their customer base by using a production-ready, open solution for parallelism that is available on a broad range of platforms.


Top Community Support

Broad support from an involved community gives developers access to additional platforms and operating systems. Intel® Premier Support services and Intel® Support Forums provide confidential support, technical notes, application notes, and the latest documentation.

A complete documentation package and code samples are readily available both as part of the Intel® TBB installation and online at http://threadingbuildingblocks.org. The User Guide provides an introduction to Intel® TBB. The Design Patterns chapter in the User Guide covers common parallel programming patterns and how to implement them using Intel® TBB. The Reference Manual contains formal descriptions of all classes and functions implemented in Intel® TBB.

What’s New in Intel® TBB 4.2

Feature / Benefit
Support for Latest Intel Architectures

Take advantage of the newest features in Intel’s latest processors including Transactional Synchronization Extensions (TSX). Adds support for Intel® Xeon Phi™ coprocessor for Windows and Intel® Xeon™ Processor (Ivy Bridge-EP).

Selecting the best models for your application today will set a path for you to take full advantage of multicore and many-core performance without re-writing your code. Start today by implementing parallelism for today’s architecture and be ready for future architectures.

Lower memory overhead

Improved heuristics in the memory allocator reduce memory overhead by intelligently releasing unused or stale memory.

Improved handling of large memory requests

Improved handling of large memory requests (from more than 8 KB up to 128 MB) gives better performance for applications that make frequent large allocations. Use of big memory pages can now be explicitly enabled via a function call or environment variable.

Better Fork Support

Fork safety through a user-enabled API that ensures Intel® TBB worker threads have completed before a fork is executed.

PPL* Compatibility

Improved compatibility with the Parallel Patterns Library (PPL) through the addition of the concurrent_unordered_multimap and concurrent_unordered_multiset APIs.

Windows* Store

Customers who use Intel® TBB in their applications can now submit and sell their apps through the Windows Store.

Android* OS support

The Android OS is now supported as a target operating system for improved application performance and power efficiency. See Beacon Mountain for more Android developer tool details.

Rich set of components for Performance and Productivity
Intel® TBB 4.2 Pre-Tested Capabilities

Parallel Algorithms
Generic implementation of common parallel performance patterns

Generic implementations of parallel patterns such as parallel loops, flow graphs, and pipelines can be an easy way to achieve a scalable parallel implementation without developing a custom solution from scratch.

Concurrent Containers
Generic implementation of common idioms for concurrent access

Intel® TBB 4.2 concurrent containers are a concurrency-friendly alternative to serial data containers. Serial data structures (such as C++ STL containers) often require a global lock to protect them from concurrent access and modification. Intel® TBB concurrent containers allow multiple threads to access and update items in the container concurrently, which increases the available concurrency and improves an application’s scalability.
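To make the contrast concrete, here is a minimal sketch of the global-lock baseline described above, written in standard C++ only. It does not use Intel® TBB, and the helper name `fill_with_global_lock` is hypothetical. Every insertion takes the same mutex, so the appends are serialized regardless of how many cores are available.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Intel TBB): `num_threads` threads each
// append `per_thread` items to one std::vector. Every append takes the same
// global mutex, so insertions run one at a time.
std::vector<int> fill_with_global_lock(std::size_t num_threads,
                                       std::size_t per_thread) {
    std::vector<int> items;
    std::mutex global_lock;  // protects `items`
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&items, &global_lock, t, per_thread] {
            for (std::size_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> guard(global_lock);
                items.push_back(static_cast<int>(t * per_thread + i));
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return items;  // element order depends on how the threads interleaved
}
```

With a concurrent container such as tbb::concurrent_vector, push_back can be called from several threads without the external mutex, which is the scalability benefit the paragraph describes.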

Synchronization Primitives
Exception-safe locks, condition variables, and atomic operations

Intel® TBB 4.2 provides a comprehensive set of synchronization primitives with different qualities that are applicable to common synchronization strategies. The exception-safe implementation of locks helps to avoid deadlock in programs that use C++ exceptions. Using Intel® TBB atomic variables instead of the C-style atomic API minimizes potential data races.

Scalable Memory Allocators
Scalable memory manager and false-sharing free memory allocator

The scalable memory allocator avoids scalability bottlenecks by minimizing access to a shared memory heap via per-thread memory pool management. Special management of large (>=8 KB) blocks allows more efficient resource usage, while still offering scalability and competitive performance. The cache-aligned memory allocator avoids false sharing by ensuring that separately allocated memory blocks never share a cache line.
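The false-sharing point can be illustrated with a small sketch in standard C++. This is not the Intel® TBB allocator. It only shows the underlying idea of giving each independently updated object its own cache line. The 64-byte line size and the names `kCacheLine`, `PaddedCounter`, and `is_line_aligned` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes; 64 is common on x86 but not universal.
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned and padded to occupy a full cache line, so two
// threads updating neighbouring counters never write to the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Returns true when `p` starts on a cache-line boundary.
inline bool is_line_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

An array of `PaddedCounter` places consecutive elements exactly one cache line apart. Intel® TBB offers the same guarantee for dynamically allocated memory through tbb::cache_aligned_allocator, at the cost of some extra memory per allocation.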

Create arbitrary task trees

When an algorithm cannot be expressed with high-level Intel® TBB 4.2 constructs, the user can choose to create arbitrary task trees. Tasks can be spawned for better locality and performance, or enqueued to maintain FIFO-like order and ensure starvation-resistant execution.

Conditional Numerical Reproducibility

Ensure deterministic associativity for floating-point arithmetic results with the new Intel® TBB template function ‘parallel_deterministic_reduce’.
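Why associativity matters: floating-point addition is not associative, so a reduction whose grouping changes from run to run can return different totals for the same input. The sketch below is standard C++ and serial, and it is not the Intel® TBB implementation. The hypothetical helper `chunked_sum` only shows that a result is reproducible once the grouping is fixed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Intel TBB): sums `data` in fixed-size
// chunks, then combines the partial sums left to right. The grouping depends
// only on `chunk` (which must be > 0), never on thread count or timing, so
// the result is identical on every run. The chunks are summed serially here;
// each one could be computed by a separate task without changing the result.
float chunked_sum(const std::vector<float>& data, std::size_t chunk) {
    std::vector<float> partial;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        float s = 0.0f;
        for (std::size_t i = begin; i < end; ++i) s += data[i];
        partial.push_back(s);
    }
    float total = 0.0f;
    for (float s : partial) total += s;
    return total;
}
```

Calling `chunked_sum` twice with the same chunk size always gives the same value, while a different chunk size may give a different total for inputs such as {1e8f, 1.0f, -1e8f, 1.0f}. parallel_deterministic_reduce applies the same principle to parallel execution by keeping the split points and the join order fixed.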

C++11 Support

Intel® TBB can be used with C++11 compilers and supports lambda expressions. For developers using parallel algorithms, lambda expressions reduce the time and code needed by removing the requirement for separate objects or classes.
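The saving from lambda expressions can be seen in a small standard C++ sketch. The template `for_each_index` is a hypothetical serial stand-in for a parallel loop template, and it is not an Intel® TBB function. The other names are also made up for illustration. The same loop body is written once as a separate function-object class and once as a lambda.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for a parallel loop template: calls body(i)
// for every index in [first, last).
template <typename Body>
void for_each_index(std::size_t first, std::size_t last, const Body& body) {
    for (std::size_t i = first; i < last; ++i) body(i);
}

// Pre-C++11 style: the loop body lives in a separate function-object class.
class DoubleElements {
public:
    explicit DoubleElements(std::vector<int>& v) : v_(v) {}
    void operator()(std::size_t i) const { v_[i] *= 2; }
private:
    std::vector<int>& v_;
};

void double_with_functor(std::vector<int>& v) {
    for_each_index(0, v.size(), DoubleElements(v));
}

// C++11 style: the same body written inline as a lambda expression.
void double_with_lambda(std::vector<int>& v) {
    for_each_index(0, v.size(), [&v](std::size_t i) { v[i] *= 2; });
}
```

Both functions produce the same result. The lambda version keeps the loop body next to the call and removes the class definition, constructor, and member variable.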

Scalability with Future-proofing

  • Intel® TBB provides a simple and rapid way of developing robust parallel applications that abstracts platform details and threading mechanisms for performance that scales with increasing core counts
  • Intel® Threading Building Blocks yields linear scaling in these example applications

Select the right Intel® TBB license

  • Commercial Binary Distribution for customers who may require commercial support services. Attractive pricing available for academic, student and classroom usage.
  • Open Source Distribution can be used under GPLv2 with the runtime exception allowing usage in proprietary applications. Allows support for additional OSs and hardware platforms. Both source and binary forms are available for download from http://threadingbuildingblocks.org.
  • Custom license available if you require the ability to modify or distribute the commercial source code of Intel® TBB. Contact your Intel representative for more information.

Videos to get you started:

  • Introduction to Intel® Threading Building Blocks

Register for future Webinars


Previously recorded Webinars:

  • Creating parallel reactive and streaming applications with the Intel® Threading Building Blocks (Intel® TBB) flow graph

More Tech Articles

Fluid Simulation for Video Games (Part 8)
By Dr. Michael J. Gourlay, posted 05/30/2012
This is a series on fluid simulation for video games. This article explains how a vortex-based fluid simulation handles variable density in a fluid. The fluid flow includes motion caused by buoyancy: heavier fluid sinks, and lighter fluid rises.
Case Study: Parallelizing a Recursive Problem with Intel® Threading Building Blocks
By louis-feng (Intel), posted 04/25/2012
Intel worked closely with DreamWorks Animation engineers to improve the performance of a key rendering system library by up to 35X in some cases.
Optimizing Without Breaking a Sweat
By John O (Intel), posted 03/13/2012
This article describes novel techniques developed to optimize DreamWorks Animation's rendering, animation, and special effects applications without recompiling or relinking by preloading highly optimized libraries at run-time.
Optimizations for MSC.Software SimXpert* using Intel® Threading Building Blocks (Intel® TBB)
By Bonnie Aona (Intel), posted 02/13/2012
MSC.Software SimXpert* is a fully integrated simulation environment for performing multidiscipline-based analysis, with a graphical interface designed to facilitate end-to-end simulations. This article describes the threading of SimXpert.

Documentation:

User Guide and Reference Manual

You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.

OS X library install_name, current_version and compatibility_version
By Ryan S.
The dynamic libraries that tbb builds on OS X are missing the install_name, current_version and compatibility_version. These should be specified at build time. (Where you already use the -dynamiclib flag, add the -install_name, -current_version and -compatibility_version flags with the appropriate values.) The install_name should be the absolute path where the library will be found after installation. For example, if libtbb.dylib will ultimately be installed at /usr/local/lib/libtbb.dylib, then at build time its install_name should be set to /usr/local/lib/libtbb.dylib. This means there will need to be a way (variable?) for the user invoking the build system to inform it what the final install prefix will be.
Intel(R) TBB 4.2 update 3 is released and available for download
By Vladimir Polin (Intel)
Changes (w.r.t. Intel TBB 4.2 Update 2):
  • Added support for Microsoft* Visual Studio* 2013.
  • Improved Microsoft* PPL-compatible form of parallel_for for better support of auto-vectorization.
  • Added a new example for cancellation and reset in the flow graph: Kohonen self-organizing map (examples/graph/som).
  • Various improvements in source code, tests, and makefiles.
Bugs fixed:
  • Added dynamic replacement of _aligned_msize() previously missed.
  • Fixed task_group::run_and_wait() to throw invalid_multiple_scheduling exception if the specified task handle is already scheduled.
Open-source contributions integrated:
  • A fix for ARM* processors by Steve Capper.
  • Improvements in std::swap calls by Robert Maynard.
You can download Intel TBB 4.2 update 3 from commercial and open source sites.
How to use tbb42_20131118oss with Visual Studio 2013
By bluequartz1
I have downloaded the package tbb42_20131118oss for windows but I do not see inside any folder marked VS2013? How do we use TBB with VS2013? I took a look in the tbbvars.bat and vs2013 is listed as a possible argument but when following the logic in the script none of the folders would be found? Am I missing something? Is TBB built into VS2013 and thus no need for it? How do I keep code compiling under several different versions of VS then?    Thanks for any help Mike Jackson  
TBB QA fails on Red Hat Enterprise Linux 6.3 and SusE 11
By dulantha_f17
I built the latest (4.2 update 2) source code on RHE 6.3 and SusE 11 64bit. On both machines the code builds fine but the QA fails with the following errors and returns to the command line.
    src/test/test_eh_tasks.cpp:222, assertion g_ExceptionCaught: no exception occured
    make[1]: *** {test_tbb_plain] Aborted
    rm test_assembly_compiler_builtins.o test_atomic_compiler_buildints.o
    make[1]: Leaving directory [...]/linux_intel64_gcc_cc4.3_libc2.11.3_kernel3.0.13_release
    make: [test] Error 2 (ignored)
Has anyone else ran in to this problem? To build and run, I run 'make' and then 'make test'
Push back thread safety
By Sensei S.
Dear all, I need to update concurrently a concurrent vector, and right now I'm just pushing back items in different threads. Now, since operator[] isn't thread-safe in updates, I'm having doubts: Is push_back thread-safe when inserting an element? My requirements do not need any particular ordering, so I'm just concerned about the elements not their order. Thanks!
errors in using tbb.dll to make standalone program with matlab 2013b
By Yubo T.
Hi, I was using matlab 2013b and Matlab Compiler Runtime 2013b to make a standalone program. Matlab can successfully generate an executable but when I run it I got "The procedure entry point ?deallocate_via_handler_v3@internal@tbb@@YAXPAX@Z could not be located in the dynamic link library tbb.dll." I tried different codes and got the same error. My operating system is Windows 7 professional. I tried both 32 bit and 64 bit matlab and MCR but none of them worked out. I contacted Matlab technicians and was told that "There are known problems/errors/bugs/issues with tbb.DLL and the MATLAB R2013b compiler, at least for the Image Acquisition Toolbox". What can I do to fix this problem? Thanks in advance.
timed push/pop on concurrent_bounded_queue
By Nicolas M.
Hi, I see that none of the blocking operations on the concurrent queue have timeouts. What would be the best way to implement them ? Or at least circumvent the lack of ? Thanks
Performance issues with tbbmalloc in case of large memory allocation
By Vladimir S.
Hello, We're using tbb42_20131118oss version for Linux 64bit ( CentOS 5 ) in our product. Recently, following issue was discovered - for cases with large memory allocation we've noticed significant performance degradation after reaching some "critical point" in terms of memory. Specific example is: 1. Machine has about 1 terabyte of memory, it is mostly free, only our application was running. 2. Our application runs some algorithm that builds some data structure of the large size. And, ideally, we're expecting to see approximately constant memory increase over time till the end. 3. But, until physical memory is not reaching around 250-300 Gb, algorithm works relatively fast. After reaching this "critical point", it slows down dramatically ( i.e., the same portion of job which was completed in 5 minutes before reaching critical point, after critical point was completed in about 5 hours ). Finally, it finishes with physical memory about 465 Gb. And runtime profiler shows definite bott...

  • What is parallelism?
  • By way of analogy, if you’re trying to cook a multi-course meal and all you have is a single-burner stove, you can only cook one part of the meal at a time. If you have a stove with four burners, you can cook four things at once, bringing them to the table all at the same time. Software parallelism is similar. There are different techniques used to achieve parallelism—threading is one of them. The idea is to take an application and, where it is possible, split up the program so different parts can run simultaneously on different processors in a multicore configuration. Then, part of the application brings all the parts together to present the application results.

    Task-based parallelism is a mechanism to execute several work items (tasks) in parallel.

  • What is threading?
  • Threading is a kind of parallelism. It’s a technique used by software developers to decompose applications into parts that can be run simultaneously on a computer with multiple processors or multiple cores. Threaded applications run on a single computer, again with multiple cores, under the management of a single operating system.

  • Why does C++ need Intel® TBB?
  • C++, like other popular languages, was not designed to express parallelism. Fortunately, C++ is extensible using templates. Developers liked the OpenMP concept, whereby they could get scalable performance without adding much new code, yet needed something more conducive to the object-oriented, template-based programming style of C++. Developers wanted us to do something about parallel containers and algorithms, so templates were a perfect fit. The ‘generic programming’ style that STL uses, which allows components to be easily composed without giving up performance, appealed a great deal to us. We settled on extending C++ in a fashion similar to how STL extended C++.

    Abstraction is important to developers. Using native threads, doing your own explicit thread management, is like assembly language for parallelism. TBB is the abstraction we need for many reasons. Programming for parallelism using native threads is tedious, error-prone, and not portable. It is also seldom as scalable as it could be, because high levels of scalability are more difficult to program.

  • Does Intel charge run-time fees or royalties for its libraries?
  • No.

  • Are there analysis tools that understand the semantics of Intel® TBB?
  • Yes. Applications threaded with Threading Building Blocks can be analyzed with Intel® VTune™ Amplifier XE 2013 and Intel® Inspector XE 2013. Intel® Advisor XE 2013, available in any Intel® Studio XE product, can help find regions with the greatest performance potential from parallelism.

  • Where can I get an evaluation copy of Intel® TBB?
  • 30-day evaluation versions of Intel® Software Development Products are available for free download. You can get free support during the evaluation period by creating an Intel® Premier Support account after requesting the evaluation license. Evaluation versions are available for Windows, Linux, and OS X*.

  • Are there any books to help developers better understand how to use Intel® TBB?
  • A book on Intel Threading Building Blocks by James Reinders has been published by O’Reilly Media. The tutorial, examples, and other documentation that come with the download are excellent resources.

  • How do the commercial and open source versions differ?
  • Currently there are no product differences between the commercial and open source versions. We maintain one source base and do builds for both from the same source base. The versions differ in the level of support.

    Intel® TBB is offered commercially for Windows, Linux, or OS X* customers who want additional support or cannot comply with the open source license (GPLv2 with the runtime exception). With a purchase, customers receive one year of product updates available at the Intel® Registration Center and one year of technical support from Intel® Premier Support, our interactive issue management and communication website. This premier support service allows you to submit questions to Intel engineers. In addition to Intel® Premier Support, we have user forums, technical notes, application notes, and documentation available on our website.

    The open source support includes user forums, documentation and additional resources available on our website, threadingbuildingblocks.org.

  • Where can I get more information on the TBB open source offering?
  • For more information, visit threadingbuildingblocks.org. The site includes TBB source code, documentation, user forums, blogs, podcasts, articles, white papers and support areas.

Intel® Threading Building Blocks 4.1

Getting Started?

Click the Learn tab for guides and links that will quickly get you started.

Get Help or Advice

Search Support Articles
Forums - The best place for timely answers from our technical experts and your peers. Use it even for bug reports.
Support - For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Intel Premier Support registration is required.
Download, Registration and Licensing Help - Specific help for download, registration, and licensing questions.

Resources

Release Notes - View Release Notes online!
Product Documentation - View documentation online!
