Intel® Threading Building Blocks, OpenMP, or native threads?

Which API do you choose to introduce threading to your software application, if you have a choice? Is there one answer that always works? In this paper, we review the considerations a developer should weigh when it is time to decide. The key areas to focus on are your development environment and the complexity of your parallel model. Let us compare capabilities and address considerations around the coexistence of these APIs in your software.

The Development Environment

Simplicity/Complexity Considerations

The native threads programming model introduces much more complexity within the code than OpenMP or Intel® Threading Building Blocks (TBB), making it more challenging to maintain. One of the benefits of using Intel® TBB or OpenMP when appropriate is that these APIs create and manage the thread pool for you: thread synchronization and scheduling are handled automatically.

Programming Languages, Compiler Support and Portability Considerations


If the code is written in C++, Intel® TBB is likely the best fit. Intel® TBB matches especially well with code that is highly object-oriented and makes heavy use of C++ templates and user-defined types. If the code is written in C or Fortran, OpenMP may be the better solution because it fits a structured coding style better than Intel® TBB and, for simple cases, introduces less coding overhead. But even with C++ code, if the algorithms are dominated by array processing, OpenMP may be a better choice than TBB in terms of coding complexity. The complexity of the native threads programming model is comparable for C and C++. However, since the threaded work must be described as a function, programming with native threads may look more natural in a language like C. For highly object-oriented C++ programs, native threads usage may break the style and design, because it is difficult to express threads cleanly in terms of objects.

Intel® TBB and native threads do not require specific compiler support; OpenMP does. The use of OpenMP requires that you compile with a compiler that recognizes OpenMP pragmas. The Intel® C++ and Fortran Compilers support OpenMP. Recently, most other C++ and Fortran compilers have added at least some support for OpenMP.

OpenMP and Intel® TBB based solutions are portable across Windows, Linux, Mac OS X, Solaris, and many other operating systems. Porting a native threads based solution to another OS often requires code changes, increasing the initial development/debugging effort and the maintenance burden, especially if you want portability between Windows (where Windows threads are typically used) and UNIX (where POSIX threads are typically used).

 

The Complexity of the Parallel Model

Look at what you want to make parallel. Use OpenMP if the parallelism is primarily for bounded loops over built-in types, or if it is flat do-loop centric parallelism.

TBB relies on generic programming, so use its loop parallelization patterns if you need to work with custom iteration spaces or complex reduction operations. Also, consider using TBB if you need to go beyond loop-based parallelism, since it provides generic parallel patterns for parallel while-loops, data-flow pipelines, parallel sort, and parallel prefix (scan).

Nested parallelism is supported by OpenMP and can be implemented with native threads. However, it may be hard to avoid resource over-utilization with these two threading APIs. TBB has been designed to naturally support nested and recursive parallelism: the TBB task scheduler manages a fixed number of threads using a task-stealing technique. Combined with the scheduler's dynamic load-balancing algorithm, this makes it possible to keep all of the processor cores busy with useful work, without over-subscription (too many software threads means unnecessary overhead) and with minimal under-subscription (too few software threads means you are not taking full advantage of the available cores).

TBB and OpenMP are designed for threading for performance and scalability, providing constructs that emphasize scalable data-parallel decomposition. They are very useful for compute-intensive work. Introducing parallelism and getting good scalable performance is much harder with native threads. You are more likely to introduce threading errors such as data races and deadlocks if you use native threads to implement the patterns/algorithms that TBB already provides off the shelf. That said, there are cases where native threads are the better option, such as event-based or I/O-based threading.

 

Capabilities Comparison 

 

| Capability | Intel® TBB | OpenMP | Native threads |
|---|:---:|:---:|:---:|
| Task level parallelism | + | + | - |
| Data decomposition support | + | + | - |
| Complex parallel patterns (non-loops) | + | - | - |
| Broadly applicable generic parallel patterns | + | - | - |
| Scalable nested parallelism support | + | - | - |
| Built-in load balancing | + | + | - |
| Affinity support | - | + | + |
| Static scheduling | - | + | - |
| Concurrent data structures | + | - | - |
| Scalable memory allocator | + | - | - |
| I/O dominated tasks | - | - | + |
| User-level synchronization primitives | + | + | - |
| Compiler support is not required | + | - | + |
| Cross OS support | + | + | - |

 

Earlier we mentioned development environment and parallel model complexity as considerations when deciding which threading API to use. But what happens if you come upon a case where either TBB or OpenMP would be a usable option? Then look at the features within the APIs. If you need features exclusive to OpenMP, choose OpenMP. If you need features exclusive to TBB, use TBB. If the features you need are available in both, we recommend considering the maintenance cost: some programming styles naturally fit one API better than the other, and although both TBB and OpenMP are portable, they place different requirements on the development environment. TBB and OpenMP can co-exist, but there may be performance issues, discussed in the “Co-existence” section. Therefore, it is better to pick the model that covers all your needs. If you are working on a new design and plan to use C++, TBB may be a good option: TBB is designed to anticipate incremental parallelization, allowing additional parallelism to be introduced without creating unnecessary threads that can lead to over-utilization.

Intel® TBB, OpenMP, and native threads based solutions are expected to offer comparable performance on equivalent algorithms. However, the significant additional coding overhead necessitated by the low-level native threads API makes TBB and OpenMP the preferable options.

 

Co-existence

TBB, OpenMP, and native threads can co-exist and interoperate. However, oversubscription is possible because the TBB and OpenMP run-time libraries create separate thread pools; by default, each creates a number of threads that matches the number of cores. If both sets of worker threads are used for compute-intensive work at the same time, oversubscription results. Therefore, we recommend rewriting OpenMP code using Intel® TBB if the use of TBB fits the design criteria for the application. That said, oversubscription may not be a problem if the OpenMP work does not overlap with the TBB activity.

The Intel® TBB task scheduler is unfair and non-preemptive, so Intel® TBB is not recommended for I/O-bound tasks. Using native threads for such tasks is often a better idea, and native threads co-exist with Intel® TBB components.

 

Conclusion

Choosing a threading approach is an important part of the parallel application design process. There is no single solution that fits all needs and environments: some approaches require compiler support, and some are not portable or are not supported by specialized threading analysis tools. We designed Intel® Threading Building Blocks to cover commonly used parallel design patterns, and we made it a sufficient framework for creating scalable programs faster by providing concurrent data containers, synchronization primitives, parallel algorithms, and a scalable memory allocator.

For further information

1. “Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism”, James Reinders, O'Reilly Media, 2007, ISBN 0596514808.
2. Open Source project Web Page: http://www.threadingbuildingblocks.org
3. Product Web Page: /en-us/articles/intel-tbb/
4. Intel® Threading Building Blocks: Scalable Programming for Multi-Core
5. “Demystify Scalable Parallelism with Intel Threading Building Block’s Generic Parallel Algorithms”: http://www.devx.com/cplus/Article/32935
6. “Enable Safe, Scalable Parallelism with Intel Threading Building Block's Concurrent Containers”: http://www.devx.com/cplus/Article/33334
7. Product Review: Intel Threading Building Blocks: http://www.devx.com/go-parallel/Article/33270
8. “The Concurrency Revolution”, Herb Sutter, Dr. Dobb’s, 1/19/2005: http://www.ddj.com/cpp/184401916