Clay's blog http://softwarecommunity.intel.com/ISN/Community/en-us/blogs/multi-core-thredmonkey/archive/2006/12/18/30228042.aspx asks if Intel® Threading Building Blocks [Intel® TBB] is a solution looking for a problem. OpenMP is great if you have Fortran code, or C code that looks like Fortran, or C++ that looks like Fortran. In other words, flat do-loop centric parallelism. With TBB, we're trying to go beyond that and enable generic parallel programming.
For example, TBB provides a parallel sort. If you look at the implementation in include/tbb/parallel_sort.h, you'll see that it's a parallel quicksort implemented using parallel_for, without any explicit recursion. We can do that because TBB let's you define your own iteration spaces. You just have to specify signatures for:
- Is the space empty?
- Should the iteration space be split?
- If it should be split, how to split it.
TBB provides one and two dimensional spaces. These work for signed and unsigned integral types and pointer types. In contrast, an OpenMP parallel for loop is restricted to a signed integral type, and only one dimension.
The TBB parallel_reduce is another good example of being generic. Consider the simple problem of finding the index of the minimum value in an array. OpenMP allows reductions only on built-in types. But to solve this problem, you need a reduction over a pair type (value,index). The TBB parallel_reduce works generically on any type. You just have to provide a few signatures. Section 3.3.1 of the TBB Tutorial explains the "index of minimum" solution in detail.
Clay finds the primes example in TBB a bit long. That's because it's a reasonably good algorithm (e.g. it's blocked for cache and does dynamic memory allocation). It's not the ridiculously slow "try all divisors" algorithm sometimes found in some "how to thread" articles.
If OpenMP fits, use it. If it doesn't, consider TBB as the next step.