Threading Building Blocks: Solution Looking for a Problem?

I admit I just don't get it. I'm not a C++ programmer, so the preface to any response you might want to make to my criticisms and questions posed here could be "Clay, you ignorant slut!" I can accept that; I'm speaking, to some extent, from a position of ignorance.

When I first heard about Intel Threading Building Blocks (TBB), I asked myself: "Why?" What did TBB bring to the table that wasn't already available from something else? True, the idea of abstracting parallelism details away from the programmer is welcome and is most likely the wave of the future. That abstraction is one of the main advantages of OpenMP versus explicit threads. But, from the discourses and discussions that have been published on TBB, its features sound a lot like what one can already get from OpenMP.

Does OpenMP not work well with C++? I know that OpenMP is defined and available for C++. TBB has a parallel_for construct that seems just like OpenMP's loop worksharing construct. The parallel_reduce function has a corresponding OpenMP clause (reduction) to perform this particular computation. There is also a parallel_while interface that looks to be close to the Intel-specific "task queues" extension to OpenMP. (Tim Mattson has publicly stated that this feature, with a slightly different syntax, of course, will be part of the OpenMP 3.0 standard that is due to be published in November 2007.) Besides the pipeline class, there really doesn't seem to be much else that is functionally new in TBB.
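For readers who haven't seen the OpenMP side of this comparison, here is a minimal sketch of the loop worksharing construct and the reduction clause in C++. The function names (scale, sum_of_squares) are made up for illustration and come from neither the OpenMP nor the TBB materials. As far as I can tell, the TBB counterparts express the same loops as a range plus a body object handed to parallel_for or parallel_reduce; that code is not shown because it needs the TBB library to build.

```cpp
#include <vector>

// Loop worksharing: the iterations are divided among the threads of the team.
// Each iteration touches a different element, so no locking is needed.
void scale(std::vector<double>& v, double factor) {
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        v[i] *= factor;
    }
}

// Reduction clause: each thread accumulates into a private copy of `sum`,
// and the private copies are added together when the loop finishes.
long long sum_of_squares(const std::vector<int>& v) {
    const int n = static_cast<int>(v.size());
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += static_cast<long long>(v[i]) * v[i];
    }
    return sum;
}
```

If the compiler is run without OpenMP enabled, the pragmas are ignored and both functions run as ordinary serial loops, which is part of what makes the directive approach so unobtrusive.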

TBB has defined a few concurrent container classes that allow concurrent access to data. This abstraction makes it possible to access data in parallel rather than require the programmer to pimp out a "serial" container with some kind of lock to control access. This is a great idea, especially the concurrent_queue and concurrent_vector. However, in all my years of parallel and distributed programming, I have never seen or used or heard of anyone needing a hash table that required concurrent access. Maybe this is a data structure that C++ programmers find useful more often than I have?
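To make the "serial container plus a lock" alternative concrete, here is a rough sketch of what a programmer would otherwise write by hand. LockedQueue is a hypothetical name, and the sketch assumes the mutex facilities of a modern C++ standard library rather than anything shipped with TBB; it is not how concurrent_queue is implemented.

```cpp
#include <mutex>
#include <queue>

// Hypothetical wrapper: an ordinary std::queue guarded by a single mutex.
template <typename T>
class LockedQueue {
public:
    void push(const T& item) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push(item);
    }

    // Copies the front item into `out` and removes it.
    // Returns false, leaving `out` untouched, if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) {
            return false;
        }
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> items_;
};
```

The emptiness check and the pop are folded into one call on purpose: with separate empty() and pop() calls, another thread could drain the queue between the two. Every access serializes on the one lock, which is the cost a purpose-built concurrent container is meant to reduce.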

The simplicity and abstraction side of TBB has been the main selling point for the product so far. During his keynote at IDF Fall 2006, Intel VP Richard Wirt stated that TBB results in one-quarter less code than native threads. (The woman helping out with some of the demos stated it was three-quarters less code when using TBB.) Simpler is better; less code is better. Since I'm not a C++ programmer, I can only look at code that someone else has written using TBB to get an idea about how much less code TBB might need versus a similar solution with explicit threads. I have looked over some of the example codes that come packaged with TBB.

The "tree_sum" code, demonstrating the task interface, has a serial version of the main computation, as well as two versions that are meant for concurrent execution. There are eight lines of code in the serial file, 35 in the simple parallel version, and 43 in the optimized version. Is a roughly 340% to 440% increase in code size less than what an implementation using native threads would require? I recently wrote a breadth-first search example code that only added about 50% more lines to create and manage threads and make the queue structure thread safe. The code for a parallel Sieve of Eratosthenes computation, in the TBB examples, to identify prime numbers took seven pages to scroll through. I can't imagine that such a simple algorithm would take more than one page of code to write with any native threads model (and even less for OpenMP).
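For a sense of scale, a serial tree sum of about that size might look like the sketch below. This is not copied from the TBB distribution; the TreeNode layout and the name serial_sum_tree are assumptions made purely for illustration.

```cpp
// Hypothetical binary tree node; the real example's types may differ.
struct TreeNode {
    long value;
    TreeNode* left;
    TreeNode* right;
};

// Recursively adds up the value stored in every node of the tree.
long serial_sum_tree(const TreeNode* root) {
    long result = root->value;
    if (root->left) result += serial_sum_tree(root->left);
    if (root->right) result += serial_sum_tree(root->right);
    return result;
}
```

The two recursive calls are independent of each other, which is presumably what makes the computation a natural showcase for a task interface: each subtree can be summed as its own task.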

Granted, I may have chosen the most egregious example codes to use as illustrations, but there just seems to be a whole lot of extraneous scaffolding that needs to be put into a code to prop up TBB. "Simple" was not the first word that came to mind as I pored through the examples. Is all that extra work worth it? I guess it would be if the published performance results comparing TBB to native threads (both Windows and POSIX) on a single application carry over to other apps. (Is there no comparison of TBB to OpenMP performance? That seems a more natural match in order to make a Granny Smith to Golden Delicious comparison.)

Rather than keep ragging on about this, let me restate my initial question: "Why?" Are C++ programmers at such a higher level of consciousness that other abstraction interfaces, like OpenMP, are too simple or beneath them? Do they have a language all their own and would more readily adopt TBB because it speaks to them in that language? On paper, I think TBB is a great idea. I know many of the folks within Intel who are developing TBB (brilliant and smart, every one of them), so I'm not doubting the quality of the product. However, all the hype at the product launch made TBB seem like the greatest thing since the Renaissance. Perhaps the marketing has overreached the initial 1.0 version of the product.

While I may not grok all of the subtlety and power within C++, I can count code lines, I can read documentation, and I can attend presentations on the interface. None of this has given me the warm fuzzy that Threading Building Blocks is anything more than a solution that has yet to find a problem it can solve better than what is currently available. Anyone care to enlighten me?


The opinions expressed on this site are mine alone and do not necessarily reflect the opinions or strategies of Intel Corporation or its worldwide subsidiaries.