Threading Building Blocks: Solution Looking for a Problem?

By Clay Breshears (Intel) on December 18, 2006 at 10:36 pm

I admit I just don't get it. I'm not a C++ programmer, so the preface to any response you might want to make to my criticisms and questions posed here could be "Clay, you ignorant slut!" I can accept that; I'm speaking, to some extent, from a position of ignorance.

When I first heard about the idea of Intel Threading Building Blocks (TBB), I asked myself: "Why?" What did TBB bring to the table that wasn't already available from something else? True, the idea of abstracting parallelism details away from the programmer is welcome and is most likely the wave of the future. It is one of the main advantages of OpenMP versus explicit threads. But, from the discourses and discussions that have been published on TBB, it sounds a lot like the features one can already get from OpenMP.

Does OpenMP not work well with C++? I know that OpenMP is defined and available for C++. TBB has a parallel_for construct that seems just like OpenMP's worksharing construct. The parallel_reduce function has a corresponding OpenMP clause to perform this particular computation. There is also a parallel_while interface that looks to be close to the Intel-specific extension of "task queues". (Tim Mattson has publicly stated that this feature, with a slightly different syntax, of course, will be part of the OpenMP 3.0 standard that is due to be published in November 2007.) Besides the pipeline class, there really doesn't seem to be much else that is functionally new in TBB.
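For readers who have not seen the OpenMP side of this comparison, here is a minimal sketch of a worksharing loop and a reduction clause in C++. The function names scale_in_place and sum_of are made up for illustration, and the sketch assumes a compiler with OpenMP support switched on; without it the pragmas are ignored and the loops simply run serially.

```cpp
#include <vector>

// Worksharing construct: the loop iterations are divided among the
// threads of the team. (Hypothetical helper, for illustration only.)
void scale_in_place(std::vector<double>& data, double factor) {
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Reduction clause: each thread accumulates a private partial sum and
// OpenMP combines the partial sums when the loop ends.
double sum_of(const std::vector<double>& data) {
    const long n = static_cast<long>(data.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

In both functions the serial loop is left untouched and a single directive line is added, which is the property being weighed against TBB's approach of packaging the loop body as a function object handed to parallel_for or parallel_reduce.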

TBB has defined a few concurrent container classes that allow concurrent access to data. This abstraction makes it possible to access data in parallel rather than require the programmer to pimp out a "serial" container with some kind of lock to control access. This is a great idea, especially the concurrent_queue and concurrent_vector. However, in all my years of parallel and distributed programming, I have never seen or used or heard of anyone needing a hash table that required concurrent access. Maybe this is a data structure that C++ programmers find useful more often than I have?
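To make the "serial container with a lock" alternative concrete, here is a rough sketch of what a programmer would otherwise write by hand. The class name LockedQueue is hypothetical, the code assumes a C++11 or later compiler for std::mutex, and it is not how TBB's concurrent_queue is implemented; it only shows the kind of wrapper that a concurrent container saves you from writing.

```cpp
#include <mutex>
#include <queue>
#include <utility>

// A "serial" std::queue guarded by one lock. LockedQueue is a
// hypothetical name; this is not TBB's concurrent_queue.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push(std::move(value));
    }

    // Returns false if the queue was empty; otherwise moves the front
    // element into `out` and removes it from the queue.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> items_;
};
```

Every operation takes the same lock, so threads using this wrapper are serialized on each push and pop. A purpose-built concurrent container can aim to reduce that contention, which is the appeal of having one supplied by the library.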

The simplicity and abstraction side of TBB has been the main selling point for the product so far. During his keynote at IDF Fall 2006, Intel VP Richard Wirt stated that using TBB results in one-quarter less code than using native threads. (The woman helping out with some of the demos stated it was three-quarters less code when using TBB.) Simpler is better; less code is better. Since I'm not a C++ programmer, I can only look at code that someone else has written using TBB to get an idea about how much less code TBB might need versus a similar solution with explicit threads. I have looked over some of the example codes that come packaged with TBB.

The "tree_sum" code, demonstrating the task interface, has a serial version of the main computation, as well as two versions that are meant for concurrent execution. There are eight lines of code in the serial file and 35 in the simple parallel version and 43 in the optimized version. Is an almost 400% increase in code size less than what an implementation using native threads would require? I recently wrote a breadth-first search example code that only added about 50% more lines to create and manage threads and make the queue structure thread safe. The code for a parallel Sieve of Eratosthenes computation, in the TBB examples, to identify prime numbers took seven pages to scroll through. I can't imagine that such a simple algorithm would take more than one page of code to write with any native threads model (and even less for OpenMP).
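As a point of comparison, here is a sketch of a tree sum written serially and with tasks as standardized in OpenMP 3.0. This is not the code from the TBB examples, and the line counts quoted above do not apply to it; the Node type and the function names are assumptions made for illustration. Without OpenMP enabled the pragmas are ignored and task_sum runs as an ordinary recursion.

```cpp
// Hypothetical binary tree node; not the type used in the TBB example.
struct Node {
    long value;
    Node* left;
    Node* right;
};

// Serial version: a plain recursive walk.
long serial_sum(const Node* node) {
    if (node == nullptr) return 0;
    return node->value + serial_sum(node->left) + serial_sum(node->right);
}

// Task version: each subtree becomes an OpenMP task; taskwait blocks
// until both child tasks have written their results.
long task_sum(const Node* node) {
    if (node == nullptr) return 0;
    long left = 0;
    long right = 0;
    #pragma omp task shared(left)
    left = task_sum(node->left);
    #pragma omp task shared(right)
    right = task_sum(node->right);
    #pragma omp taskwait
    return node->value + left + right;
}

// Entry point: one thread of the team starts the recursion and the
// other threads pick up the tasks it spawns.
long parallel_sum(const Node* root) {
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = task_sum(root);
    return total;
}
```

This sketch spawns a task for every node, which is likely to cost more than it gains on small subtrees. A practical version would probably switch to serial_sum below some depth or subtree size, and that kind of tuning may account for part of the difference between the simple and optimized TBB versions.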

Granted, I may have chosen the most egregious example codes to use as illustrations, but there just seems to be a whole lot of extraneous scaffolding that needs to be put into a code to prop up TBB. "Simple" was not the first word that came to mind as I pored over the examples. Is all that extra work worth it? I guess it would be if the published performance results comparing TBB to native threads (both Windows and POSIX) on a single application carry over to other apps. (Is there no comparison of TBB to OpenMP performance? This seems a more natural match in order to make a Granny Smith to Golden Delicious comparison.)

Rather than keep ragging on about this, let me restate my initial question: "Why?" Are C++ programmers at such a higher level of consciousness that other abstraction interfaces, like OpenMP, are too simple or beneath them? Do they have a language all their own and would more readily adopt TBB because it speaks to them in that language? On paper, I think TBB is a great idea. I know many of the folks within Intel that are developing TBB (brilliant and smart, every one of them), so I'm not doubting the quality of the product. However, all the hype at the product launch made TBB seem like the greatest thing since the Renaissance. Perhaps the marketing has overreached the initial 1.0 version of the product.

While I may not grok all of the subtlety and power within C++, I can count code lines, I can read documentation, and I can attend presentations on the interface. None of this has given me the warm fuzzy that Threading Building Blocks is anything more than a solution that has yet to find a problem it can solve better than what is currently available. Anyone care to enlighten me?

--clay

The opinions expressed on this site are mine alone and do not necessarily reflect the opinions or strategies of Intel Corporation or its worldwide subsidiaries.

Categories: Parallel Prog. & Multi-Core

Comments (5)

December 19, 2006 12:03 AM PST

Arch Robison (Intel)
I've posted a partial reply that explains why the parallel_for and parallel_reduce go well beyond OpenMP. See http://software.intel.com/en-us/blogs/category/multi-core/

January 4, 2007 10:39 AM PST

David Schwartz
I agree. I develop multi-threaded code in C++ for a living and have done so for about 16 years now. I spent a few hours looking at TBB and I really don't see anything of value there.

Are programmers really supposed to deal with that ugliness directly on a day-to-day basis? Or is there supposed to be some higher-level wrapper that hides that ugliness? If the latter, what should that higher-level wrapper be like and who is going to write it?

I just don't get how we're supposed to use this. We already have low-level threading APIs. Is the idea just that this will be portable across WIN32 and POSIX?

DS

February 12, 2007 9:23 AM PST

Clay Breshears (Intel)
Well, someone has seen some value. TBB is up for a 2007 Jolt Product Excellence Award (http://joltawards.com/2007/;jsessionid=LVYFIQUQZYCTAQSNDLPSKHSCJUNN2JVN).

Thanks to Arch for explaining a few things (see above links). I'm not sure how much code is NOT "C++ that looks like Fortran," so I'm not sure what the potential user base would be. (Having started programming with COBOL and Fortran, I'm sure I'd write C++ like Fortran, and this will color my judgments.) If there is one uniquely C++ example that uses TBB, I would hope the marketing team would be pushing that example to the forefront to show off the efficacy of the product.

As for the "ugliness" that David is pointing out, I agree that it does seem a bit onerous. If there were some way to automate some or all of this (similar to OpenMP), then you'd have a runaway hit. This may not be possible while keeping the generic nature of the product. If it is unavoidable, then you have to live with it; we just need to be aware that this may cause hesitation in people being exposed to TBB for the first time.

July 25, 2007 7:02 PM PDT

Chris Fairles
It's all about what you know and what you're used to. If you've used pthreads for years, it is more than familiar and there's probably no problem requiring threading that you can't solve with it. Now some C++ programmer fairly new to the multi-threaded scene looks at pthreads and decides that, even though he/she has a multi-core system and most of his/her clients will too, it looks too complicated and too error prone to attempt. They look at OpenMP, but you need a whole new compiler to use it (not to mention the rampant pre-processor directives). These are all new things that require time to learn and time to debug.

But said person has used C++ a while, is used to the standard library and sees this library. From that perspective it's like free threading. There's a bit of setup cost but at the end of the day, no mutexes, no conditions, no identifying critical sections, no caring about what's atomic and what isn't, no deadlocks and no race conditions (OK, a slight overstatement, but the risks are significantly reduced). You can dive right in and pipeline some filters together, process large amounts of data in parallel, all while using familiar C++ concepts and doing next to nothing to orchestrate the threads. Not only that, but zero changes need to be made to run the app on a Windows box vs. POSIX variants.

So perhaps TBB doesn't bring anything new to the table. But it does bring a fresh, simple view of old concepts, stripped of low-level implementation details, that today's programmers can easily pick up to get a head start developing apps in this multi-threaded, multi-core world we now live in.

October 4, 2007 1:36 PM PDT

Rob Stewart
Chris hit it on the head. OpenMP requires lots of tool support, whereas TBB is a library. Perhaps it doesn't add much over OpenMP, or it might even offer less (I'm not qualified to say), but it is easy to incorporate into existing projects. TBB is much easier to use than low-level APIs like pthreads or the Win32 thread APIs, and it eliminates entire classes of errors.

As for the ugly scaffolding, the code is quite readable to a C++ programmer. Compiler support can, no doubt, make things cleaner and simpler, but it isn't a big problem as is.
