Threading Building Blocks Tutorial at OSCON

I'm at OSCON, and this morning I attended the Intel tutorial about Threading Building Blocks (TBB). The 3+ hour tutorial was titled "New Parallel Programming Tools for a Multicore World." Though I've been studying TBB for several weeks, I learned a lot of new things, and I came away with lots of ideas for experiments I'd like to perform using TBB. I've been developing software for a long time, and it's easy for me to think of past tasks where, had TBB been available, my work would likely have been easier and the final product better and more efficient.

What "Building Blocks" means

One thing the class pointed out is that Threading Building Blocks really does consist of "blocks." It's not a monolithic framework that you attach to your code and then drive by passing in a complicated list of settings and a script telling it what to do and when. Not at all. TBB really is a set of tools (the "building blocks") that you can use individually, when and where you need them.

For example, if you only need to parallelize a standard for loop, you use the TBB parallel_for construct -- without needing to concern yourself with other TBB templates like containers, memory allocators, mutexes, etc. Likewise, suppose you've already got a working multithreaded application (using Pthreads or Windows threads, for example), but you've had to apply mutual exclusion because part of your code uses an STL container (STL containers are not safe for concurrent modification). You could simply replace the STL container with an equivalent TBB container that is threadsafe by design, again without needing to concern yourself with other aspects of TBB.
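To give a feel for what parallel_for takes off your hands, here is a minimal sketch of the hand-written alternative, using only standard C++ threads. The names `hand_rolled_parallel_for` and `squared` are hypothetical and are not TBB APIs. TBB's real template also handles range splitting and load balancing, which this sketch makes no attempt to reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): run body(begin, end) on
// contiguous chunks of [0, n), using one thread per chunk.
void hand_rolled_parallel_for(
    std::size_t n, unsigned workers,
    const std::function<void(std::size_t, std::size_t)>& body) {
    assert(static_cast<bool>(body));  // a callable must be supplied
    if (n == 0) return;
    workers = std::max(1u, workers);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(body, begin, end);  // each thread gets [begin, end)
    }
    for (std::thread& t : threads) t.join();  // wait for every chunk to finish
}

// Example loop body: square every element. Each chunk touches a disjoint
// index range, so no locking is needed.
std::vector<double> squared(std::vector<double> values, unsigned workers) {
    hand_rolled_parallel_for(
        values.size(), workers,
        [&values](std::size_t begin, std::size_t end) {
            for (std::size_t i = begin; i < end; ++i) values[i] *= values[i];
        });
    return values;
}
```

Calling `squared({1.0, 2.0, 3.0}, 2)` squares the elements on up to two threads. With TBB, the chunking, thread creation, and joining above would be replaced by a single call to tbb::parallel_for over a tbb::blocked_range, leaving only the loop body for you to write.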

In other words, TBB really is a library, one that gives you the control to apply aspects of threading where you need them. If you're familiar with the Standard Template Library, you'll recognize some of the core design principles that were applied in the creation of TBB. TBB is a template library; it provides a set of "building blocks" that can be applied as generic tools for solving the problems and performing the tasks that are inherent in multithreaded programming.

A tutorial highlight or two

James Reinders, author of the TBB book (Intel Threading Building Blocks), started out the session with a brief introduction that included the announcement (though it was already public) that Threading Building Blocks is now a full open source project. The rest of the tutorial was presented by Robert Reed, with demo applications aptly presented in words and slides by Victoria Gromova (the projector and her Linux laptop did not understand one another, so the code couldn't be seen in action).


There was a lot of discussion of cache. I'm not an expert on processors and cache, but as I listened to all this I really wondered what performance increases I might have been able to bring to my prior projects had I spent time designing for efficient utilization of cache.

It turns out that when a process or thread has to go out to system memory, the effect is fairly akin to what happens when Windows runs out of system memory and starts using its so-called "virtual memory" (i.e., disk space). If you've used Windows for high-volume processing of any type, you know how disastrous this is: you may as well just shut down the system at that point.

But a threaded application designed with a granularity such that each packet of work needs only as much memory as can remain fully resident in the cache can be at least seven times faster than an application that frequently has to go out to system memory to reload data. The sevenfold figure came from an application Robert tested with various methods of memory usage.

Of course, I would have to manually recode my work on the old projects, since I was using low-level threading and memory management. TBB does the cache-usage optimization automatically, if you'd like it to do this for you. The flexibility to select your own granularity and other controlling settings is there, if you'd prefer (or need) to go that way. The templates are indeed flexible in terms of configuration.
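The idea of sizing packets of work to fit in cache can be sketched in a few lines of standard C++. This is not how TBB chooses its granularity. The helper names (`grain_for_cache`, `split_range`, `scale_and_sum`) are hypothetical, and the cache budget is a number you would have to supply for your own hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: how many elements of a given size fit in a cache
// budget (in bytes). The budget itself is an assumption you supply.
std::size_t grain_for_cache(std::size_t cache_bytes, std::size_t element_bytes) {
    assert(element_bytes > 0);
    return std::max<std::size_t>(1, cache_bytes / element_bytes);
}

// Split [0, n) into half-open chunks of at most `grain` elements.
std::vector<std::pair<std::size_t, std::size_t>> split_range(std::size_t n,
                                                             std::size_t grain) {
    assert(grain > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t begin = 0; begin < n; begin += grain)
        chunks.emplace_back(begin, std::min(n, begin + grain));
    return chunks;
}

// Run both passes (scale, then sum) on one chunk before moving to the next,
// so the second pass reads data the first pass just touched.
double scale_and_sum(std::vector<double>& data, double factor, std::size_t grain) {
    double total = 0.0;
    for (const auto& chunk : split_range(data.size(), grain)) {
        for (std::size_t i = chunk.first; i < chunk.second; ++i) data[i] *= factor;
        for (std::size_t i = chunk.first; i < chunk.second; ++i) total += data[i];
    }
    return total;
}
```

With an assumed 32 KB cache budget and 8-byte doubles, `grain_for_cache(32 * 1024, sizeof(double))` gives a grain of 4096 elements. Whether a particular grain actually helps depends on the machine and the workload, so it is something to measure rather than assume.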

My favorite TBB diagram

James Reinders' introduction included my favorite TBB diagram -- which is not really a depiction of TBB, but rather an illustration of the historical technologies that influenced the development of Threading Building Blocks. If you get the book, you'll find this diagram on page 284: "Figure 12-1. Key influences on design of Intel Threading Building Blocks." In the diagram you'll see names and acronyms like Cilk, OpenMP, STL, ECMA CLI, and Chare Kernel.

This is one of the things I really like about TBB. The designers and developers didn't just say "OK, we're Intel, we make the world's best multicore processors; now let's go out and invent (from scratch) our preferred method for programmers to use those multicore processors."

They didn't do that. Rather, TBB's creation was an effort in reading the history of the parallel programming community's work over the past couple of decades, extracting the cumulative lessons learned, and then applying what has worked well in the past to this old, but also in many ways new, problem in parallel programming.

What "new" parallel programming problem does TBB help solve?

How can we call parallel programming for multicore processors a "new" problem? Well, it's a new problem with respect to developers more than with respect to code. The old coding methods still work. But who's going to do all that coding if all we have to work with is the old, low-level methods?

Historically, parallel programming was a highly specialized domain within the entirety of computer programming, and it was performed by "experts" (or perhaps by very patient fools -- it depends on how you'd like to classify "us"). In the multicore future, a much larger group of developers will have to develop multithreaded applications or portions of applications, and almost all other developers will at least have to consider that the code they're writing will likely run on a multicore platform.

Can we really expect all those developers to have the patience for coping with a situation where five consecutive executions of a program that "looks right" produce five different answers, all of them wrong? Threading Building Blocks helps solve this problem by abstracting the threading management details behind a "wall" of much more readily comprehensible template constructs.
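The "looks right but gives different wrong answers" situation is easy to reproduce. The sketch below uses only standard C++ threads, and the function names (`lossy_count`, `exact_count`) are hypothetical. The lossy version reads and writes a shared counter as two separate steps, so updates from different threads can overwrite one another. Atomics are used throughout so that the lost updates are a logic error rather than undefined behavior.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Looks right, but the load and the store are separate steps: two threads
// can read the same value, and one of the two increments is then lost.
long lossy_count(unsigned threads, long increments) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&counter, increments] {
            for (long i = 0; i < increments; ++i)
                counter.store(counter.load() + 1);  // read, then write
        });
    for (std::thread& th : pool) th.join();
    return counter.load();
}

// Same loop, but each increment is a single indivisible read-modify-write.
long exact_count(unsigned threads, long increments) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&counter, increments] {
            for (long i = 0; i < increments; ++i)
                counter.fetch_add(1);  // atomic increment
        });
    for (std::thread& th : pool) th.join();
    return counter.load();
}
```

On a multicore machine, repeated calls to `lossy_count(4, 100000)` may return a different total each time, anywhere up to 400000, while `exact_count(4, 100000)` always returns 400000. The two loops differ by a single line, which is exactly why this class of bug is so hard to spot by reading the code.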

TBB as an Open Source project

That Threading Building Blocks would become an Open Source project seems in some ways predestined when you look at that "key influences" diagram I like so much, and when you consider TBB's roots and how carefully the past work of the parallel processing community (and also the STL community) was studied and utilized in the design of TBB. In a sense, TBB had its roots in community, and with the project going Open Source, TBB is returned to the community, which can now actively participate in its future development.

I like that!

Kevin Farnham
O'Reilly Media
