Intel® Threading Building Blocks

Starting top level tasks

I'm porting the deal.II library, a library for finite element computations, to use TBB. Let me say I love TBB and that pipeline is exactly the tool that is needed in many places.

I want to move to more unstructured cases now. Essentially, here is a sketch of what I want to do:


#include <iostream>

void worker (int i);

int main () {
  int i;
  while (std::cin >> i) {
    // create a new task and run worker(i) on it
  }
}



Memory allocator efficiency?

Andrei Alexandrescu, in Modern C++ Design, states that: "For occult reasons, the default allocator is notoriously slow". :) Then, to overcome some of this inefficiency, he continues with the design of a small object allocator.

The Alexandrescu allocator (available as part of the Loki open source library) seems to work as a Singleton, so memory allocations will be global. Now if I've got it right, this is exactly what the TBB allocator is designed to avoid: the whole purpose of the TBB allocator is to make allocations on a per-thread basis.

My questions are:

help.. how to show that the processors are being used..

Is there any tool or code to determine if the processors are being utilized by the threads?

I have this parallel code that is slower than its serial implementation. I just want to show that the parallel version really runs in parallel, even though it is slower than the serial implementation.
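
One rough check, sketched here under assumptions, is to compare processor time against wall-clock time around the parallel region. On Linux and other POSIX systems, std::clock() reports processor time for the whole process, summed over its threads, so a ratio of CPU time to wall time well above 1 suggests that more than one core was busy. Microsoft's runtime implements clock() differently, so this sketch does not carry over to Windows. The names Usage, spin, and measure are hypothetical, and spin is only a placeholder for the real parallel work.

```cpp
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

struct Usage {
    double wall_seconds;  // elapsed real time
    double cpu_seconds;   // processor time reported by std::clock()
};

// Placeholder workload: a busy loop so each thread consumes measurable CPU time.
void spin(unsigned long iterations) {
    volatile unsigned long sink = 0;
    for (unsigned long k = 0; k < iterations; ++k) {
        sink = sink + k;
    }
}

// Runs `n_threads` copies of spin() and records wall time and process CPU time.
Usage measure(unsigned n_threads, unsigned long iterations) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        threads.emplace_back(spin, iterations);
    }
    for (auto& th : threads) {
        th.join();
    }

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Usage u;
    u.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    u.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return u;
}
```

To apply this to real code, the thread-spawning part would be replaced by the parallel call being investigated, with the two clocks read before and after it. An operating-system monitor showing per-core load gives the same information without code changes.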

TBB + single core + GPU = ?


I've got two questions which are more or less related to the use of a GPU.

Firstly, this is an excerpt from the TBB reference:

The task scheduler is intended for parallelizing computationally intensive work.
Because task objects are not scheduled preemptively, they should not make calls that
might block for long periods, because meanwhile that thread is precluded from
servicing other tasks.

Okay, but what do you call a "long period"? Is submitting a blocking draw call to the GPU "long"?

symbolic information may not be available due to inline assembly

I was wondering what this error message means. My code does link to a few home-grown libraries, but none of them contain any threaded code. The only thing that does contain threaded code is a call which uses parallel_for. Sadly, when I compile the code to see if Thread Checker can tell me why it is slower than the un-threaded code, I get this error. I am not sure what to do next. Any help would be appreciated.


In tbb21_20081109oss, shouldn't tbb::pipeline::end_of_input be atomic (at least for pedantic reasons), and what's the deal with tbb::pipeline::token_counter (next_token_number() is executed with a lock on stage_task, not pipeline)? Is there a user guide that makes clear how the parts fit together, or could somebody write one?

algorithm sort vs parallel_sort

I have some questions about parallel_sort. When I compared the performance of parallel_sort() with sort(), sort() turned out to be faster. Why is that?

Is it because of the grainsize value which is 500?

Also, while sorting, I saw that the CPU usage is not that good: parallel_sort() actually utilizes the CPU less than sort() does. Why is that?

Are load balancing and the selection of a good pivot issues here?
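
Comparisons like this depend heavily on input size and on how the timing is done, so a small harness can help make the numbers comparable. The sketch below is an assumption-laden illustration, not a TBB benchmark: it times any sorting callable on a private copy of the same input and hands back the sorted copy for checking. The names make_input and time_sort are hypothetical. Only std::sort is exercised here; a lambda that calls tbb::parallel_sort(v.begin(), v.end()) could be passed in the same way.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Fills a vector with pseudo-random ints from a fixed seed so runs are repeatable.
std::vector<int> make_input(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> v(n);
    for (auto& x : v) {
        x = dist(gen);
    }
    return v;
}

// Runs `sorter` on a copy of `input` and returns the elapsed wall time in seconds.
// The sorted copy is returned through `out` so the caller can verify it.
template <typename Sorter>
double time_sort(Sorter sorter, const std::vector<int>& input, std::vector<int>& out) {
    out = input;  // copy made outside the timed region
    const auto start = std::chrono::steady_clock::now();
    sorter(out);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Running both sorters over a range of sizes, from a few thousand elements to many millions, and repeating each measurement several times would show whether the parallel version only loses on small inputs, where scheduling overhead can outweigh the sorting work.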

How to debug threading errors in TBB?

As I understand it, in TBB we focus on task creation, and thread creation is taken care of by TBB internally. Now, if there is an error, what tools can I use to debug it? More specifically, I want to understand which of the threads created by TBB are being used for computation (in a loop or a task) and which are status threads, sleeping, or inactive. The problem is knowing which threads to focus on and which ones are causing issues. What kind of debugger can I use?

thank you.


