Intel® Threading Building Blocks

Starting top level tasks

I'm porting the deal.II library, a library for finite element computations, to use TBB. Let me say that I love TBB and that pipeline is exactly the tool needed in many places.

I want to move to more unstructured cases now. Essentially, here is a sketch of what I want to do:


#include <iostream>

void worker (int i);

int main () {
  int i;
  while (std::cin >> i) {
    // create a new task and run worker(i) on it
  }
}
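The sketch above can be made concrete with the standard library alone. The following is a minimal illustration, not a TBB solution: it swaps in `std::async` for the "create a new task" step, and `g_sum`, `run_workers`, and the body of `worker` are hypothetical names and placeholders introduced only for this example. In current TBB releases the closest analogue is probably `tbb::task_group`, whose `run()` and `wait()` members play the roles that `std::async` and `get()` play here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for the poster's worker(i): it only accumulates its
// argument so that the effect of every task can be observed.
std::atomic<long> g_sum{0};
void worker(int i) { g_sum += i; }

// Reads integers from `in`, launches one asynchronous task per value, then
// waits for all of them. Returns the number of tasks that were started.
std::size_t run_workers(std::istream& in) {
    std::vector<std::future<void>> pending;
    int i;
    while (in >> i) {
        // std::launch::async forces each call onto its own thread of execution.
        pending.push_back(std::async(std::launch::async, worker, i));
    }
    for (auto& f : pending) {
        assert(f.valid());
        f.get();  // blocks until this task has finished
    }
    return pending.size();
}
```

Calling `run_workers(std::cin)` reproduces the loop from the post. One thread per input value is only reasonable for small inputs; a task scheduler such as TBB's would multiplex the tasks onto a fixed pool of worker threads instead.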



Memory allocator efficiency?

Andrei Alexandrescu, in Modern C++ Design, states that: "For occult reasons, the default allocator is notoriously slow". :) Then, to overcome some of this inefficiency, he continues with the design of a small object allocator.

The Alexandrescu allocator (available as part of the Loki open source library) seems to work as a Singleton, so memory allocations will be global. Now, if I've got it right, this is exactly what the TBB allocator is designed to avoid: the whole purpose of the TBB allocator is to make allocations on a per-thread basis.
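To make the global versus per-thread distinction concrete, here is a minimal sketch of a per-thread small-object pool. It is neither Loki's allocator nor TBB's; `BlockPool`, `local_pool`, and the 64-byte block size are hypothetical choices made for illustration. The point is only that a `thread_local` pool needs no lock, whereas a single global pool shared by all threads would.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool. Each thread owns one instance, so the
// free list is never touched by two threads at once and needs no lock.
class BlockPool {
public:
    explicit BlockPool(std::size_t block_size)
        : block_size_(block_size < sizeof(Node) ? sizeof(Node) : block_size) {}

    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    ~BlockPool() {
        for (void* p : owned_) ::operator delete(p);
    }

    void* allocate() {
        if (free_ != nullptr) {            // reuse a previously freed block
            Node* n = free_;
            free_ = n->next;
            return n;
        }
        void* p = ::operator new(block_size_);
        owned_.push_back(p);               // remembered for the destructor
        return p;
    }

    // Must be called on the same thread that allocated `p`.
    void deallocate(void* p) {
        assert(p != nullptr);
        Node* n = static_cast<Node*>(p);
        n->next = free_;
        free_ = n;
    }

private:
    struct Node { Node* next; };
    std::size_t block_size_;
    Node* free_ = nullptr;
    std::vector<void*> owned_;
};

// One pool per thread, created lazily on first use in that thread.
BlockPool& local_pool() {
    thread_local BlockPool pool(64);
    return pool;
}
```

The sketch deliberately ignores the hard part, which is a block freed by a different thread than the one that allocated it. A production allocator has to handle that case.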

My questions are:

help.. how to show that the processors are being used..

Is there any tool or code to determine if the processors are being utilized by the threads?

I have this parallel code that is slower than its serial implementation. I just want to show that the parallel version is really running in parallel, even though it is slower than the serial implementation.
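One rough way to check this from inside the program is to compare elapsed wall-clock time with the CPU time consumed by the whole process. The sketch below assumes a POSIX-like platform, where `std::clock()` reports CPU time summed over all threads; on some platforms (MSVC's runtime, as far as I know) `std::clock()` tracks wall time instead and the comparison tells you nothing. `Usage`, `spin`, and `measure` are hypothetical names used only here.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

struct Usage {
    double wall_seconds;  // elapsed real time
    double cpu_seconds;   // CPU time charged to the process
};

// Hypothetical CPU-bound kernel: sums 0..n-1. The volatile accumulator keeps
// the compiler from optimizing the loop away.
unsigned long long spin(unsigned long long n) {
    volatile unsigned long long acc = 0;
    for (unsigned long long k = 0; k < n; ++k) acc = acc + k;
    return acc;
}

// Runs `spin(iterations)` on `num_threads` threads and reports both clocks.
Usage measure(unsigned num_threads, unsigned long long iterations) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t)
        pool.emplace_back([iterations] { spin(iterations); });
    for (auto& th : pool) th.join();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Usage u;
    u.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    u.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return u;
}
```

If `cpu_seconds / wall_seconds` comes out clearly above 1 for the parallel run, more than one processor was busy at the same time. A ratio near 1 suggests the threads were effectively serialized. An OS-level monitor showing per-core load gives the same information without touching the code.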

TBB + single core + GPU = ?


I've got two questions which are more or less related to the use of a GPU.

Firstly: this is an excerpt from the TBB reference:

The task scheduler is intended for parallelizing computationally intensive work.
Because task objects are not scheduled preemptively, they should not make calls that
might block for long periods, because meanwhile that thread is precluded from
servicing other tasks.

Okay, but what do you call a "long period"? Is submitting a blocking draw call to the GPU "long"?
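Whatever the threshold turns out to be, one way to sidestep the problem is to keep the blocking call off the scheduler's worker threads altogether. The sketch below shows that idea with the standard library only, so it is an assumption about a possible structure and not something taken from the TBB reference. `blocking_draw_call`, `compute`, and `frame` are hypothetical, and the sleep merely simulates a driver call that blocks.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for a blocking GPU submission: it just sleeps and
// returns a made-up status code.
int blocking_draw_call() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return 1;
}

// Hypothetical compute step that would normally run as scheduler tasks.
long compute(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Issues the blocking call on a separate thread, does the computation on the
// calling thread in the meantime, and joins the two at the end.
long frame(const std::vector<int>& data, int& draw_status) {
    std::future<int> draw = std::async(std::launch::async, blocking_draw_call);
    const long total = compute(data);  // overlaps with the blocked call
    draw_status = draw.get();          // wait for the draw call to return
    return total;
}
```

The design choice here is that the thread which blocks is not one of the threads expected to service compute tasks, so the computation is not stalled while the draw call waits.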

symbolic information may not be available due to inline assembly

I was wondering what this error message means. My code does link to a few home-grown libraries, but none of them contain any threaded code. The only thing that does contain threaded code is a call in which uses parallel_for. Sadly, when I compile the code to see if Thread Checker can tell me why it is slower than the un-threaded code, I get this error. I am not sure what to do next. Any help would be appreciated.
