Intel® Threading Building Blocks

How to debug threading errors in TBB?

As I understand it, in TBB we focus on task creation, and thread creation is taken care of by TBB internally. Now, if there is an error, what tools can I use to debug it? More specifically, I want to understand which of the threads created by TBB are being used for computation (in a loop or a task) and which are status threads, sleeping, or inactive. In other words, how do I tell which threads to focus on and which ones are causing issues? What kind of debugger can I use?

Thank you.



TBB and Boost

1. Is there a penalty for using Boost shared pointers with the scalable allocator? I know that shared pointers require an extra level of indirection, but I am curious whether the atomic reference count in a shared pointer has an impact. I am using this in a pipeline, by the way.

2. Is there anything similar to the Boost threadpool in the TBB world (see [link not preserved])? I have used this product with TBB without any problems, but I am not sure it is the fastest solution. Would it be better to reimplement this with TBB?

TBB and Nehalem

I read this on ExtremeTech:

"Fast, unaligned cache access: Before Nehalem, data needed to be aligned on cache line boundaries for maximum performance. That's no longer true with Nehalem. This will help newer applications written for Nehalem, more than older ones, only because compilers and application authors often took great care to align data along cache line boundaries."

Does this mean that I wouldn't need to use the scalable allocator if I am running my app on Nehalem? Or do I still have a penalty if I use the default new? Thanks in advance.

mixing TBB with MPI

Hello, a couple of times now I have heard people say that they knew of cases where TBB was used in hybrid distributed HPC applications, but they did not know the applications' names. To me this is somewhat surprising (a good surprise, though), because MPI applications are usually written in plain C or Fortran, and if there is a computational loop to be parallelized using threads, programmers usually stick with OpenMP.

lengthy postponed shared data initialization & thread locking question

I understand from the docs that if, for example, TBB has 2 physical threads but many tasks, and two tasks reach the same lengthy, postponed initialization code for shared data, so that one task must wait (on a mutex) for the other to finish, then the thread running the waiting task will wait as well, even though many other tasks are available to be executed.


I am just trying out TBB and I am wondering: is it always necessary either to add the location of tbb_debug.dll to the PATH environment variable, or to put tbb_debug.dll in the same relative path as the .exe file I compiled, in order to run the .exe? When I tried to run the .exe file on a different computer without tbb_debug.dll, I got an error saying that this DLL was not found.

help!! parallel_reduce output problem..

I am new to Threading Building Blocks and also to parallel computing. I am trying to implement a selection-sort-like algorithm in parallel, where only the search for the minimum number is done in parallel and the swapping is done serially. My problem is that the output is not the same as the serial implementation's when the input data is in random order, but when my input is in reverse order, the output is correct and in ascending order. I can't figure out what is wrong with it. My code is here..
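Since the poster's code is not shown here, the following is not a diagnosis, only a minimal sketch of the pattern described: a parallel search for the minimum followed by a serial swap. It deliberately uses only the C++ standard library (std::async) rather than tbb::parallel_reduce, and all names in it (MinResult, join_min, scan_min, parallel_min, selection_sort, the grain parameter) are hypothetical. The point it illustrates is that each partial result must carry both the value and its index, and that the join step must compare the two partial results instead of simply keeping one of them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Partial result of the reduction: the smallest value seen and where it lives.
struct MinResult {
    int value;
    std::size_t index;
};

// Combine two partial results (the role a join step plays in a reduction).
// Ties go to the lower index so the answer matches a left-to-right serial scan.
inline MinResult join_min(const MinResult& a, const MinResult& b) {
    if (b.value < a.value || (b.value == a.value && b.index < a.index)) return b;
    return a;
}

// Serial scan of the non-empty range [first, last).
inline MinResult scan_min(const std::vector<int>& data,
                          std::size_t first, std::size_t last) {
    assert(first < last && last <= data.size());
    MinResult best{data[first], first};
    for (std::size_t i = first + 1; i < last; ++i)
        if (data[i] < best.value) best = MinResult{data[i], i};
    return best;
}

// Split the range in two, scan the halves concurrently, then join.
// Ranges of at most `grain` elements are scanned serially.
inline MinResult parallel_min(const std::vector<int>& data,
                              std::size_t first, std::size_t last,
                              std::size_t grain = 1024) {
    if (last - first <= std::max<std::size_t>(grain, 1))
        return scan_min(data, first, last);
    const std::size_t mid = first + (last - first) / 2;
    auto right = std::async(std::launch::async, [&data, mid, last, grain] {
        return parallel_min(data, mid, last, grain);
    });
    const MinResult left = parallel_min(data, first, mid, grain);
    return join_min(left, right.get());
}

// Selection sort: parallel search for the minimum, serial swap.
inline void selection_sort(std::vector<int>& data, std::size_t grain = 1024) {
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        const MinResult m = parallel_min(data, i, data.size(), grain);
        std::swap(data[i], data[m.index]);
    }
}
```

Calling selection_sort(v) on a std::vector<int> sorts it in ascending order. Comparing its output with a serial sort on random input is a quick way to check whether a join step of this shape behaves as intended.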

Eventcount (gate) proposal

I've implemented a sketch of an eventcount for TBB. It can be used as a replacement for Gate in the current TBB scheduler, and/or as the blocking/signalling logic in concurrent_queue, and/or it can be exposed as a public API, and/or a portable condition variable can be implemented on top of it and exposed as a public API. I want to know what you think about the implementation and these usages, and whether it is worth my finishing the implementation and submitting it as an official contribution.
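For readers unfamiliar with the term, here is a minimal sketch of the general eventcount idea, assuming a plain mutex and condition variable underneath. It is not the poster's implementation (which is not shown) and it is not TBB's Gate; the names EventCount, prepare_wait, wait, notify_all, wait_until, and demo_handoff are all hypothetical. The idea illustrated is that a waiter takes a snapshot of an epoch counter, re-checks its condition, and blocks only if the epoch has not advanced since the snapshot, so a notification that arrives between the check and the wait is not lost.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Simplified, lock-based eventcount (a sketch, not a lock-free design).
class EventCount {
public:
    using Key = std::uint64_t;

    // Step 1 for a waiter: snapshot the epoch before re-checking the condition.
    Key prepare_wait() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return epoch_;
    }

    // Step 2 for a waiter: block until the epoch differs from the snapshot.
    void wait(Key key) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return epoch_ != key; });
    }

    // For a signaller: call after publishing the state change.
    void notify_all() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++epoch_;
        }
        cv_.notify_all();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    Key epoch_ = 0;
};

// Waiter-side protocol: check, snapshot, check again, then block.
template <typename Predicate>
void wait_until(EventCount& ec, Predicate pred) {
    for (;;) {
        if (pred()) return;
        const EventCount::Key key = ec.prepare_wait();
        if (pred()) return;  // state may have changed before the snapshot
        ec.wait(key);        // returns once a later notify_all() has run
    }
}

// Hypothetical demo: one thread publishes a value, another blocks until it is visible.
inline int demo_handoff() {
    EventCount ec;
    std::atomic<int> slot{0};
    int seen = 0;
    std::thread consumer([&] {
        wait_until(ec, [&] { return slot.load() != 0; });
        seen = slot.load();
    });
    slot.store(42);   // publish first...
    ec.notify_all();  // ...then signal
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

In this sketch the condition itself (here, the atomic slot) lives outside the eventcount, which is what allows the same object to sit behind a queue, a scheduler gate, or a condition-variable-like wrapper, as the post suggests.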
