Intel® Threading Building Blocks

Quick Poll/Survey on Intel® TBB

Dear Intel® TBB user,

I would love to hear your candid thoughts about the following question on TBB.

On a scale of 1-5, how likely is it that you would recommend Intel® TBB to a peer or colleague, with 5 being most likely to recommend?

Please leave your responses below, and don't forget to provide a reason for your score, be it high or low.



Product Manager




Enqueue tasks and wait for the workers' completion



I have an application in which my main thread spawns a std::thread at the beginning of the program. I define two task_arenas and two task_groups that are shared by the two master threads of my application. I want the first thread to use the first arena and first group, and the second thread to use the second arena and second group.

For the moment my code looks like this:

Nested parallel_for with mutex hangs

I have a set of data blocks that I process using a parallel_for loop. These data blocks are held in a pool that may be compressed. The first thread to access a block in the compressed pool triggers an uncompress routine. Now, I have a mutex that ensures the uncompress routine is executed by only one task thread. But the uncompress routine uses its own parallel_for loop to speed up the decompression. When the inner parallel_for loop ends, control doesn't go back to the parent task that started the uncompress routine.

Implementing a Synchronous DataFlow Graph using Intel Flow Graph


I started investigating Intel TBB recently and was thinking about the possibility of implementing an application specified as a Synchronous DataFlow Graph using function and queue nodes. It seems to me doable in a straightforward manner. Could someone confirm? Any thoughts?

Parallel STL Release Notes


Find the latest Release Notes for Parallel STL

This page provides the current Release Notes for the Parallel STL for Linux*, Windows* and OS X* products. All files are in TXT format.

  • Apple macOS*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • C/C++
  • Intel® Parallel Studio XE
  • Intel® Parallel Studio XE Cluster Edition
  • Intel® Parallel Studio XE Professional Edition
  • Intel® Threading Building Blocks

How to find where my program is spinning?


    I have looked through the forums and other TBB resources, and based on VTune I can see my program is spending a lot of time spinning, but I have not found where it is spinning yet.

    I have Parallel Studio and would appreciate any advice on how to find where the program is spinning so I can fix it. Overall, it seems my parallelization is not well balanced, and I am trying to figure out where the problems are.



    I remember reading somewhere that if you link TBBMalloc, or potentially use the Scalable Allocator, TBB will pre-allocate some amount of memory per thread to avoid implicit synchronization. But I can't find this any more. I thought I found it in the TBB book, but apparently it wasn't there.

    Does any per-thread preallocation happen in the Scalable Allocator or in TBBMalloc?

    Many thanks.

    flow::graph : graph.wait_for_all() loads one core while doing nothing useful

    void graph_test2()
    {
    	const int NPROC = 20;
    	tbb::flow::graph g;
    	tbb::flow::broadcast_node< tbb::flow::continue_msg > start( g );
    	std::vector< tbb::flow::continue_node< int > > workers;
    	workers.reserve( NPROC );
    	std::vector<double> SUM( NPROC, 0. );
    	for(int i=0; i<NPROC; ++i) {
    		auto work = [&, i](const tbb::flow::continue_msg &) -> int {
    			double & sum = SUM[i];
    			for(int k = 0; k < 1000000000; ++k) {
    				sum += k;
    			}
    			return i;
    		};
    		workers.emplace_back( g, work );
    		tbb::flow::make_edge( start, workers.back() );
    	}
    	start.try_put( tbb::flow::continue_msg() );
    	g.wait_for_all();
    }