Intel® Threading Building Blocks

How can I set up a "repeater" in the flow graph?

I am new to the TBB flow graph and also new to this forum. Forgive me if I make mistakes.

In my flow graph, I have two parallel paths, A and B, that are joined at join_node c. Along path A there are a number of function nodes through which messages flow and are processed. B has only one node, b, which represents a pre-task. This task can be carried out in parallel with the tasks on path A, but must be finished before join_node c can execute, which is why I chose a join_node for c.

An always-pulling multiple-in-multiple-out tbb::flow node


I would like to use tbb::flow in the context of a Software Defined Radio (SDR)
application. This seems like a perfect fit, because I need to pipeline complex 

I have read that TBB nodes use a push-pull protocol for communication: the
sender pushes as long as the receiver is able to accept. If the receiver
cannot accept because it is still running, the edge switches to pull mode,
with the receiver pulling results from the sender.

The problem I have is that some algorithms have multiple inputs which are 

lock cmpxchg8b causes excessive L3 Cache Misses


I am not sure where else to post this. I have noticed several times that 32-bit applications which use the

lock cmpxchg8b

instruction suffer random performance bugs due to excessive L3 cache misses. This ranges from an Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz up to my Haswell CPU at home. I have seen this even in single-threaded scenarios where nothing else was happening. For example, it caused a 130 ms delay after some small code changes in which code was removed, which actually made the program slower.
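For context, here is a minimal sketch of the kind of source code that tends to produce this instruction. It assumes a typical compiler targeting 32-bit x86, where a 64-bit compare-and-swap is commonly implemented with lock cmpxchg8b; the actual code generation depends on the compiler and flags, and the helper name `add_with_cas` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically adds `delta` to a 64-bit counter using a
// compare-and-swap loop and returns the new value.
inline std::uint64_t add_with_cas(std::atomic<std::uint64_t>& counter,
                                  std::uint64_t delta) {
    std::uint64_t expected = counter.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `expected` with the current
    // value, so the loop simply retries with fresh data.
    while (!counter.compare_exchange_weak(expected, expected + delta)) {
    }
    return expected + delta;
}
```

Inspecting the disassembly of a 32-bit build of such a function is one way to check whether a given hot path actually contains the instruction.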

Reducing memory footprint

Dear all,

I need to reduce the memory footprint of my program, while retaining some speed. I might also switch data structures. The problem: I need to count all integers of a fixed size in bits, let's say 32 bits, from a file.

Right now I am using a concurrent hash map, but after some time it spikes to 10 GB of RAM, and since computation basically stops, I think I am just thrashing my memory (laptop with 8 GB). Another option is to use a concurrent vector.

Also, I am using a pipeline to read the integers; I don't know whether this part could be optimized.


1) I thought using std::sort was the same as using tbb::parallel_sort, but simply changing the function leads to a compile error .. Visual Studio ...

Error 38 error C3848: expression having type 'const less_than_key' would lose some const-volatile qualifiers in order to call 'bool less_than_key::operator ()(FaceDistance &,FaceDistance &)'
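The error message indicates that the comparator is being invoked through a const object while its operator() is non-const and takes non-const references. A minimal sketch of a comparator that can be called that way follows; the fields of `FaceDistance` and the helper `sort_faces` are hypothetical, since the original type is not shown, and std::sort is used here only to keep the example free of external dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the poster's type; the real fields are not shown.
struct FaceDistance {
    int face;
    double distance;
};

struct less_than_key {
    // const member function taking const references, so it can be invoked
    // on a const comparator object.
    bool operator()(const FaceDistance& a, const FaceDistance& b) const {
        return a.distance < b.distance;
    }
};

// Sorts by distance using a const comparator object.
inline void sort_faces(std::vector<FaceDistance>& v) {
    const less_than_key cmp{};
    std::sort(v.begin(), v.end(), cmp);
    assert(std::is_sorted(v.begin(), v.end(), cmp));
}
```

With the comparator declared this way, the same object should also be usable where an algorithm stores it as const before calling it, which is what the C3848 message suggests is happening.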
