async_node not working asynchronously

async_node not working asynchronously


In my implementation, there is a single graph object and there are several async_nodes created using this graph object. These async nodes are the leaves; they dont have any connection to another node. When i check my logs, i see that try_put operation on the async nodes works synchronously. try_put returns after all activity inside the node's functor finishes. What should i do in order to make it work asynchronously? I expect the try_put return as soon as possible and the functor runs inside a tbb task.



publicaciones de 14 / 0 nuevos
Último envío
Para obtener más información sobre las optimizaciones del compilador, consulte el aviso sobre la optimización.

Hello cinar e,

The body of the async_node is supposed to be small, since the only thing it should do is to signal asynchronous activity about the data to process. Due to this consideration, creation of a separate TBB task to execute async_node's body is seen as an unnecessary overhead, therefore async_node became lightweight node by default starting from TBB 2019 release. I was unable to find the note in CHANGES file about that modification. Perhaps, we missed that. Sorry about that. Please find the lightweight policy description here.

To make async_node non-lightweight and spawn TBB task to execute its body for each try_put call, please specify its policy as "queueing".

P.S.: Perhaps, if you describe the design of your workload in detail, I will be able to suggest better graph topology for your use case or confirm that it is already good enough.

Regards, Aleksei.

Hello Aleksei,

I used the default Policy which should be queueing_lightweight; then i will update it to be queueing.

I use the async_node for my asynchronous run needs on my application. I could have used pthreads or tbb tasks etc. but the queueing ability of flow graph nodes will help me a lot. In my use case, there is continuous flow of market data and the data of each security must be processed synchronously and data of different securities can be processed in parallel. (so each security has its own async_node)

So if i can make the async_node to work asynchronously, i think my needs will be met.


Hello cinar e,

I am still not able to see the full picture as the level of details obviously not enough. If this is not a trade secret or something similar, consider giving more details: draw some charts, explain with a code snippet and so on. At this moment I can only speculate on this.

I am trying to understand why you have several async_nodes instead of one.

It seems you are using dependency flow graph with "continue_msg" type flowing through the edges, which communicates just the signal to the nodes that they can start executing their bodies. If that's the case, consider using data flow graph and communicate not only the signal to the nodes but also a useful data. In such situation, you can have single async_node with the body that accepts the data to push the asynchronous work to, along with the gateway instance, of course. Note that having an unlimited concurrency for such async_node will allow you to execute the body of async_node in parallel. The net result is that you will have the same semantics you described, but using a much simpler graph representation in terms of the number of nodes, which will: 1) potentially reduce memory consumption; 2) make better experience with Intel(R) Advisor Flow Graph Analyzer (the link to it)

Regards, Aleksei.

Hello aleksei,

Actually my use case i pretty simple. It is such:


while (check_if_new_data_exists) {

get data

find instrument

get instrument's async_node

try_put to that async_node which will process the data


I assumed that try_put will work asynchronously but in reality it works synchronously.

There is only one graph in the application and all async_nodes are created using that graph object.


Instrument member variable: tbb::flow::async_node<MarketData, tbb::flow::continue_msg>* dataworker;

Inside while loop:                    bool b = (instrument->getDataworker())->try_put(marketData);

So will the following help me:

tbb::flow::async_node<MarketData, tbb::flow::continue_msg, queueing>* dataworker;


Well, if the only thing you want is to spawn the TBB task during the try_put in async_node then yes, making the async_node to have "queueing" policy instead of "queueing_lightweight" should do the trick.

However, you can get the same result using other primitives in TBB. Consider tbb::pipeline, for example. Its first filter would check the existence of the data and each subsequent filter will perform steps of the algorithm, that are find the instrument and submit the work into it. Necessary TBB tasks are created automatically, which are processed by master thread or TBB worker threads. The operation is blocking, meaning that the master thread will not return from the call until all the work is done.

Asynchronous interface of tbb::flow::async_node assumes that the user provides asynchronous entity to which the data from async_node's body is submitted. Whenever the asynchronous processing of the submitted data is complete, the asynchronous activity/entity could use the gateway interface, which is provided by the async_node, to submit the message to successors of that async_node.

The purpose of async_node is not to block TBB workers and master threads on blocking API, such as input/output, but allow them processing other TBB tasks while the actual blocking is done in user provided asynchronous activity.

If you still want to use flow graph for you purposes, then from the description you provided I would build the following graph:

Where source_node checks the data existence and sends it to multifunction_node, where the decision about the instrument is made and the data gets submitted to the multifunction output port corresponding to the chosen instrument.

Hope this helps,

Hello Aleksei

I tried queueing policy but it is still synchronous. This is my code:

 string& symb = depthData.symbol;

        auto it = instrmap->find(symb);
        if (it != instrmap->end()) {
                Instrument* inst = it->second;


                if (depthData.side == Side::Buy) {
                        bool b = (inst->getBidworker())->try_put(depthData);
                } else if (depthData.side == Side::Sell) {
                        bool b = (inst->getAskworker())->try_put(depthData);

These are the async_nodes returned by getBidworker() and getAskworker():

tbb::flow::async_node<DepthData, tbb::flow::continue_msg ,queueing>* bidworker;
tbb::flow::async_node<DepthData, tbb::flow::continue_msg, queueing>* askworker;


What do i miss here?


Hello cinar e.,

What does the body of async_node do? Does it push the signal to asynchronous thread so that it starts processing? What else it does? It would be good if you provide a reproducer of the problem you see as well.

Also, please indicate by which thread the code you have shown is executed. Is this thread came from TBB library or it is you master thread?

I am asking because I believe there is misunderstanding between us in what asynchronous means here. 

Regards, Aleksei

Hello Aleksei,


After your latest comment, i rechecked the logs; work done inside the async_node job is another thread so it should be running asynchronously.

sorry for my mistake. So do you have any objections for my async_node usage here? 

Inside the functor of async_node, some processing is done and graph is finished there; gateway_type passed is not used. 


Hi cinar e.,

Glad you was able to root cause the issue!

I do not have any objections. As long as the work from asynchronous thread is not put back into the graph, usage of gateway_type can be omitted.

Regards, Aleksei.


Hello Aleksei,

I need your help on a performance issue. As i told before, i use the async_node like :

 bool b = (inst->getBidworker())->try_put(depthData);


I have comments before and after this line and i see that this submit takes around 150-300 microseconds. Is there a way to decrease this time with some tbb calls?


"I have comments before and after this line and i see that this submit takes around 150-300 microseconds."

comments-->logs :)

Hello cinar e.,

If you are using async_node without lightweight policy (as you remember lightweight policy is the default for async_node), then I would expect that the time is mostly spent on data copying. However, looking at the flow graph code I do not see any data copying involved. The data seems to be passed by reference all the time. Can you see if your data gets copied anywhere on the interested execution path? Any cache misses involved along the way?

The other source of this might be the virtual call and indirect access, but I don't think that it could cost that much - 150-300 microseconds is huge amount of time for a couple of indirects. Have you profiled your application using Intel(R) VTune (TM) Amplifier?

Hope this helps,



data passed is a struct which contains several int's and a string_view. Copying shouldnt be a problem.

By the way, logging is async but anyway it is still high.

Sometimes, time cost drops down to 30 microsecond. (when two consecutive market data belongs to the same instrument so the same asnyc_node is triggered) So this might be a context switch issue, maybe?? 

I didnt profile the application; i will check.

Deje un comentario

Por favor inicie sesión para agregar un comentario. ¿No es socio? Únase ya