# Implementing a wave-front computation using the Intel® Threading Building Blocks flow graph

By Michael V. (Intel), published on September 9, 2011

The Intel® Threading Building Blocks ( Intel® TBB ) flow graph is fully supported in Intel® TBB 4.0. If you are unfamiliar with the flow graph, you can read an introduction here.

One node type available for use with the flow graph is `continue_node<T>`

. This node type is designed for implementing dependency graphs, where nodes wait for their predecessors to complete before beginning their own work. A `continue_node`

does not receive data messages from its predecessors, but instead counts the number of `continue_msg`

signals that it receives. Once it receives P messages, one for each predecessor, it executes its body which generates an output message of type `T`

. Often, the output type is also a `continue_msg`

but it need not be.

Pictorially, we draw a `continue_node`

as below:

This symbol tries to convey important properties about a `continue_node`

. The input arc has lines above it to indicate that it counts incoming messages. The interior of the circle contains `f()`

to indicate that the body is a functor that is passed no argument.

Figure 1 shows an approach to implementing a wave front computation using a set of `continue_node`

objects. In this example, each computation must wait for the computation above it and the computation to its left to complete before it can start executing. Most nodes have two predecessors and therefore will not start executing until they receive two `continue_msg`

messages. Nodes on the top and left edges have only a single predecessor and therefore wait for only a single message to arrive.**Figure 1: Using an Intel® Threading Building Blocks flow graph to express a wave-front calculation**

I'll now provide the complete code necessary to implement an example that performs such a computation using an Intel® TBB flow graph. In this example, the computation at each node will update a block of a 2 dimensional matrix. If the values held by an element's left and upper neighbors are equal, then the element's value will be set to be 2 times that value. Otherwise, the element's value will be set to be the maximum of the two values. The top-left element is initialize with a value of 1. So for a 5x5 matrix, the results would be:`1 1 1 1 1`

`1 2 2 2 2`

`1 2 4 4 4`

`1 2 4 8 8`

`1 2 4 8 16`

The code below includes the necessary headers, defines some parameters, and defines the function `calc`

that performs the calculation on each matrix element. The constants `M`

and `N`

define the size of the matrix. The dimension of the blocks computed in each node is given by the blocksize, `B`

, and the number of blocks in each dimenstion is computed and stored in `MB`

and `NB`

. The 2-D matrix, `values`

, is where the results will be stored.`#include <algorithm> // for std::max`

`#include <cstdio>`

`#include "tbb/flow_graph.h" `

`using namespace tbb;`

`using namespace tbb::flow;`

`int M=1000, N=1000;`

`int B = 100;`

`int MB = (M/B) + (M%B>0);`

`int NB = (M/B) + (M%B>0);`

`double **value;`

`inline double calc( double v0, double v1 ) {`

` if ( v0 == v1 )`

` return 2*v0;`

` else`

` return std::max(v0,v1);`

`}`

The code below builds the flow graph that will apply the function `calc`

to the matrix blocks, while respecting the dependencies shown in Figure 1. The 2-D array, `node`

, is used to hold pointers to the `continue_node`

objects. In `BuildGraph`

, the doubly-nested for loop allocates the `continue_node`

objects. Each node is constructed with a reference to the graph object g and lambda expression that applies `calc`

to its corresponding block of elements. After each node is created, edges are made from it to its successors in the graph, setting up the required dependencies. Note that the loop indices move from the bottom right of Figure 1 to the top left, so each node's successors are allocated before it.`continue_node<continue_msg> ***node;`

`void BuildGraph( graph &g ) {`

` value[M-1][N-1] = 0;`

` for( int i=MB; --i>=0; )`

` for( int j=NB; --j>=0; ) {`

` node[i][j] =`

` new continue_node<continue_msg>( g,`

` [=]( const continue_msg& ) {`

` int start_i = i*B;`

` int end_i = (i*B+B > M) ? M : i*B+B;`

` int start_j = j*B;`

` int end_j = (j*B+B > N) ? N : j*B+B;`

` for ( int ii = start_i; ii < end_i; ++ii ) {`

` for ( int jj = start_j; jj < end_j; ++jj ) {`

` double v0 = ii == 0 ? 0 : value[ii-1][jj];`

` double v1 = jj == 0 ? 0 : value[ii][jj-1];`

` value[ii][jj] = ii==0 && jj==0 ? 1 : calc(v0,v1);`

` }`

` }`

` } );`

` if ( i + 1 < MB ) make_edge( *node[i][j], *node[i+1][j] );`

` if ( j + 1 < NB ) make_edge( *node[i][j], *node[i][j+1] );`

` }`

`}`

The function `EvaluateGraph`

executes the flow graph. It does this by putting a `continue_msg`

to the top-left element, and then waiting for the activity in the graph to stop. When the call to `g.wait_for_all()`

returns, all of the nodes have been evaluated and the final result produced.`double EvaluateGraph( graph &g ) {`

` node[0][0]->try_put(continue_msg());`

` g.wait_for_all();`

` return value[M-1][N-1];`

`}`

Since we create a matrix of `continue_node`

objects, we also have to delete them:`void CleanupGraph() {`

` for( int i=0; i<MB; ++i )`

` for( int j=0; j<NB; ++j )`

` delete node[i][j];`

`}`

Finally, the `main`

function shown below invokes these functions to build, evaluate and clean up the flow graph.`int main(int argc, char *argv[]) {`

` value = new double *[M];`

` for ( int i = 0; i < M; ++i ) value[i] = new double [N];`

` node = new continue_node<continue_msg> **[MB];`

` for ( int i = 0; i < MB; ++i ) node[i] = new continue_node<continue_msg> *[NB];`

` graph g;`

` BuildGraph(g);`

` double result = EvaluateGraph(g);`

` CleanupGraph();`

` printf("%g\n", result);`

` for ( int i = 0; i < M; ++i ) delete [] value[i];`

` for ( int i = 0; i < MB; ++i ) delete [] node[i];`

` delete [] value;`

` delete [] node;`

` return 0;`

`}`

I hope that this example demonstrates that a flow graph can be used to easily express a depedency graph. The basic steps are (1) create a set of `continue_node`

objects, (2) connect these nodes together using calls to `make_edge`

, and (3) start the execution by sending a `continue_msg`

to any nodes that do not have predecessors.

If you are interested in learning more about the Intel® Threading Building Blocks ( Intel® TBB ) flow graph, please check out the other blog articles at /en-us/blogs/tag/flow_graph or visit www.threadingbuildingblocks.org.

## 2 comments

TopAnonymous said on Jan 9,2012

Great post of flow graph !

In the second paragraph "Once it receives P messages...", does 'P' mean number of its predecessor nodes ?

Thanks.

Anonymous said on Sep 10,2011

good post....

## Add a Comment

Sign inHave a technical question? Visit our forums. Have site or software product issues? Contact support.