Using the Intel® Threading Building Blocks Graph Community Preview Feature: Creating a Simple Message Graph.

Intel® Threading Building Blocks (Intel® TBB) Version 3 Update 5 introduced the class `graph` as a Community Preview (CP) feature. There is an introductory post that provides an overview of the class and the nodes that can be used with it. You can download the open-source version of this release at www.threadingbuildingblocks.org and are encouraged to provide feedback about the graph via the forum. In a previous post, I provided an example that created a dependency graph. In this post, I describe how to make simple message graph using class `graph`.

This example will calculate the sum of x*x + x*x*x for all x = 1 to 10. This is a simple syntactic example only. Since each node in a graph may execute as an independent task, the granularity of each node should follow the general guidelines for tasks as described in Section 3.2.3 of the Intel® Threading Building Blocks Tutorial. But for demonstration purposes, I will use an artificial, tiny example here and inflate the time spent in each node by sleeping for 1 second after each operation.  I use the Linux function `sleep` for this. If you want to enter this example yourself, you'll have to use the appropriate sleep function for your system.

The basic layout of the graph that I’ll create is shown in the figure below. Each value enters through the `input` node. This node will broadcast the value to both `squarer` and `cuber`, which will calculate x*x and x*x*x respectively (and sleep for 1 second). The output of each of these nodes will be placed in an unbounded buffer. A tuple containing both values will be created by the `join` node and forwarded to `summer`, which will add both values to the running total (and sleep for 1 second). The `squarer` and `cuber` will allow unlimited concurreny, that is they will be allowed to process multiple values simultaneously. The final `summer`, which updates a shared total, will be only allowed process a single in-coming tuple at a time, eliminating the need for a lock around the shared value.

As with all Community Preview Features, the graph must be explicitly enabled. This is done by defining its macro, `TBB_PREVIEW_GRAPH`, before including the header file as shown below.

`#define TBB_PREVIEW_GRAPH 1`
`#include "tbb/graph.h"`

This example performs three basic types of operations: square, cube and sum. The classes below define these operations and will be use as the body objects for my `function_node`s.

`struct square { `
`  int operator()(int v) {`
`    printf(“squaring %d\n”, v);`
`    sleep(1); `
`    return v*v; `
`  }`
`};`

`struct cube {`
`  int operator()(int v) {`
`    printf(“cubing %d\n”, v);`
`    sleep(1); `
`    return v*v*v; `
`  }`
`};`

`class sum {`
`  int &my_sum;`
`public:`
`  sum( int &s ) : my_sum(s) {}`
`  int operator()( std::tuple<int,int> v ) {`
`    printf(“adding %d and %d to %d\n”, std::get<0>(v), std::get<1>(v), my_sum);`
`    my_sum += std::get<0>(v) + std::get<1>(v);`
`    return my_sum;`
`  }`
`};`

In function `main`, the graph is setup and then the values 1 – 10 are put into the `input` node. All the nodes in this example pass around values of type `int`. The nodes used below are all class templates and therefore can be used with any type that supports copy construction, including pointers and objects. It should be noted that values are copied as they pass between nodes, so passing around large objects should be avoided.

`using namespace tbb; `

`int main() {`
`  int result = 0;`

`  graph g;`
`  broadcast_node<int> input;`
`  function_node<int,int> squarer( g, graph::unlimited, square() );`
`  buffer_node<int> square_buffer(g);`
`  function_node<int,int> cuber( g, graph::unlimited, cube() );`
`  buffer_node<int> cube_buffer(g);`
`  join_node<int,int> j( g );`
`  function_node<std::tuple<int,int>,int> summer( g, graph::serial, sum(result) );`

`  make_edge( input, squarer );`
`  make_edge( input, cuber );`
`  make_edge( squarer, square_buffer );`
`  make_edge( square_buffer, std::get<0>( j.inputs() ) );`
`  make_edge( cuber, cube_buffer );`
`  make_edge( cube_buffer, std::get<1>( j.inputs() ) );`
`  make_edge( j, summer );`

`  for (int i = 1; i <= 10; ++i)`
`    input.try_put(i);`
`  g.wait_for_all();`
`  printf("Final result is %d\n", result);`
`  return 0;`
`}`

Towards the top of the code above, the graph nodes are created: 1 `broadcast_node`, 3 `function_node` objects , 2 `buffer_node` objects, and 1 `join_node`. After the nodes are created, they are linked together using `make_edge` calls. Both the nodes and edges correspond directly to the figure presented earlier.

Once the graph has been setup, the values 1-10 are put into the `input` node. The `wait_for_all` call will block until there is no more activity in the graph. The result of the computation is then printed to `stdout` and should look something like:

`cubing 10 `
`squaring 1`
`cubing 1`
`squaring 2`
`cubing 2`
`cubing 3`
`squaring 4`
`squaring 3`
`squaring 5`
`cubing 5`
`cubing 4`
`squaring 6`
`cubing 6`
`squaring 7`
`cubing 7`
`squaring 8`
`squaring 10`
`adding 1 and 1000 to 0`
`cubing 8`
`squaring 9`
`cubing 9`
`adding 4 and 1 to 1001`
`adding 9 and 8 to 1006`
`adding 25 and 27 to 1023`
`adding 36 and 125 to 1075`
`adding 49 and 64 to 1236`
`adding 64 and 216 to 1349`
`adding 16 and 343 to 1629`
`adding 100 and 512 to 1988`
`adding 81 and 729 to 2600`
`Final result is 3410`

One might note that most of the square and cube operations happened before any of the sum operations in the above run. This is an artifact of using a for loop to inject the values 1-10 into the graph. In a future example, I'll demonstrate how a `source_node` can be used to create a better execution ordering.

For more complete information about compiler optimizations, see our Optimization Notice.