Using the Intel® Threading Building Blocks Graph Community Preview Feature: An Implementation of Dining Philosophers.

Intel® Threading Building Blocks (Intel® TBB) Version 3 Update 5 introduced the class graph as a Community Preview (CP) feature. There is an introductory post that provides an overview of the class and the nodes that can be used with it. You can download the open-source version of this release at and are encouraged to provide feedback about the graph via the forum. In a previous post, I provided an example that created a simple message graph.  In this post, I describe a more complicated example that highlights some interesting features of the API.

This example will demonstrate:

    • How to use the graph's run function.

    • How to mix explicit puts with explicit edges

    • The non-greedy nature of the join_node

In this post, I'll provide an implementation for the Dining Philosophers problem shown below.  In this problem, several philosophers are sitting together at a table.  Each philosopher needs to both think and eat, but can only do one of these at a time.  They each think, eat, think, eat, etc...  In the figure below, the philosophers are using chopsticks to eat noodles.  They must grab both the chopstick to their left and the chopstick to their right before eating. To complicate things, the chopsticks are shared with their neighbors.  So a philosopher's left chopstick is their left neighbor's right chopstick.  And their right chopstick is their right neighbor's left chopstick.  


Dining Philosophers is a challenging problem because it will deadlock without proper cooperation between the philosophers.  For example, if all of the philosophers start by grabbing their left chopstick, then there will be no right chopstick available for any of them.  None of them will be able to eat (and subsequently think) unless their right neighbor gives up the chopstick they have already claimed.  There are a number of existing solutions to the Dining Philosophers problem.

I'll use the tbb::graph and its associated node classes to implement a solution to Dining Philosophers.  In my solution, each philosopher will be an object that contains a join_node that will capture the chopsticks and a function_node that will perform the eating and thinking. The chopsticks will be null objects and their places on the table will be implemented as queue_nodes. If a queue_node has an item, it means that the chopstick is available at that place, otherwise it is not available. At most each queue_node will contain one item.  The graph for 4 philosophers will therefore be structured like the figure below.

As with all Community Preview Features, the graph must be explicitly enabled. This is done by defining its macro, TBB_PREVIEW_GRAPH, before including the header file as shown below.

#include "tbb/graph.h"

The main function is shown below. 

const char *names[] =
{ "Archimedes", "Aristotle", "Democritus", "Epicurus", "Euclid",
"Heraclitus", "Plato", "Pythagoras", "Socrates", "Thales" };

int main(int argc, char *argv[]) {
  int num_threads = 0;
  int num_philosophers = 10;
  if ( argc > 1 ) num_threads = atoi(argv[1]);
  if ( argc > 2 ) num_philosophers = atoi(argv[2]);

  if ( num_threads < 1 || num_philosophers < 1 || num_philosophers > 10 ) exit(1);

  tbb::task_scheduler_init init(num_threads);
  tbb::graph g;
  printf("\n%d philosophers with %d threads\n\n",
         num_philosophers, num_threads);

  std::vector< tbb::queue_node * > places;
  for ( int i = 0; i < num_philosophers; ++i ) {
    tbb::queue_node<chopstick> *qn_ptr = new tbb::queue_node(g);
    places.push_back( qn_ptr );

  std::vector< philosopher > philosophers;
  for ( int i = 0; i < num_philosophers; ++i ) {
    philosophers.push_back( philosopher( names[i], g,
                                         places[(i+1)%num_philosophers] ) ); philosophers[i] );

  for ( int i = 0; i < num_philosophers; ++i ) philosophers[i].check();

  return 0;

After the initial command line processing is done in the main function above, a graph object is instantiated. A vector places of queue_node<chopstick> pointers is then populated with queues that will represent the places at the table.

After each queue_node is created, a single chopstick object is put to it, indicating that a chopstick is initially available at that location.

After the queues are created, the main function then populates a vector of philosopher objects. After each philosopher is added to the vector, it is passed to the graph object's run function. As I will show shortly, class philosopher not only contains a function_node and join_node but it is also a function object, defining a void operator()(). The graph's run function executes this function object in a task that is a child of the graph's root task. No calls to g.wait_for_all() will return until all tasks that are children of this root task complete.  The philosophers use their operator() functions to think once and then insert themselves in to the graph. The main function ends by checking each philosopher object to verify that it has called think and eat the proper number of times.

There is also version of run that takes a second argument: template<typename Receiver, typename Body> void run( Receiver &r, Body body ). Like the version used in this example, it creates a task that runs body but also sends the value returned by body to the receiver r.

My declaration of class philosopher is shown below:

const int think_time = 1;
const int eat_time = 1;
const int num_times = 10;

class chopstick {};

class philosopher {

  typedef tbb::queue_node< chopstick > chopstick_buffer;
  typedef tbb::join_node< chopstick, chopstick > join_type;

  philosopher( const char *name, tbb::graph &the_graph,
               chopstick_buffer *left, chopstick_buffer *right ) :
   my_name(name), my_graph(&the_graph),
   my_left_chopstick(left), my_right_chopstick(right),
   my_join(new join_type(the_graph)), my_function_node(NULL),
   my_count(new int(num_times)) {}

  void operator()();
  void check();


  const char *my_name;
  tbb::graph *my_graph;
  chopstick_buffer *my_left_chopstick;
  chopstick_buffer *my_right_chopstick;
  join_type *my_join;
  tbb::function_node< join_type::output_type, tbb::continue_msg > *my_function_node;
  int *my_count;

  friend class node_body;

  void eat_and_think( );
  void eat( );
  void think( );
  void make_my_node();


Each philosopher has a const char *my_name that holds its name. It also has pointers to the graph, the two chopstick queues that it is seated near, its join_node, its function_node and the counter that it will use to track how many times its been called.

Let's first look at the definition of void operator()(), which is invoked by the tasks enqueued by calls to run in main. This function calls think and then make_my_node.  So each philosopher will first think and then afterwards insert itself into the graph.

void philosopher::operator()() {

Both function think and function eat (which will be used later) are straightforward functions that just sleep:

void philosopher::think() {
  printf("%s thinking\n", my_name );
  printf("%s done thinking\n", my_name );

void philosopher::eat() {
  printf("%s eating\n", my_name );
  printf("%s done eating\n", my_name );

The function make_my_node is responsible for creating the function_node and connecting both the join_node and function_node to the rest of the graph. The join_node's input ports are stored in a std::tuple, which is returned by the call to inputs(). I use the template function std::get to access the needed element. The implementation of make_my_node is shown below:

void philosopher::make_my_node() {
  my_left_chopstick->register_successor( std::get<0>(my_join->inputs()) );
  my_right_chopstick->register_successor( std::get<1>(my_join->inputs()) );
  my_function_node =
    new tbb::function_node< join_type::output_type, tbb::continue_msg >( *my_graph,
      tbb::graph::serial, node_body( *this ) );
  tbb::make_edge( *my_join, *my_function_node );

The class node_body is a straightforward function object that invokes the corresponding philosopher's eat_and_think function.

class node_body {
  philosopher &my_philosopher;
  node_body( philosopher &p ) : my_philosopher(p) { }
  void operator()( philosopher::join_type::output_type ) {

The implementation of eat_and_think(), calls the philosopher's function eat and then decrements its count. If the philosopher stills needs to eat and think more, then it puts its chopsticks back down on the table and thinks. Otherwise, it removes its join_node from the graph before putting its chopsticks back down on the table.

void philosopher::eat_and_think( ) {

  if (*my_count > 0) {
    my_left_chopstick->try_put( chopstick() );
    my_right_chopstick->try_put( chopstick() );
  } else {
    my_left_chopstick->remove_successor( std::get<0>(my_join->inputs()) );
    my_right_chopstick->remove_successor( std::get<1>(my_join->inputs()) );
    my_left_chopstick->try_put( chopstick() );
    my_right_chopstick->try_put( chopstick() );

The code above demonstrates that nodes can be connected by explicit edges, as is the case for the queue_nodes and the join_node. And user code can also do explicit try_puts to nodes. In this example, there is no explicit edge from the philosopher back to its chopstick queues. However, eat_and_think explicitly calls try_put to put chopstick objects in to the queues.

Finally at the end of main, each philosopher's function check is called to verify that it has been executed the correct number of times (and it also does some cleanup).

void philosopher::check() {
  if ( *my_count != 0 ) {
    printf("ERROR: philosopher %s still had to run %d more times\n", my_name, *my_count);
  } else {
    printf("%s done.\n", my_name);
  delete my_function_node;
  delete my_join;
  delete my_count;

When I execute this example using four philosophers and a single thread, "philosophers 1 4", it runs in about 80 seconds. This is 4 x ( 10 thinks + 10 eats ) = 80. When I run it using all 8 threads available on my desktop, it completes in about 21 seconds.

The reason this example works at all is because of the non-greedy nature of the join_node. A join_node creates a std::tuple from the items it receives at its input ports. However, it does not greedily consume items as they appear. Instead, once it has received notification that an item is available at each port it then attempts to reserve each of these items. If it is successful, only then does it create the tuple and consume the items. If it cannot reserve an item at any one port, it releases all reservations it has previously made.

In the Dining Philosopher's problem, the join_node prevents deadlock by never holding a chopstick unless it can acquire both. It may reserve one of the chopsticks, but if it cannot reserve the other, it puts the first one back on the table and tries again.

For more complete information about compiler optimizations, see our Optimization Notice.


anonymous's picture

Nice to see another way of implementing TDPP.

If you're interested, here iis an implementation using the actor model in Akka:

Michael V. (Intel)'s picture

Thanks Steven and my apologies to Democratus… In fact, after searching I see that his name is most commonly spelled Democritus, which means I was wrong on both the “a” and the “i”. I’ve changed it in the code above.

I haven’t implemented a library for arbitrary-precision integers, so I can’t comment on that directly. But I can comment on the applicability of TBB and tbb::graph to multi-core and multiprocessing in general. TBB is currently targeted solely at shared-memory systems. It has a number of features that make writing parallel programs for multi-core systems easier, including parallel algorithms and concurrent data structures. A colleague and I recently published a paper in the Jan/Feb 2011 issue of IEEE Software that describes some of these features in more detail. If you want to get a basic idea of what TBB is all about and how it enables parallel programming, it's a good place to start.

anonymous's picture

Regarding the philosopher names: Democratus is spelled with a "u". Check your "naming conventions!"

I am implementing a "pthreads" solution to arbitrary-precision integers, if interested. I am interested in your threading solution. How does your solution enable easy access to multi-core and multiprocessing solutions, in a closely coupled architecture?

Steven Hines
MSCS Student,

Michael V. (Intel)'s picture

Thanks yinongchen and good luck using TBB for your problem. If you get some interesting results, we’d like to hear about it in the TBB forum. And yes, the traditional Dining Philosophers Problem has 5 philosophers, but I couldn’t resist the symmetry that made drawing the second figure so much easier :)

yinong-chen's picture

This is a good example of using TBB!
I have implemented a producer-consumer problem with multiple producers and consumers share a single queue. I will try to use TBB to speed up the execution.
By the way, The Dining Philosophers problem is also called Five Dining Philosophers Problem. I know your program works for any number of philosophers, but you may want to draw five philosophers in your figures. :-)

Add a Comment

Have a technical question? Visit our forums. Have site or software product issues? Contact support.