Ordering issues

This post continues a series of articles that describes the opencl_node, a new node available in the Intel® Threading Building Blocks (Intel® TBB) library since version 4.4 Update 2. This node allows OpenCL™ powered devices to be more easily utilized and coordinated by an Intel TBB flow graph. The first article in this series can be found here.

In the previous blog article in this series, I described how to specify the OpenCL program, how to select a kernel from the program, and how to bind arguments for an invocation. In this post I discuss how to use type-specified message keys to avoid ordering issues.

The following example creates two identical function_nodes that create buffers and fill them with the same values. The opencl_node downstream of them accepts pairs of buffers and multiplies them element by element.

#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#include "tbb/flow_graph_opencl_node.h"

#include <algorithm>   // std::fill
#include <cmath>
#include <iostream>    // std::cerr
#include <stdexcept>
#include <string>

int main() {
    try {
        using namespace tbb::flow;
        typedef opencl_buffer<cl_int> CLBuf;
        const int N = 10;
        
        opencl_graph g;
        opencl_node<tuple<CLBuf,CLBuf>> clMul( g, "mul.cl", "mul" );
        clMul.set_ndranges( { N } );    
        
        function_node<int,CLBuf> filler0( g, unlimited, [&g,N]( int i ) -> CLBuf {
            CLBuf b(g, N);
            std::fill( b.begin(), b.end(), i );
            return b;
        } );
    
        function_node<int,CLBuf> filler1 = filler0;
        
        function_node<CLBuf> checker( g, serial, []( const CLBuf &b ) {
            for ( cl_int v : b ) {
                int r = int(std::sqrt(v) + .5);
                if ( r*r != v )
                    throw std::runtime_error( std::to_string(v) + " is not a square of any integer number" );
            }
        } );
    
        make_edge( filler0, input_port<0>(clMul) );
        make_edge( filler1, input_port<1>(clMul) );
        make_edge( output_port<0>(clMul), checker );
        
        for ( int i = 0; i<1000; ++i ) {
            filler0.try_put( i );
            filler1.try_put( i );
        }     
        g.wait_for_all();
    } catch ( std::exception &e ) {
        std::cerr << "An exception has occurred: " << e.what() << std::endl;
    }
    return 0;
}

mul.cl:
kernel void mul( global int* b1, global int* b2 ) {
    const int index = get_global_id(0);
    b1[index] *= b2 [index];
}

The checker node expects that the opencl_node receives two buffers holding the same value, so that the multiplication produces the square of that value. However, the example may fail:

An exception has occurred: 54522 is not a square of any integer number

The root cause is that the two function_nodes run concurrently and send their messages in a non-deterministic order. Therefore, opencl_node may pair and multiply buffers that hold different values. To address this issue, opencl_node supports the type-specified key matching feature that was introduced in Intel TBB 4.4 Update 2.
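The failure can be reproduced deterministically without OpenCL or Intel TBB. The following is only a sketch under stated assumptions, not how opencl_node is implemented: multiply_in_arrival_order is a hypothetical stand-in for a join that pairs the n-th message on one port with the n-th message on the other, and is_square is a hypothetical helper that mirrors the test in the checker node.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: the same "is v a perfect square?" test the checker uses.
bool is_square(int v) {
    if (v < 0) return false;
    int r = static_cast<int>(std::sqrt(static_cast<double>(v)) + 0.5);
    return r * r == v;
}

// Hypothetical stand-in for a join without key matching: the i-th message
// that arrived on port 0 is multiplied by the i-th message on port 1.
std::vector<int> multiply_in_arrival_order(const std::vector<int>& port0,
                                           const std::vector<int>& port1) {
    assert(port0.size() == port1.size());
    std::vector<int> products;
    products.reserve(port0.size());
    for (std::size_t i = 0; i < port0.size(); ++i)
        products.push_back(port0[i] * port1[i]);
    return products;
}
```

If the values 1, 2, 3 arrive in that order on one port but as 2, 1, 3 on the other, the products are 2, 2, 9. The first two are not squares, which is the same kind of failure as the exception shown above.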

To enable key matching in opencl_node, the key_matching<Key> policy should be provided at construction time as a template argument:

opencl_node<tuple<CLBuf,CLBuf>, key_matching<int>> clMul(g, "mul.cl", "mul");

In addition, message types should meet the type-specified message key concept (see the “Type-specified message keys for join_node” section in the Intel TBB documentation). To do so, we extend the opencl_buffer<cl_int> class with an "int key() const" method:

class CLBuf : public opencl_buffer<cl_int> {
    int my_key;
public:
    CLBuf() {}
    CLBuf( opencl_graph &g, size_t N, int k ) 
        : opencl_buffer<cl_int>(g, N), my_key(k) {}
    int key() const { return my_key; }
};

This method is called by opencl_node to match incoming messages correctly. The node waits until messages with the same key value have arrived at all of its input ports, and only then processes them together.
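To make the matching rule concrete, here is a minimal sketch of a two-port join that pairs messages by key. It is an illustration only and says nothing about the internals of opencl_node or join_node: Msg and KeyedJoin are hypothetical names, and the sketch assumes each key appears at most once per port.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical message type that provides the "int key() const" method.
struct Msg {
    int my_key;
    int value;
    int key() const { return my_key; }
};

// Hypothetical two-port join: a message waits until the message with the
// same key has arrived on the other port, then the two are emitted as a pair.
class KeyedJoin {
    std::unordered_map<int, Msg> waiting0, waiting1;
    std::vector<std::pair<Msg, Msg>> matched;
public:
    void put0(const Msg& m) {
        assert(waiting0.count(m.key()) == 0);  // assumption: unique keys per port
        auto it = waiting1.find(m.key());
        if (it == waiting1.end()) { waiting0.emplace(m.key(), m); return; }
        matched.emplace_back(m, it->second);
        waiting1.erase(it);
    }
    void put1(const Msg& m) {
        assert(waiting1.count(m.key()) == 0);  // assumption: unique keys per port
        auto it = waiting0.find(m.key());
        if (it == waiting0.end()) { waiting1.emplace(m.key(), m); return; }
        matched.emplace_back(it->second, m);
        waiting0.erase(it);
    }
    const std::vector<std::pair<Msg, Msg>>& pairs() const { return matched; }
};
```

Whatever order the messages arrive in, every emitted pair has equal keys, so when the key is the fill value (as in the CLBuf class above) the product is always a square.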

The other parts of the initial example remain unchanged:

#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#include "tbb/flow_graph_opencl_node.h"

#include <algorithm>   // std::fill
#include <cmath>
#include <iostream>    // std::cerr
#include <stdexcept>
#include <string>

using namespace tbb::flow;

class CLBuf : public opencl_buffer<cl_int> {
    int my_key;
public:
    CLBuf() {}
    CLBuf( opencl_graph &g, size_t N, int k ) : opencl_buffer<cl_int>(g, N), my_key(k) {}
    int key() const { return my_key; }
};

int main() {
    try {
        using namespace tbb::flow;
        const int N = 10;
        
        opencl_graph g;
        opencl_node<tuple<CLBuf,CLBuf>, key_matching<int>> clMul( g, "mul.cl", "mul" );
        clMul.set_ndranges( { N } );    
        
        function_node<int,CLBuf> filler0( g, unlimited, [&g,N]( int i ) -> CLBuf {
            CLBuf b(g, N, i); // the last argument is the key value
            std::fill( b.begin(), b.end(), i );
            return b;
        } );
    
        function_node<int,CLBuf> filler1 = filler0;
        
        function_node<CLBuf> checker( g, serial, []( const CLBuf &b ) {
            for ( cl_int v : b ) {
                int r = int(std::sqrt(v) + .5);
                if ( r*r != v )
                    throw std::runtime_error( std::to_string(v) + " is not a square of any integer number" );
            }
        } );
    
        make_edge( filler0, input_port<0>(clMul) );
        make_edge( filler1, input_port<1>(clMul) );
        make_edge( output_port<0>(clMul), checker );
        
        for ( int i = 0; i<1000; ++i ) {
            filler0.try_put( i );
            filler1.try_put( i );
        }     
        g.wait_for_all();
    } catch ( std::exception &e ) {
        std::cerr << "An exception has occurred: " << e.what() << std::endl;
    }
    return 0;
}

Now the example works as expected.

You should also pay attention to the following:

  • If the key matching policy is specified for opencl_node, then all of its message types should support type-specified message keys.
  • If a message type is a structure that is not inherited from opencl_buffer, it is passed to the OpenCL kernel as is. Therefore, the kernel should declare a function parameter whose type matches that structure.
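For the second point, one way to keep the host side and the kernel side in agreement is to give the structure a fixed, checkable layout. The sketch below is an assumption-laden illustration: Scale, its fields, and the kernel signature in the comment are hypothetical and do not come from the example above. copy_as_bytes only imitates "passed as is" with a byte-wise copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical host-side message type. A matching kernel-side declaration
// (OpenCL C, where int is 32 bits) could look like:
//   typedef struct { int my_key; int factor; } Scale;
//   kernel void scale( global int* b, Scale s );   // hypothetical kernel
struct Scale {
    std::int32_t my_key;
    std::int32_t factor;
    int key() const { return my_key; }  // needed only when key matching is used
};

// Compile-time checks that the layout is what the kernel side assumes.
static_assert(std::is_standard_layout<Scale>::value, "layout must be C-compatible");
static_assert(std::is_trivially_copyable<Scale>::value, "must be safe to copy as bytes");
static_assert(sizeof(Scale) == 2 * sizeof(std::int32_t), "no unexpected padding");
static_assert(offsetof(Scale, factor) == sizeof(std::int32_t), "field order and offset");

// Imitates passing the structure "as is": copy its bytes out and back in.
Scale copy_as_bytes(const Scale& s) {
    unsigned char bytes[sizeof(Scale)];
    std::memcpy(bytes, &s, sizeof(Scale));
    Scale copy;
    std::memcpy(&copy, bytes, sizeof(Scale));
    assert(copy.key() == s.key());  // sanity check on the round trip
    return copy;
}
```

Using fixed-width integer types on the host keeps the field sizes independent of the host compiler, and the static_assert lines turn a layout mismatch into a build error rather than a wrong result on the device.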

This blog article is the last one in a series of blogs that describe the opencl_node. The first blog in this series can be found here.