Implementing task_group interface in TBB

By Arch Robison (Intel) (19 posts) on July 2, 2008 at 5:53 am

The TBB class task was designed for high-performance implementations of the TBB templates.  Its efficiency, particularly its emphasis on continuation-passing style, comes at some price in convenience.  Rick Molloy of Microsoft has posted a description of a task_group interface that Microsoft is considering.  It's more convenient than the TBB interface, particularly when your compiler supports C++ 200x lambda expressions (Section 5.1.1 of N2606).

I implemented a subset of task_group in TBB as a header tbb/task_group.h: 37 lines of C++ and 5 preprocessor lines.   It's a small subset.

But nonetheless, I think some TBB users will find this minimal form useful.  For example, it's enough of task_group to write the quicksort in Molloy's post.

The code for the header follows my signature.  I'd be interested to hear how useful it is.

- Arch

#ifndef __TBB_task_group_H
#define __TBB_task_group_H

#include "tbb/task.h"

namespace tbb {

class task_group;

namespace internal {

// Suppress gratuitous warnings from icc 11.0 when lambda expressions are used in instances of function_task.
#pragma warning(disable: 588)

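// Adapts any nullary function object (for example a lambda) to a TBB task.
template<typename Function>
class function_task: public task {
    Function my_func;   // private copy of the user's function object
    /*override*/ task* execute() {
        my_func();
        return NULL;    // no continuation task to run next
    }
public:
    // Taking a const reference lets callers pass temporaries as well as lvalues.
    function_task( const Function& f ) : my_func(f) {}
};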

} // namespace internal

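class task_group: internal::no_copy {
private:
    // Dummy task that acts as the parent of every task run in this group.
    empty_task* root;
public:
    task_group() {
        root = new(task::allocate_root()) empty_task;
        // Start at 1: wait_for_all returns when the count drops back to 1.
        root->set_ref_count(1);
    }
    ~task_group() {
        // A nonzero count means wait() was never called, so wait here.
        if( root->ref_count() )
            root->wait_for_all();
        root->destroy(*root);
    }
    template<typename Function>
    void run( Function f ) {
        task& self = task::self();
        // allocate_additional_child_of bumps root's reference count,
        // which is what lets several threads call run() on one group.
        self.spawn(*new( self.allocate_additional_child_of( *root )) internal::function_task<Function>(f) );
    }
    void wait() {
        // Blocks until all tasks run in this group have completed.
        root->wait_for_all();
    }
};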

} // namespace tbb

#endif /* __TBB_task_group_H */
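To make the quicksort remark concrete, here is a minimal sketch of how a quicksort might be written against the run()/wait() subset. It is not Molloy's code. The names serial_group_standin, quicksort_range, group_quicksort and the cutoff of 512 are all hypothetical. The stand-in simply queues tasks and runs them serially inside wait(), so the sketch compiles without TBB; the sort itself is a template over the group type, on the assumption that a tbb::task_group with the same two methods could be substituted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in, NOT TBB: queues tasks and runs them serially in wait().
// It only mimics the run()/wait() shape of task_group.
class serial_group_standin {
    std::deque<std::function<void()> > my_pending;
public:
    template<typename Function>
    void run( Function f ) { my_pending.push_back(std::move(f)); }
    void wait() {
        while( !my_pending.empty() ) {
            std::function<void()> f = std::move(my_pending.front());
            my_pending.pop_front();
            f();    // a task may call run() again; the loop picks that up
        }
    }
    ~serial_group_standin() { wait(); }
};

// Sorts [first,last). Hands the left part to the group, loops on the right part.
template<typename Group>
void quicksort_range( Group& g, int* first, int* last ) {
    const std::ptrdiff_t serial_cutoff = 512;   // assumed value, not tuned
    while( last - first > serial_cutoff ) {
        const int pivot = first[(last - first) / 2];
        // Three-way partition: [first,mid1) < pivot, [mid1,mid2) == pivot, [mid2,last) > pivot.
        int* mid1 = std::partition(first, last, [pivot](int x){ return x < pivot; });
        int* mid2 = std::partition(mid1, last, [pivot](int x){ return !(pivot < x); });
        g.run([&g, first, mid1]{ quicksort_range(g, first, mid1); });
        first = mid2;   // the pivot block is already in place
    }
    std::sort(first, last);     // small ranges are sorted serially
}

// Group is any type with run() and wait(), such as the stand-in above.
template<typename Group>
void group_quicksort( std::vector<int>& v ) {
    Group g;
    quicksort_range(g, v.data(), v.data() + v.size());
    g.wait();
}
```

Calling group_quicksort<serial_group_standin>(v) sorts v in place. All tasks here share one group, which keeps the sketch short.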

 

Categories: Multi-Core, Threading Building Blocks

Comments (4)

July 2, 2008 2:51 PM PDT

Arch Robison (Intel)
I edited the code in my post to address two issues:
1. The blog software ate characters such as <, >, and &.
2. Molloy's post is silent on whether method run() is thread safe. That is, can two threads call "run" on the same task_group? The revised implementation above does permit such.
The following example, which uses lambdas, possibly calls run() from different threads on the same task group. It prints 1000000.

#include "tbb/task_group.h"
#include "tbb/atomic.h"
#include "tbb/task_scheduler_init.h"
#include <stdlib.h>
#include <stdio.h>

using namespace tbb;

atomic<int> Counter;

const int N = 1000;

int main( int argc, char* argv[] ) {
    task_scheduler_init init(argc>1 ? strtol(argv[1],0,0) : task_scheduler_init::automatic );
    task_group g;
    for( int i=0; i<N; ++i )
        g.run([&,i]{
            for( int j=0; j<N; ++j ) {
                g.run([&,j]{
                    Counter++;
                });
            }
        });
    g.wait();
    printf("Counter=%d\n",int(Counter));
    return 0;
}

The example is just for show. In general, it is non-scalable to create many tasks from the same task_group, because creation and completion of a task involve bumping a reference counter, whose cache line becomes a point of contention. Use recursive task creation for scalability, like a nuclear chain reaction.
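As a rough illustration of the recursive alternative, the sketch below counts the same 1000000 units of work by splitting the range in half at each level, with every split owning its own group. With a real task_group, each group's counter should then be touched by only a couple of tasks instead of a million. The names deferred_group_standin and count_range and the grain of 1000 are hypothetical, and the stand-in runs tasks serially so the sketch compiles without TBB; it shows the shape of the recursion, not measured scalability.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in, NOT TBB: queues tasks and runs them serially in wait().
class deferred_group_standin {
    std::deque<std::function<void()> > my_pending;
public:
    template<typename Function>
    void run( Function f ) { my_pending.push_back(std::move(f)); }
    void wait() {
        while( !my_pending.empty() ) {
            std::function<void()> f = std::move(my_pending.front());
            my_pending.pop_front();
            f();
        }
    }
    ~deferred_group_standin() { wait(); }
};

// Counts the units of work in [lo,hi) by recursive halving.
// Group is any type with run() and wait().
template<typename Group>
long count_range( long lo, long hi ) {
    const long grain = 1000;                // assumed leaf size, not tuned
    if( hi - lo <= grain )
        return hi - lo;                     // leaf: count locally, no shared counter
    const long mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;
    Group g;                                // one group per split
    g.run([&]{ left = count_range<Group>(lo, mid); });
    right = count_range<Group>(mid, hi);    // do the other half ourselves
    g.wait();                               // left is valid only after this
    return left + right;
}
```

count_range<deferred_group_standin>(0, 1000000) returns 1000000. Partial results travel up the tree as return values, so no single atomic counter is shared by all the leaves.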
July 2, 2008 11:59 PM PDT

Andrey Marochko (Intel)
Cool work, Arch! Shows both how easily some of the MS concepts (which took them years to arrive at) can be implemented, and how inefficient the implementation can be (beware of the closed sources) :).

I think that to provide MS-like exception handling behavior you need to explicitly create an isolated context for each task_group and associate it with the root task.

I'm also not quite sure that the check in the task_group destructor for the root's refcount being nonzero helps in any way. Actually it may become zero only if the user called task::self() inside their functor, took its parent, and created a child for it without using allocate_additional_child_of(). And in this case our empty root will be executed by the scheduler and destroyed, and the check itself will probably cause an access violation. I think the best thing you could do here is to call wait_for_all unconditionally, and rely on assertions inside TBB to warn about misuses.
July 3, 2008 6:31 AM PDT

Arch Robison (Intel)
I agree that it appears that each task_group will likely require a task_group_context to implement Microsoft's exception-handling semantics. As Microsoft makes more details public (such as exception-handling semantics), I'll update my TBB version to match as closely as practical.

The test on root->ref_count() is intended to protect against cases where the destructor of a task_group is called before wait() is called; e.g., out of forgetfulness or because an exception was thrown. The check is necessary because calling wait_for_all unconditionally does not work if it has already been called (by task_group::wait). The reason is that root->wait_for_all() waits until root->ref_count() becomes 1, and then sets root->ref_count() to 0. So calling wait_for_all twice, without resetting the ref_count, is an error. The debug version of TBB has assertions that diagnose this error.
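The counting protocol described above can be traced with a toy, single-threaded model. This is not TBB code: toy_root, toy_group, and destructor_needs_wait are hypothetical names, tasks run immediately inside run(), and the model only tracks the counter so the two destructor cases can be compared.

```cpp
#include <cassert>

// Toy model of the root task's reference count. NOT TBB.
struct toy_root {
    int ref_count = 0;
    void set_ref_count( int n ) { ref_count = n; }
    void add_child()  { ++ref_count; }   // models allocate_additional_child_of
    void child_done() { --ref_count; }   // models a child task completing
    // Models wait_for_all: needs a live count, ends by setting it to 0.
    // Returns false for the misuse case (waiting again without a reset).
    bool wait_for_all() {
        if( ref_count == 0 )
            return false;
        assert( ref_count == 1 );        // in this toy, children are already done
        ref_count = 0;
        return true;
    }
};

// Mirrors the counter handling of the task_group header in the post.
struct toy_group {
    toy_root root;
    toy_group() { root.set_ref_count(1); }
    template<typename Function>
    void run( Function f ) {
        root.add_child();
        f();                             // the toy runs the task right away
        root.child_done();
    }
    void wait() { root.wait_for_all(); }
    // The decision made by the guard in ~task_group().
    bool destructor_needs_wait() const { return root.ref_count != 0; }
};
```

If wait() was never called, the count is still 1 and the guard chooses to wait. After wait() the count is 0, the guard skips the second wait, and a direct second wait_for_all() in the toy reports misuse by returning false.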
July 6, 2008 11:57 AM PDT


sm345
Good stuff. The code shows how easy it is to use for nullary functions. It would also be useful to see an implementation which spawns off non-nullary functions.
For a function int my_func(int i, int j) {return i + j;}
users would love to just say g.run(3,4) or g(3,4) or more completely
int result = g(3,4)
Of course, all non-nullary functions can be bound (with bind) to yield nullary functions, but users should not have to do that.

I am liking TBB more and more. Thanks for making it available.
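For readers wondering what the binding mentioned in this comment looks like in practice, here is a small sketch of three ways to turn my_func(3,4) into a nullary task that stores its result. The helper run_with_args and the stand-in queued_group_standin are hypothetical and not part of TBB or of the header in the post; the stand-in queues tasks and runs them in wait() so the sketch compiles on its own.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

int my_func( int i, int j ) { return i + j; }

// Hypothetical stand-in, NOT TBB: queues tasks and runs them serially in wait().
class queued_group_standin {
    std::vector<std::function<void()> > my_pending;
public:
    template<typename Function>
    void run( Function f ) { my_pending.push_back(std::move(f)); }
    void wait() {
        for( std::size_t k = 0; k < my_pending.size(); ++k )
            my_pending[k]();
        my_pending.clear();
    }
};

// Hypothetical helper: wraps f(args...) in a nullary task that writes *result.
// The caller must keep *result alive until g.wait() returns.
template<typename Group, typename Result, typename F, typename... Args>
void run_with_args( Group& g, Result* result, F f, Args... args ) {
    g.run([=]{ *result = f(args...); });
}

int demo_bound_calls() {
    queued_group_standin g;
    int a = 0, b = 0, c = 0;
    // 1. A lambda that captures the destination and hard-codes the arguments.
    g.run([&a]{ a = my_func(3, 4); });
    // 2. std::bind produces a nullary callable; a lambda stores its result.
    std::function<int()> bound = std::bind(my_func, 3, 4);
    g.run([&b, bound]{ b = bound(); });
    // 3. The helper hides the wrapping from the caller.
    run_with_args(g, &c, my_func, 3, 4);
    g.wait();                   // results are valid only after wait()
    return a + b + c;           // 7 + 7 + 7
}
```

A form like int result = g(3,4) would need the call to block or return some kind of future, since the result does not exist until the task has run; the sketch sidesteps that by writing through a pointer and reading after wait().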
