Here's a interesting mathematical question: Is there a way to define a range object that lets us write a parallel_for that iterates over permutations of a set of N objects? Better yet, can it be done so that subranges are always in lexicographic order?

Here's how it might be used. The parallel_for invocation might look something like:

parallel_for( perm_range(N), Body() );

and Body::operator() might look something like:

void Body::operator()( const perm_range& r ) {
permutation p = r.begin();
while( p!=r.end() ) {
...process the permutation...
...get the next permutation...
std::next_permutation( p.begin(), p.end() );

For example, for N=3, we would want the permutations:

0 1 2
0 2 1
1 0 2
1 2 0
2 0 1
2 1 0

to be generated. The hard part is figuring out how to divide a permutation range into two roughly equal subranges.

- Arch

P.S. TBB does support recursive parallelism, so it's straightforward to write a recursive program that does permutations in parallel, using a parallel_for at each recursion level. I can post that code if anyone cares.

3 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

The lexicographic ranking and unranking of permutations might be useful for this problem, but could be quite costly. They perform a 1-on-1 mapping between a permutation of integers {0,1,...,(n-1)} and an integer within [0,n!). The algorithms can be found in the book "Combinatorial algorithms" by Donald L. Kreher and Douglas R. Stinson. If you're interested, I can post my implementation here.

I came up with a reasonably straightforward and efficient solution a while back, but didn't have time to post it until now. See code below. It still has printfs in it that are inappropriate for parallel execution, but I left them there to show where the permutations appear.

- Arch

// permute.cpp : Defines the entry point for the console application.
// Program that shows how to write a TBB-style recursive range for permutations.
// Author: Arch D. Robison
// Date: 9/18/2007
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"
//! Recursively divisible set of permutations of N objects of type T.
/** Internally, it represents all permutations x of a such that:
 x[k]==a[k] for all k
 x[position]is some a[k] for k in [i,j) 
 x[k] is some a[k] for k in [position,N]
 Be sure to use the default simple_partitioner, not auto_partitioner, because the range cannot be iterated
 over unless is_divisible() returns true.
template<typename T, size_t N>
class permutation_range {
 unsigned position;
 //! When position reaches limit, the set of permutations is no longer considered divisible.
 const unsigned limit;
 unsigned i, j;
 T a[N];
 //! Check that internal state is c
 bool assert_okay() {
 assert( i
 assert( j<=N );
 assert( position<=i );
 assert( position<=limit );
 for( unsigned k=0; k
 assert( a[k]
 for( unsigned n=0; n
 assert( n==k || a[k]!=a[n] );
 return true;
 void normalize_if_leaf() {
 if( position==limit-1 && i+1==j ) {
 std::rotate(&a[position],a+i, a+i+1);
 assert( assert_okay() );
 bool empty() const {
 return position==N;
 bool is_divisible() const {
 return position+1
 //! Construct initial permutation
 permutation_range( const T x[N], unsigned grainsize ) : 
 std::copy( x, x+N, a );
 assert( assert_okay() );
 //! Splitting constructor
 permutation_range( permutation_range& r, tbb::split ) : limit(r.limit) {
 assert( r.assert_okay() );
 assert( r.is_divisible() );
 if( r.i+1==r.j ) {
 std::rotate(&r.a[r.position],r.a+r.i, r.a+r.i+1);
 r.i = ++r.position;
 r.j = N;
 std::copy( r.a, r.a+N, a );
 int m = (r.j-r.i)/2;
 j = r.j;
 i = r.j = r.i+m;
 position = r.position;
 //! Set x[] to first permutation in this range.
 void get_first_subpermutation( T x[N] ) const {
 std::copy( a, a+N, x );
 //! Advance x[] to the next permutation in this range. Returns true unless next permutation was a wrap.
 bool next_subpermutation( T x[N] ) const {
 return std::next_permutation( x+limit, x+N );
const int N = 5;
struct Body {
 void operator()( const permutation_range<int,N>& r ) const {
 int a[N];
 // Process all permutations in the subrange
 printf("	slice
 do {
 // Operations on a given permutation
 printf("		");
 for( int k=0; k
 printf("%d ",a[k]);
bsp; printf("
 // Get the next permutation
 } while( r.next_subpermutation( a ) );
int main() {
 const int N = 5;
 tbb::task_scheduler_init init(1);
 int a[N];
 for( unsigned k=0; k
 a[k] = k;
 // Try different grain sizes
 for( int g=1; g<=N; ++g ) {
 printf("With grain size=%d
 tbb::parallel_for( permutation_range<int,N>(a,g), Body() );
 return 0;

Leave a Comment

Please sign in to add a comment. Not a member? Join today