Cilk_for division of work

Cilk_for division of work

Hello everybody,

 While working with CilkPlus on a project, the following question came up: How cilk_for divides the work?

 I thought that cilk_for divided the work equally on all available threads,
 but, when I run the following code, the situation is different:

 #include<stdio.h
 #include<stdlib.h
 int main() {
     int i = 0;
     int* cnt = (int*)calloc(__cilkrts_get_nworkers(), sizeof(int));

     cilk_for(i = 0; i < 200000; i++)
         cnt[__cilkrts_get_worker_number()]++;

     for(i = 0; i < __cilkrts_get_nworkers(); i++)
         printf("Thread %d: %d\n", i, cnt[i]);
     return EXIT_SUCCESS;
 }

 Testing with CILK_NWORKERS=2, sometimes cnt[0]=103125 and cnt[1]=96875, but
 in others their values are different (it's non-deterministic)

 So, again, how exactly does cilk_for divide the work? is there a way to force cilk_for to divide the work equally?

 Thanks

   - José Fuentes

2 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

All the details of the cilk_for implementation are in cilk-abi-cilk-for.cpp

If you haven't specified a grainsize (and you probably shouldn't), cilk_for will calculate one for you.  It's roughly min(2048, range / 8P)  See the function grainsize() for the exact details.

Then cilk_for uses a divide-and-conquer algorithm.  While the range is > grainsize, it keeps breaking the range in half and spawning half.  Eventually it gets a portion that's less than or equal to the grainsize and starts working on it.

Why doesn't it break the work into exactly P chunks and parcel them out to each of the workers (like OpenMP)?  Well, what happens if all of the units of work aren't all the same size?  Or if one of the workers loses quantum?  If that happened, all of the workers would have to wait for the "long straw" to complete.  Instead, since cilk_for breaks it into smaller pieces, idle workers can steal some of the work and help with the calculation.

Static scheduling can be great if you've got a very regular calculation on dedicated hardware.  cilk_for's implementation is better if the work is irregular or your running in a shared environment.

    - Barry

Login to leave a comment.