Full screen debugging Cilk code

Full screen debugging Cilk code

21 void sample_qsort(int * begin, int * end)

22 {

23 if (begin != end) {

24 --end; // Exclude last element (pivot)

25 int * middle = std::partition(begin, end,

26 std::bind2nd(std::less(),*end));

28 std::swap(*end, *middle); // pivot to middle

29 _Cilk_spawn sample_qsort(begin, middle);

30 sample_qsort(++middle, ++end); // Exclude pivot

31 _Cilk_sync;

32 }

33

On page 15 of the Programmer's Guide there is an example. The code is reproduced above. Is there a way that I can use a full sreen debugger to walk through this code visually? This I hope can show me what you are talking about with the work stealing algorithm. I use a full screen debugger, called nemiver and it does not seem to work with Cilk code.

It seems to me in this example that you are sorting only half of the unsorted list. The statement..

cilk_spawn sample_qsort(begin, midddle);

..only works with half of the code.

I just need to see how this works in a step by step walkthrough manner.

Newport_j

12 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Which OS are you using?

If you really want insight into this, I'd use a **MUCH** simpler program. Something like the following:

#include
#include

#ifdef _WIN32
#include
#define sleep(s) ::Sleep(s*1000)
#else
#include
#endif

void wait_for_me(int seconds)
{
printf("Hello from the main thread. Waiting %d seconds...\n", seconds);
fflush(stdout);
sleep(seconds);
printf("main thread continuing...\n");
fflush(stdout);
}

void hello()
{
printf("Hello from the stolen continuation\n");
fflush(stdout);
}

int cilk_main(int argc, char **argv)
{
cilk_spawn wait_for_me(5);
hello();
cilk_sync;
printf("All synched now\n");
}

Note that I haven't compiled this, so it may be a bit buggy. If you just run this, you should see something like the following output:

Hello from the main thread. Waiting5 seconds...
Hello from the stolen continuation
main thread continuing...
All synched now

If you run with a single worker, you'll see something like this:

Hello from the main thread. Waiting 5 seconds...
main thread continuing...
Hello from the stolen continuation
All synched now

- Barry

Line 30 does the second half of the array.

Unfortunately, Cilk++ is extremely difficult to debug when running in parallel. The encouraged usage is to debug your application serially. Once you are running in parallel, Cilkscreen is the right tool for detecting bugs introduced by Cilk++.

Tracing a Cilk++ execution can be done if you run with only one worker (serially, with Cilk++ code). You can do this by specifying "-cilk_set_worker_count=1" on the command line if you are using cilk_main(). Otherwise, you can set the environment variable CILK_NPROC=1 and get the same result.

I use Ubuntu 10.04, thus I can get rid of

#ifdef_WIN32
#include

To make the program a little less Windows specific.

I will try to this program to run.

Newport_j


Okay, I got it to run. The issue is somewhat clearer now. Please nform me of one thing. Let us go back to the quicksort and I can pose the question. The code is:

void sample_qsort(int * begin, int * end)

22 {

23 if (begin != end) {

24 --end; // Exclude last element (pivot)

25 int * middle = std::partition(begin, end,

26 std::bind2nd(std::less(),*end));

28 std::swap(*end, *middle); // pivot to middle

29 _Cilk_spawn sample_qsort(begin, middle);

30 sample_qsort(++middle, ++end); // Exclude pivot

31 _Cilk_sync;

32 }

33 }

Now you break the sort string at the middle. I am interested in what would happen if, instead of breaking it in the middle and going from begin to middle, and then letting the other piece of code - the one succeeding theCilk_spawn statement,it goes frommiddle to end. What would happen if you used different proportions. Instead of half and half that is, what about 1/3 and 2/3? In other words, the Cilk_spawn statementnow goes1/3 of the way and the succedding code now goes 2/3 of the way?

Both statements would still be there, but they would no longer be equally proprtionate.What would happen then?

I may be way off base n this question, but it is somethng that I thought about.

Newport_j

> Both statements would still be there, but they would no longer be
>equally proportionate. What would happen then?

Let's assume that there are9 items in your array of ints.The recursion will be something like this, where the indentation level indicates the recursion level:

1 The first level is going to break that into a group of 3, and and a group of 6.
2 The group of 3 will be broken into a group of 1 and a group of 2
3 The group of 1 will fail the begin != end test, so it will stop recursing
4 The group of 2 will break into a group of 1 and a group of 1
5 The first group of 1 will fail the begin != end test, so it will stop recursing
6 The second group of 1 will fail the begin != end test, so it will stop recursing
7 The group of 6 will break into a group of 2 and a group of 4

8 The group of 2 will break into a group of 1 and a group of 1
9 The first group of 1 will fail the begin != end test, so it will stop recursing
10 The second group of 1 will fail the begin != end test, so it will stop recursing
11 The group of 4 will break into a group of 1 and a group of 3.
12 The group of 1 will fail the begin != end test, so will stop recursing
13 The group of 3 will break into a group of 1 and a group of 2
14 The group of 1 will fail the begin != end test, so will stop recursing
15 The group of 2 will break into a group of one and a group of 1
16 The first group of 1 will fail the begin != end test, so will stop recursing
17 The second group of 1 will fail the begin != end test, so will stop recursing

Now let's play scheduler with 2 processors. Worker 0 will run serially to the first cilk_spawn, and the recurse on the group of 3 (line 2). Eventually Worker1 will come along and steal the continuation, which is the group of 6 at (line 7).

Assuming that each of the 17 lines takes the same amount of time (which is incorrect since the time taken for the partition will vary with the number of elements, but assume it anyway), Worker0 will run out of work first since it only has lines 2-6 to do. So it will reach the _Cilk_sync for the first level of recursion and not be able to continue until Worker1 completes lines 7-17.

So Worker0 puts aside the computation at the _Cilk_sync and looks for other work it can steal. For the sake of this discussion, assume that Worker1 is busy on lines 7-10. So Worker0 will steal the continuation for the second level of recursion from Work1, at line 11.

etc.

Each worker has a deque which records the continuation on each spawn. The worker itself always is working at the tail of the deque, pushing entries on, and popping them off when it returns and claims the work for itself. When work is stolen, it's taken from the head of the victim's deque.

On a return from a spawned function, the deque is checked to see if the parent has been stolen. If it hasn't the tail of the deque is popped and the code returns normally. If the parent has been stolen, the code jumps to the sync in the parent.

At a sync, whichever worker reaches it last will continue executing after the sync. (But that's an implementation detail and you shouldn't depend on it. A worker continues executing. You can't predict which one, unless there's only one.)

So in this unbalanced example, the workers will steal work back and forth until the work is completed.

- Barry

So in this statement the cilk_sync statement is optional (the cilk compiler puts it in if it is not there), but the programmer can put it in if he so desires?

Also, I guess from your reply the result of the disproportionate break down would still be the same?

So what hapens if ratio changes even more?

Newport_j

> So in this statement the cilk_sync statement is optional (the cilk compiler puts it in if it is not there),
>but the programmer can put it in if he so desires?

Correct. The compiler automatically inserts a cilk_sync statement at the end of every function that contains a cilk_spawn. There's a check so the call only gets executed if there have been any spawns since the last sync (or the beginning of the function). So it costs you very little to put the cilk_sync; statement in if it makes things clearer.

> Also, I guess from your reply the result of the disproportionate break down would still be the same?

Correct. You should get exactly the same result.

> So what hapens if ratio changes even more?

If the ratio changes even more, you'll see more stealing.

As a general rule, spawns are cheap (roughly 4x the cost of a call) and steals are expensive.So the more steals you do, the more overhead you introduce into your application. The goal is to amortize that over as much work as possible.

This is why Cilk is great for recursive algorithms. Theytend to be balanced and divide naturally among the workers, resulting in very little stealing.

- Barry

Is that why you break it in the middle. It is the most efficient way to break it halfway for 1/2,1/2? Any, break like the one I suggested 1/3,2/3 will get you the correct answer, but in a inefficient way to do it.

Newport_j

> Is that why you break it in the middle. It is the most efficient way to break it halfway for 1/2,1/2?
> Any, break like the one I suggested 1/3,2/3 will get you the correct answer, but in a inefficient
>way to do it.

Correct. You could also break it intoN equal portions, spawn the first(N-1) and then just call the Nth.

For example, I've seen an algorithm which broke naturally into quarters. The author spawned the processing function for the first 3 quarters, and then called the processing function for the last quarter.

However, it'sa bad idea to try totailor your algorithm to the number of processors.Make your algorithm clear and let the Cilk scheduler worry about whichstrands are running where. As long as you break the problem into sufficient pieces, your application should scale as the number of processors goes up in the future. The Cilkview scalability analyzer can helpyou understandthe limits of scalability ofyour application.

- Barry

In an old example a few paragraphs above, I said what would happen if the disproportionate ratio is 1/3,2/3 and you said more work stealing.

So what would happen if the disproportitate ratio is 2/3,1/3?

I take it stealing is expensive and you want to avoid it. So in my ratio immediately above, what happnes. I would think less stealing as compared to 1/2,1/2. But what else?

Newport_j

1/3, 2/3 will be pretty much the same as 2/3, 1/3. In one case I believe the stealing will ping-pong back and forth. In the other, one will continually steal from the other.

Consider what happens with 2 workers in the 1/2, 2/3 case.

Recursion level 0: Worker0 will start working on the first third. Worker1 will come along and steal the remaining two thirds.Worker0 will finish it's portion and attempt to sync. Since it can't progress any further, it will cast about for something useful to do.

Recursion level 1: Worker1 is working on the first third. When Worker0 looks for something useful to do, it will steal the remaining 2/3 of this level. When Worker1 finishes the first third, it will attempt to sync. Since it can't progress any further, it will cast about for something to do.

etc.

Now the 2/3, 1/3 case.

Recursion level 0: Worker0 will start working on the first 2/3s. Worker1 will come along and steal the remaining third. When Worker1 finishes it's portion, it will attempt to sync. Since it can't progress any further, it will cast about for something useful to do.

Recursion level 1: Worker0 is working on the first 2/3s. Worker1 will come along and steal the remaining third. When Worker1 finishes it's portion, it will attempt to sync. Since it can't progress any further, it will cast about for something useful to do.

etc.

Slightly different behavior, but in both cases you'll end up with extra steals.

- Barry

Leave a Comment

Please sign in to add a comment. Not a member? Join today