Hello,

I have a fairly simple question, but haven't been able to find an answer so far.

I have a merge sort algorithm that is partly parallelised. I haven't written a parallel merge yet, because I first wanted to see whether I could speed things up during the divide phase alone. It does give me a couple of seconds of speedup, but only really if I put a cap on the recursion depth. Since I'm sorting 100,000,000 numbers here, I'm probably just generating a lot of spawn overhead at some point.

My question: is there any standard technique to fight this side effect? Without the upper bound I still get some speedup, but much less than with it! Here I have set it to 1024. But then, if I want to sort just 1024 numbers, nothing would happen in parallel at all. I'd say: well, that's not a lot of work, why go parallel when the speedup would be tiny? And if I wanted to sort 1024 numbers very often, I'd use a cilk_for loop, which calculates the grain size for me anyway, so I don't really have to worry about overhead there.
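To be concrete, here is the kind of cutoff structure I mean, as a serial sketch (the constant SERIAL_CUTOFF and the fallback to std::sort are just my placeholders; the comments mark where the cilk_spawn and cilk_sync would go):

```cpp
#include <algorithm>
#include <vector>

// Coarsening sketch: below SERIAL_CUTOFF, stop recursing (in the real
// version: stop spawning) and sort the chunk serially. The value 1024 is
// a placeholder; the right number has to be found by measuring.
constexpr int SERIAL_CUTOFF = 1024;

void merge_sort_cutoff(int* a, int length) {
    if (length <= SERIAL_CUTOFF) {
        // Serial leaf case: here the spawn overhead would dominate.
        std::sort(a, a + length);
        return;
    }
    int mid = length / 2;
    // In the Cilk version, this first call would get a cilk_spawn,
    // with a cilk_sync after the second call.
    merge_sort_cutoff(a, mid);
    merge_sort_cutoff(a + mid, length - mid);
    // Merge the two sorted halves.
    std::inplace_merge(a, a + mid, a + length);
}
```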

Here is the part where I spawn the strands:

void merge_sort(int* c, int* a, int length)
{
    if (length == 1)
        ...
    else
    {
        int* b = new int[length];   // temporary buffer for the two sorted halves

        if (length > 1024)
            cilk_spawn merge_sort(b, a, length/2);
        else
            merge_sort(b, a, length/2);

        merge_sort(b + length/2, a + length/2, length - length/2);

        cilk_sync;

        ..... MERGE the halves of b into c ....
    }
}

Parallelising the merge function will probably help quite a bit as well. Doing that with a divide-and-conquer strategy will probably also be faster if the maximum recursion depth is capped at a fixed number.
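The divide-and-conquer merge I have in mind looks roughly like this, as a serial sketch (the function name and the small-case threshold of 32 are my own; the comments mark where the cilk_spawn would go):

```cpp
#include <algorithm>

// Merge the sorted ranges [a, a+na) and [b, b+nb) into out by splitting
// at the median of the larger range and binary-searching for the split
// point in the smaller one.
void dc_merge(const int* a, int na, const int* b, int nb, int* out) {
    if (na < nb) {                    // keep the larger range first
        std::swap(a, b);
        std::swap(na, nb);
    }
    if (na == 0) return;              // both ranges empty
    if (na + nb <= 32) {              // small case: plain serial merge
        std::merge(a, a + na, b, b + nb, out);
        return;
    }
    int ma = na / 2;                  // median position in the larger range
    // Everything before mb in b is strictly less than a[ma].
    int mb = (int)(std::lower_bound(b, b + nb, a[ma]) - b);
    out[ma + mb] = a[ma];             // the median lands at its final spot
    // The two recursive calls write to disjoint parts of out, so in the
    // Cilk version the first one could get a cilk_spawn, with a
    // cilk_sync afterwards.
    dc_merge(a, ma, b, mb, out);
    dc_merge(a + ma + 1, na - ma - 1, b + mb, nb - mb, out + ma + mb + 1);
}
```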

Is there maybe some kind of magic number for this cutoff?

I hope someone can help me clear things up.

Cheers,

Chris

p.s. What I have also noticed: if I spawn strands like this:

cilk_spawn merge_sort(c, a, length/2);

cilk_spawn merge_sort(c + length/2, a + length/2, length - length/2);

cilk_sync;

instead of:

cilk_spawn merge_sort(c, a, length/2);

merge_sort(c + length/2, a + length/2, length - length/2);

cilk_sync;

I'm just generating more overhead with the first version, right? As I understand it, the parent strand would run the second call itself anyway, so the second cilk_spawn buys nothing.