I understand cilk_sort is parallized across shared memory multicores?
What sorting algorithm does it use: quicksort, insertions sort, heapsort, etc?
how about cilk_sort_in_place ?
The sort routines in Cilkpub are coded in Cilk Plus, and Cilk Plus requires shared memory.
cilk_sort_in_place uses a simple in-place parallel quicksort algorithm. cilk_sort uses a parallel sample sort algorithm.
The code in Cilkpub for sorting is derived from the code used for the book "Structured Parallel Programming," so you can find more details in the relevant chapters. http://parallelbook.com/
The book covers parallel versions of quicksort, heap sort, and sample sort. The download package "CODE EXAMPLES FROM BOOK" at http://parallelbook.com/downloads has the source to all three. For small core counts, parallel quicksort works well, but loses steam because its span (critical path) is O(N), and thus can never do better than O(lg N) speedup on a idealized machine. Parallel heap sort and sample sort both usually scale well, with sample sort often doing better because it has excellent cache behavior.
ok thanks guys, very helpful detail. What then is the difference between cilk_sort_in_place and parallel_for from TBB? My guess is that Thread Building Block (TBB) uses Cilk parallel programming constructs under the hood..(?)
cilk_sort_in_place is a sort. tbb's parallel_for is a parallel for loop. Perhaps you meant to compare parallel_for with cilk_for?
TBB predates Intel's implementation of Cilk and does not use Cilk constructs under the hood. TBB's goal is to be a "pure library" solution for parallelism that does not require any special compiler support, whereas Cilk requires compiler support and thus gain the benefits of having such support, notably enabling lower overhead per task and cleaner semantic properties.
TBB's parallel_sort is a parallel quicksort. I wrote back in the days when Pentium Ds roamed the earth, cores were few, and memory was slow. The in-place nature of parallel quicksort made it a winner in that era. Now that machines have changed, we really should update it. The book download has TBB versions of parallel sample sort and parallel merge sort that are often faster than tbb::parallel_sort.
I Barry, I meant parallel_sort versus cilk_sort_in_place. As Arch D. has stated, cilk is faster since TBB is a cross-plaform library, whereas, cilk is intel-plaform specific (requiring intel compiler) and as such will perform better. - that is my understanding from what Arch D is saying