This article describes a parallel merge sort code, and why it is more scalable than parallel quicksort or parallel samplesort. The code relies on the C++11 “move” semantics. It also points out a scalability trap to watch out for with C++. The attached code has implementations in Intel® Threading Building Blocks (Intel® TBB), Intel® Cilk™ Plus, and OpenMP*.
This article explains the sparse ruler problem, two parallel codes for computing sparse rulers, and some new results that reveal a surprising "gap" behavior for solutions to the sparse ruler problem. The code and results are included in the attached zip file.
A complete sparse ruler is a ruler with M marks than can measure any integer distance between 0 and L units. For example, the following ruler has 6 marks (including the ends) and can measure integer distance from 0 to 13:
Efficient Parallelization, Parallelization with Intel® Cilk™ Plus
This article describes the parallel sorts in the latest release of “Cilkpub”, an open-source library of utilities for Intel® Cilk™ Plus.
They are designed to be replacements for
std::sort that may provide speedup when sorting many items (on the order of at least 10000). For example:
extern float a; cilkpub::cilk_sort( a, a+n );