This article describes a parallel merge sort implementation and explains why it scales better than parallel quicksort or parallel samplesort. The code relies on C++11 move semantics.
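The following is a minimal sketch, not the article's code. It shows a recursive merge sort that sorts the two halves concurrently with `std::async`. It merges with `std::merge` over `std::make_move_iterator`, so elements are moved into a scratch buffer and back rather than copied. The names `parallel_merge_sort` and `merge_sort_range` and the `depth` cutoff are hypothetical. Unlike a fully parallel merge sort, this sketch performs each merge step serially, which limits how far it scales. It also assumes `T` is default-constructible (for the scratch buffer) and move-assignable.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical helper: sorts v[lo, hi) using buf (same size as v) as scratch.
// Parallel calls touch disjoint index ranges of v and buf, so there is no data race.
template <typename T>
void merge_sort_range(std::vector<T>& v, std::vector<T>& buf,
                      std::size_t lo, std::size_t hi, int depth) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    if (depth > 0) {
        // Sort the left half on another thread while this thread sorts the right half.
        auto left = std::async(std::launch::async, [&] {
            merge_sort_range(v, buf, lo, mid, depth - 1);
        });
        merge_sort_range(v, buf, mid, hi, depth - 1);
        left.get();
    } else {
        // Below the cutoff depth, recurse sequentially.
        merge_sort_range(v, buf, lo, mid, 0);
        merge_sort_range(v, buf, mid, hi, 0);
    }
    // Stable merge that moves (not copies) elements into buf, then moves them back.
    std::merge(std::make_move_iterator(v.begin() + lo),
               std::make_move_iterator(v.begin() + mid),
               std::make_move_iterator(v.begin() + mid),
               std::make_move_iterator(v.begin() + hi),
               buf.begin() + lo);
    std::move(buf.begin() + lo, buf.begin() + hi, v.begin() + lo);
}

// Hypothetical entry point: depth d allows up to 2^d concurrent leaf sorts.
template <typename T>
void parallel_merge_sort(std::vector<T>& v, int depth = 3) {
    std::vector<T> buf(v.size());
    merge_sort_range(v, buf, 0, v.size(), depth);
}
```

Moving instead of copying matters most for element types that own heap memory, such as `std::string`. For those types, a move just transfers a pointer.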
This is the second article in a series of articles about High Performance Computing with the Intel Xeon Phi.
Ray tracing is a classic example of an embarrassingly parallel algorithm: since each pixel is typically independent of the rest, in theory every pixel can be computed in parallel (given enough cores).
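The sketch below illustrates that per-pixel independence under stated assumptions. The scene is hypothetical: one sphere of radius 1 at (0, 0, -3), a camera at the origin looking down -z, and light arriving from the camera direction. `shade_pixel` and `render` are illustrative names, not from the article. Rows are interleaved across `std::thread` workers. Because no pixel depends on another, each thread writes a disjoint set of pixels and needs no locks.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical shader: traces one primary ray through pixel (x, y).
float shade_pixel(int x, int y, int width, int height) {
    // Map the pixel centre to [-1, 1] on an image plane at z = -1.
    float u = 2.0f * (x + 0.5f) / width - 1.0f;
    float v = 1.0f - 2.0f * (y + 0.5f) / height;
    float dx = u, dy = v, dz = -1.0f;
    float len = std::sqrt(dx * dx + dy * dy + dz * dz);
    dx /= len; dy /= len; dz /= len;
    // Ray t*d against sphere |P - C| = 1 with C = (0, 0, -3):
    // t^2 - 2 t (d.C) + (|C|^2 - 1) = 0.
    float b = -3.0f * dz;           // d.C
    float c = 9.0f - 1.0f;          // |C|^2 - r^2
    float disc = b * b - c;
    if (disc < 0.0f) return 0.0f;   // miss: black background
    float t = b - std::sqrt(disc);  // nearest intersection
    // Unit normal at the hit point; brightness = max(0, n . (-d)).
    float nx = t * dx, ny = t * dy, nz = t * dz + 3.0f;
    return std::max(0.0f, -(nx * dx + ny * dy + nz * dz));
}

// Renders a width x height greyscale image; thread t handles rows t, t+T, t+2T, ...
std::vector<float> render(int width, int height, unsigned num_threads) {
    std::vector<float> image(static_cast<std::size_t>(width) * height);
    if (num_threads == 0) num_threads = 1;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int y = static_cast<int>(t); y < height; y += static_cast<int>(num_threads))
                for (int x = 0; x < width; ++x)
                    image[static_cast<std::size_t>(y) * width + x] =
                        shade_pixel(x, y, width, height);
        });
    }
    for (auto& w : workers) w.join();
    return image;
}
```

Interleaving rows, rather than giving each thread one contiguous block, is one simple way to balance load when some image regions are more expensive to shade than others.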
The N-body problem is a classic example, frequently used to demonstrate parallelization and the performance gains it brings.
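The following is a minimal sketch of the classic direct-summation approach, not taken from any particular article. It assumes gravitational units with G = 1 and a hypothetical softening length `eps`. Each body's acceleration is a sum over all other bodies, a_i = Σ_j m_j (r_j − r_i) / (|r_j − r_i|² + eps²)^{3/2}, which is O(N²) work per step. The outer loop over bodies is split across threads, and each thread writes only its own bodies' entries. `Body`, `Vec3` and `accelerations` are hypothetical names. A full simulation would also integrate velocities and positions over time.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

struct Body { double x, y, z, mass; };
struct Vec3 { double x, y, z; };

// Direct-summation accelerations (G = 1), parallelized over target bodies i.
std::vector<Vec3> accelerations(const std::vector<Body>& bodies,
                                unsigned num_threads, double eps = 1e-3) {
    const std::size_t n = bodies.size();
    std::vector<Vec3> acc(n, Vec3{0.0, 0.0, 0.0});
    if (num_threads == 0) num_threads = 1;
    auto work = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            Vec3 a{0.0, 0.0, 0.0};
            for (std::size_t j = 0; j < n; ++j) {
                if (j == i) continue;
                double dx = bodies[j].x - bodies[i].x;
                double dy = bodies[j].y - bodies[i].y;
                double dz = bodies[j].z - bodies[i].z;
                double r2 = dx * dx + dy * dy + dz * dz + eps * eps;  // softened distance^2
                double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
                a.x += bodies[j].mass * dx * inv_r3;
                a.y += bodies[j].mass * dy * inv_r3;
                a.z += bodies[j].mass * dz * inv_r3;
            }
            acc[i] = a;  // each i is written by exactly one thread
        }
    };
    // Give each thread one contiguous chunk of bodies.
    std::vector<std::thread> workers;
    std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (std::size_t begin = 0; begin < n; begin += chunk)
        workers.emplace_back(work, begin, std::min(n, begin + chunk));
    for (auto& w : workers) w.join();
    return acc;
}
```

Each thread reads the shared, unmodified `bodies` array and accumulates into a private `Vec3`. The parallel result is therefore identical to the serial one and needs no synchronization beyond the final `join`.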
Download Program Optimization through Loop Vectorization [PDF 617KB]