As part of my focus on software performance, I also support and consult on implementing scalable parallelism in applications.
Ray-tracing is a classic example of an embarrassingly parallel algorithm; since each pixel is typically independent of the rest, theoretically every pixel can be done in parallel (given enough core
The N-Body problem is a classic example used frequently to demonstrate parallelization and how it improves performance.
This article describes a parallel merge sort code, and why it is more scalable than parallel quicksort or parallel samplesort. The code relies on the C++11 “move” semantics.
We optimized a version of Dijkstra’s shortest path graph algorithm using a combination of Intel® Cilk™ Plus array notation and OpenMP* parallel for.