Compiler Methodology for Intel® MIC Architecture
This article describes a parallel merge sort code, and why it is more scalable than parallel quicksort or parallel samplesort. The code relies on the C++11 “move” semantics.
This article explains the sparse ruler problem, two parallel codes for computing sparse rulers, and some new results that reveal a surprising "gap" behavior for solutions to the sparse ruler problem.
Intel® Cilk™ Plus threading is a highly efficient threading model. Its simplicity, using three simple constructs, belies its power and flexibility.
I am not a fan of detours. The challenge of scaling to extreme computing is a milestone on the road to everyday computing.
The N-Body problem is a classic example used frequently to demonstrate parallelization and how it improves performance.