Episode 7 of the “Hands-On Workshop (HOW) series on parallel programming and optimization with Intel® architectures” is Part 1 (of 2) of a series on the optimization of multi-threaded applications.
We revisit OpenMP* and the binning example from Episode 6 to implement multi-threading in that code. Along the way, we discuss race conditions, mutexes, and efficient parallel reduction with thread-private variables. We also encounter false sharing and demonstrate how it can be eliminated.
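As a rough illustration of reduction with thread-private variables, the sketch below bins values into a histogram. It is not the code from the episode: the function name `bin_values`, the bin count `N_BINS`, and the assumption that inputs lie in [0, 1) are all hypothetical. Each thread accumulates into its own private array, so the hot loop has no race on the shared histogram. Because each private array lives on its thread's stack, threads are also unlikely to write to the same cache line, which is one common way to avoid false sharing.

```c
#include <assert.h>
#include <string.h>

#define N_BINS 8 /* hypothetical bin count, chosen for illustration */

/* Hypothetical helper: counts how many values fall into each of N_BINS
   equal-width bins. Inputs are assumed to lie in [0, 1); out-of-range
   values are clamped into the first or last bin. */
void bin_values(const float *values, long n, long *hist)
{
    assert(values != NULL && hist != NULL && n >= 0);
    memset(hist, 0, N_BINS * sizeof(long));

#pragma omp parallel
    {
        /* Thread-private histogram: incrementing it cannot race with
           other threads. */
        long local[N_BINS] = {0};

#pragma omp for nowait
        for (long i = 0; i < n; i++) {
            int b = (int)(values[i] * N_BINS);
            if (b < 0) b = 0;
            if (b >= N_BINS) b = N_BINS - 1;
            local[b]++;
        }

        /* Reduction step: the mutex is taken once per thread,
           not once per input value. */
#pragma omp critical
        {
            for (int b = 0; b < N_BINS; b++) {
                hist[b] += local[b];
            }
        }
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. Incrementing `hist[b]` directly inside the parallel loop, by contrast, would be a race condition unless every increment were protected.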
The second example discussed in this episode involves stencil operations. The discussion reveals the problem of insufficient parallelism and demonstrates how to move parallelism from vectors to cores using strip-mining and loop collapse.
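The idea of strip-mining followed by loop collapse can be sketched as follows. This is an assumed example, not the stencil from the episode: the function `stencil_rows`, the 3-point horizontal sum, and the `strip` parameter are hypothetical. When the outer loop over rows has too few iterations to occupy all cores, the long inner loop is split into strips, and the row loop and the strip loop are collapsed into a single iteration space for the threads. The innermost loop over one strip is left for vectorization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-point horizontal stencil on a row-major rows x cols grid:
   out[i][j] = in[i][j-1] + in[i][j] + in[i][j+1] for interior columns.
   Boundary columns of out are left untouched. */
void stencil_rows(const float *in, float *out, int rows, int cols, int strip)
{
    assert(in != NULL && out != NULL);
    assert(rows > 0 && cols > 2 && strip > 0);

    const int n_inner = cols - 2;                       /* interior columns */
    const int n_strips = (n_inner + strip - 1) / strip; /* round up */

    /* Threads share rows * n_strips iterations instead of only rows. */
#pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++) {
        for (int s = 0; s < n_strips; s++) {
            const int j_begin = 1 + s * strip;
            int j_end = j_begin + strip;
            if (j_end > cols - 1) j_end = cols - 1; /* clamp the last strip */

            /* Each strip remains a candidate for vectorization. */
#pragma omp simd
            for (int j = j_begin; j < j_end; j++) {
                const size_t k = (size_t)i * (size_t)cols + (size_t)j;
                out[k] = in[k - 1] + in[k] + in[k + 1];
            }
        }
    }
}
```

The strip length is a tuning parameter in this sketch: it should be long enough to keep vector lanes busy and short enough to give every core work.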
Performance results for both examples at different stages of optimization are measured on an Intel® Xeon® processor and an Intel® Xeon Phi™ coprocessor.
In the hands-on part of the episode, we apply the discussed techniques to the example applications in real time.