Tim Mattson (Intel) has authored an extensive series of excellent videos as in introduction to OpenMP*.
In this article an OpenMP* based implementation of the Ant Colony Optimization algorithm was analyzed for bottlenecks with Intel® VTune™ Amplifier XE 2016 together with improvements using hybrid MPI-OpenMP and Intel® Threading Building Blocks were introduced to achieve efficient scaling across a four-socket Intel® Xeon® processor E7-8890 v4 processor-based system.
Eduardo H. M. Cruz, Matheus S. Serpa, Arthur M. Krause, Philippe O. A. Navaux
This article demonstrates techniques that software developers can use to identify and fix NUMA-related performance issues in their applications.