In this article an OpenMP* based implementation of the Ant Colony Optimization algorithm was analyzed for bottlenecks with Intel® VTune™ Amplifier XE 2016 together with improvements using hybrid MPI-OpenMP and Intel® Threading Building Blocks were introduced to achieve efficient scaling across a four-socket Intel® Xeon® processor E7-8890 v4 processor-based system.
In this article, we present a tutorial on how to start using MPI SHM on multinode systems using Intel® Xeon® and Intel® Xeon Phi™ processors. The article uses a 1-D ring application as an example and includes code snippets to describe how to transform common MPI send/receive patterns to utilize the MPI SHM interface. The MPI functions that are necessary for internode and intranode communications...
This guide is intended to help current HPL users get better benchmark performance by utilizing BLAS from the Intel® Math Kernel Library (Intel® MKL).