Learn how to write an MPI program in Python*, and take advantage of Intel® multicore architectures using OpenMP threads and Intel® AVX512 instructions.
The Intel® MPI Library and OpenMP* runtime libraries can create affinities between processes or threads, and hardware resources. This affinity keeps an MPI process or OpenMP thread from migrating to a different hardware resource, which can have a dramatic effect on the execution speed of a program.
In this article an OpenMP* based implementation of the Ant Colony Optimization algorithm was analyzed for bottlenecks with Intel® VTune™ Amplifier XE 2016 together with improvements using hybrid MPI-OpenMP and Intel® Threading Building Blocks were introduced to achieve efficient scaling across a four-socket Intel® Xeon® processor E7-8890 v4 processor-based system.
Are you working with a hybrid program that just isn't performing? Do you feel like your application is on life support?
There are two principal methods of parallel computing: distributed memory computing and shared memory computing. As more processor cores are dedicated to large clusters solving scientific and engineering problems, hybrid programming techniques combining the best of distributed and shared memory programs are becoming more popular.
This case study examines the situation where the problem decomposition is the same for threading as it is for Message Passing Interface* (MPI); that is, the threading parallelism is elevated to the same level as MPI parallelism.
Code Sample included: Learn how to use MPI-3 shared memory feature using the corresponding APIs on the Intel® Xeon Phi™ processor.