How to run Intel® MPI on the Intel® Xeon Phi™ Coprocessor

By Gergana S. Slavova

Published: 12/13/2012   Last Updated: 09/12/2017

Support for the Intel® Xeon Phi™ coprocessor (formerly code-named Knights Corner) is being deprecated.

Overview

The Intel® MPI Library supports the Intel® Xeon Phi™ coprocessor in 3 major ways:

  • The offload model where all MPI ranks are run on the main Intel® Xeon® host, and the application utilizes offload directives to run on the Intel Xeon Phi coprocessor card,
  • The native model where all MPI ranks are run on the Intel Xeon Phi coprocessor card, and
  • The symmetric model where MPI ranks are run on both the Intel Xeon host and the Intel Xeon Phi coprocessor card.

This article will focus on the native and symmetric models only. If you'd like more information on the offload model, this article gives a great overview, and even more details are available in the Intel® Compiler documentation.

Prerequisites

The most important thing to remember is that we're treating the Intel Xeon Phi coprocessor cards as simply another node in a heterogeneous cluster. To that end, running an MPI job in either the native or symmetric mode is very similar to running a regular Intel Xeon MPI job. On the flip side, that does require some prerequisites to be fulfilled for each coprocessor card to be completely accessible via MPI.

Uniquely accessible hosts
All coprocessor cards on the system need to have a unique IP address that's accessible from the local host, from other Intel Xeon hosts on the system, and from the Intel Xeon Phi cards attached to those hosts. Again, think of it as simply adding another node to an existing cluster. A simple test is whether you can ssh from one Intel Xeon Phi coprocessor (let's call it node0-mic0) to its own Intel Xeon host (node0), as well as to any other Intel Xeon host on the cluster (node1) and its respective Intel Xeon Phi cards (node1-mic0). Here's a quick example:

[user@node0-mic0 user]$ ssh node1-mic0 hostname
node1-mic0

Access to necessary libraries
Make sure all MPI libraries are accessible from the Intel Xeon Phi card. There are a couple of ways to do this:

  • Set up an NFS share between the Intel Xeon host where the Intel MPI Library is installed and the Intel Xeon Phi coprocessor card.
  • Manually copy all Intel Xeon Phi-specific MPI libraries to the card, as sketched after this list. More details on which libraries to copy and where are available here.
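
For the manual-copy route, here's a minimal sketch. The exact paths are assumptions that vary by Intel MPI Library version; the coprocessor-side binaries and libraries typically live under the mic subdirectory of the install.

# Copy the coprocessor-side MPI runtime onto the card (illustrative paths)
[user@host] $ scp /opt/intel/impi/<version>/mic/bin/* node0-mic0:/bin/
[user@host] $ scp /opt/intel/impi/<version>/mic/lib/* node0-mic0:/lib64/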

Assuming both of those requirements have been met, you're ready to start using the Intel Xeon Phi coprocessors in your MPI jobs.

Running Natively on the Intel Xeon Phi Coprocessor

The set of steps to run on the Intel Xeon Phi coprocessor card exclusively can be boiled down to the following:

1. Set up the environment
Use the appropriate scripts to set your runtime environment. The following assumes all Intel® Software Tools are installed in the /opt/intel directory.

# Set your compiler
[user@host] $ source /opt/intel/composer_xe_<version>/bin/compilervars.sh intel64

# Set your MPI environment
[user@host] $ source /opt/intel/impi/<version>/bin64/mpivars.sh
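
A quick way to confirm the environment took effect is to check which compiler driver now resolves on your PATH (the output shown is illustrative):

# mpiicc should resolve to the install you just sourced
[user@host] $ which mpiicc
/opt/intel/impi/<version>/bin64/mpiicc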

2. Compile for the Intel Xeon Phi coprocessor card
Use the -mmic option for the Intel Compiler to build your MPI sources for the card.

[user@host] $ mpiicc -mmic -o test_hello.MIC test.c
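
To verify the cross-compile, the file utility should report the coprocessor's K1OM architecture rather than x86-64 (exact output varies with your version of file):

# A host-side binary would report x86-64 here instead
[user@host] $ file ./test_hello.MIC
./test_hello.MIC: ELF 64-bit LSB executable, Intel K1OM, ...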

3. Copy the Intel Xeon Phi executables to the card
Transfer the executable that you just created to the card for execution.

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

This step is not required if your host and card are NFS-shared. Also note that we're renaming this executable during the copy process. This helps us use the same mpirun command for both native and symmetric modes.

4. Launch the application
Simply use the mpirun command to start the executable remotely on the card. Note that if you're planning on using an Intel Xeon Phi coprocessor in your MPI job, you have to let the library know by setting the I_MPI_MIC environment variable. This is a required step.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0-mic0
[user@host] $ mpirun -f mpi_hosts -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0-mic0
Hello world: rank 1 of 2 running on node0-mic0
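
As a side note, for a quick single-card run you can skip the hosts file and name the card directly on the command line (standard mpirun -host syntax):

[user@host] $ mpirun -host node0-mic0 -n 2 ~/test_hello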

Running Symmetrically on Both the Intel Xeon Host and the Intel Xeon Phi Coprocessor

You're now trying to utilize both the Intel Xeon hosts on your cluster and the Intel Xeon Phi coprocessor cards attached to them.

1. Set up the environment
This step is the same as in the native case above.

2. Compile for the Intel Xeon Phi coprocessor card and for the Intel Xeon host
You're now going to have to compile two different sets of binaries:

# for the Intel Xeon Phi coprocessor
[user@host] $ mpiicc -mmic -o test_hello.MIC test.c

# for the Intel Xeon host
[user@host] $ mpiicc -o test_hello test.c

3. Copy the Intel Xeon Phi executables to the card
Here, we still have to transfer the Intel Xeon Phi coprocessor-compiled executables to the card. And again, we're renaming the executable during the transfer:

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

Now, this will not work if your $HOME directory (where the executables live) is NFS-shared between host and card: the copy-and-rename above would overwrite the host binary of the same name. For more tips on what to do in NFS-sharing cases, check out this article; one possible workaround is also sketched below.
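
The sketch below assumes your Intel MPI Library version supports the I_MPI_MIC_POSTFIX environment variable: you keep the two distinct names in the shared directory, and the library appends the suffix for ranks launched on the card, so the host runs test_hello while the card runs test_hello.MIC.

# Keep host and card binaries under distinct names in the shared directory
[user@host] $ export I_MPI_MIC_POSTFIX=.MIC
[user@host] $ mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello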

4. Launch the application
Finally, you run the MPI job. The only difference here is in your hosts file: you now have to add the Intel Xeon hosts to the list.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0
node0-mic0
[user@host] $ mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0
Hello world: rank 1 of 2 running on node0-mic0
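
In practice you'll often want more ranks on the card than on the host. Assuming your Intel MPI Library version accepts per-host rank counts in a machinefile (the host:count syntax), an uneven layout might look like this:

# Two ranks on the host, four on the card (counts are illustrative)
[user@host] $ cat mpi_machines
node0:2
node0-mic0:4
[user@host] $ mpirun -machinefile mpi_machines -n 6 ~/test_hello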
