How to run Intel MPI on Xeon Phi

Overview

The Intel® MPI Library supports the Intel® Xeon Phi™ coprocessor in three major ways:

  • The offload model, where all MPI ranks are run on the main Xeon host and the application uses offload directives to run on the Intel Xeon Phi coprocessor card,
  • The native model where all MPI ranks are run on the Intel Xeon Phi coprocessor card, and
  • The symmetric model where MPI ranks are run on both the Xeon host and the Xeon Phi coprocessor card.

This article will focus on the native and symmetric models only. If you'd like more information on the offload model, this article gives a great overview, and even more details are available in the Intel® Compiler documentation.

Prerequisites

The most important thing to remember is that we’re treating each Xeon Phi coprocessor card as simply another node in a heterogeneous cluster. To that end, running an MPI job in either the native or the symmetric mode is very similar to running a regular Xeon MPI job. On the flip side, that does require some prerequisites to be fulfilled for each coprocessor card to be completely accessible via MPI.

Uniquely accessible hosts
All coprocessor cards on the system need to have a unique IP address that's accessible from the local host, other Xeon hosts on the system, and other Xeon Phi cards attached to those hosts.  Again, think of simply adding another node to an existing cluster.  A very simple test of this is the ability to ssh from one Xeon Phi coprocessor (let's call it node0-mic0) to its own Xeon host (node0), as well as to any other Xeon host on the cluster (node1) and its respective Xeon Phi cards (node1-mic0).  Here's a quick example:

[user@node0-mic0 user]$ ssh node1-mic0 hostname
node1-mic0

Access to necessary libraries
Make sure all MPI libraries are accessible from the Xeon Phi card. There are a couple of ways to do this:

  • Set up an NFS share between the Xeon host where the Intel MPI Library is installed and the Xeon Phi coprocessor card.
  • Manually copy all Xeon Phi-specific MPI libraries to the card.  More details on which libraries to copy and where are available here; a minimal sketch follows this list.
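
Here is a minimal sketch of the manual copy, assuming the default installation layout in which the coprocessor-side MPI binaries and libraries live under the mic/ subdirectory of the Intel MPI installation (adjust the paths and the <version> placeholder for your system):

[user@host] $ scp /opt/intel/impi/<version>/mic/bin/* node0-mic0:/bin/
[user@host] $ scp /opt/intel/impi/<version>/mic/lib/* node0-mic0:/lib64/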

Assuming both of those requirements have been met, you're ready to start using the Xeon Phi coprocessors in your MPI jobs.

Running natively on the Xeon Phi coprocessor

The set of steps to run on the Xeon Phi coprocessor card exclusively can be boiled down to the following:

1. Set up the environment
Use the appropriate scripts to set your runtime environment. The following assumes all Intel® Software Tools are installed in the /opt/intel directory.

# Set your compiler
[user@host] $ source /opt/intel/composer_xe_<version>/bin/compilervars.sh intel64

# Set your MPI environment
[user@host] $ source /opt/intel/impi/<version>/bin64/mpivars.sh

2. Compile for the Xeon Phi coprocessor card
Use the -mmic option for the Intel Compiler to build your MPI sources for the card.
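
For reference, test.c can be any MPI program. Here's a minimal hello-world sketch that produces output in the format shown later in this article (the source below is an illustration of such a program, not something shipped with the Intel MPI Library):

/* test.c - minimal MPI "hello world" (illustrative sketch) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process' rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
    MPI_Get_processor_name(name, &len);      /* host or coprocessor name */

    printf("Hello world: rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}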

[user@host] $ mpiicc -mmic -o test_hello.MIC test.c

3. Copy the Xeon Phi executables to the card
Transfer the executable that you just created to the card for execution.

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

This step is not required if your host and card are NFS-shared. Also note that we're renaming this executable during the copy process. This helps us use the same mpirun command for both native and symmetric modes.

4. Launch the application
Simply use the mpirun command to start the executable remotely on the card. Note that if you're planning on using a Xeon Phi coprocessor in your MPI job, you have to let the Intel MPI Library know by setting the I_MPI_MIC environment variable. This is a required step.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0-mic0
[user@host] $ mpirun -f mpi_hosts -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0-mic0
Hello world: rank 1 of 2 running on node0-mic0

Running symmetrically on both the Xeon host and the Xeon Phi coprocessor

You're now trying to utilize both the Xeon hosts on your cluster and the Xeon Phi coprocessor cards attached to them.

1. Set up the environment
This step is the same as in the native case above.

2. Compile for the Xeon Phi coprocessor card and for the Xeon host
You're now going to have to compile two different sets of binaries:

# for the Xeon Phi coprocessor
[user@host] $ mpiicc -mmic -o test_hello.MIC test.c

# for the Xeon host
[user@host] $ mpiicc -o test_hello test.c

3. Copy the Xeon Phi executables to the card
Here, we still have to transfer the Xeon Phi coprocessor-compiled executables to the card.  And again, we're renaming the executable during the transfer:

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

Now, this will not work if your $HOME directory (where the executables live) is NFS-shared between host and card, since the copy would simply overwrite the host executable.  For more tips on what to do in NFS-sharing cases, check out this article; one common workaround is also sketched below.
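
One common workaround, assuming your version of the Intel MPI Library supports the I_MPI_MIC_POSTFIX environment variable, is to keep the .MIC suffix on the coprocessor binary and let the library append it automatically for ranks launched on the card:

[user@host] $ export I_MPI_MIC=enable
[user@host] $ export I_MPI_MIC_POSTFIX=.MIC
[user@host] $ mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello

With the postfix set, ranks on the Xeon host launch ~/test_hello while ranks on the coprocessor launch ~/test_hello.MIC, so no renaming (and no overwriting on the shared file system) is needed.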

4. Launch the application
Finally, you run the MPI job.  The only difference here is in your hosts file, as you now have to add the Xeon hosts to the list.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0
node0-mic0
[user@host] $ mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0
Hello world: rank 1 of 2 running on node0-mic0
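
If you'd rather skip the renaming altogether, the colon-separated MPMD syntax of mpirun lets you name the host and coprocessor binaries explicitly and give each set its own rank count (the counts below are just an example, and option spellings can vary slightly between Intel MPI versions):

[user@host] $ export I_MPI_MIC=enable
[user@host] $ mpirun -host node0 -n 2 ~/test_hello : -host node0-mic0 -n 4 ~/test_hello.MIC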