How to run Intel MPI on Xeon Phi


The Intel® MPI Library supports the Intel® Xeon Phi™ coprocessor in 3 major ways:

  • The offload model, where all MPI ranks run on the main Xeon host and the application uses offload directives to run on the Intel Xeon Phi coprocessor card,
  • The native model where all MPI ranks are run on the Intel Xeon Phi coprocessor card, and
  • The symmetric model where MPI ranks are run on both the Xeon host and the Xeon Phi coprocessor card.

This article focuses on the native and symmetric models only. If you'd like more information on the offload model, this article gives a great overview, and even more details are available in the Intel® Compiler documentation.


The most important thing to remember is that we're treating the Xeon Phi coprocessor cards as simply another node in a heterogeneous cluster. As a result, running an MPI job in either the native or the symmetric mode is very similar to running a regular Xeon MPI job. The flip side is that a few prerequisites must be met before each coprocessor card is fully accessible via MPI.

Uniquely accessible hosts
All coprocessor cards on the system need to have a unique IP address that's accessible from the local host, other Xeon hosts on the system, and other Xeon Phi cards attached to those hosts.  Again, think of simply adding another node to an existing cluster.  A very simple test of this will be the ability to ssh from one Xeon Phi coprocessor (let's call it node0-mic0) to its own Xeon host (node0), as well as ssh to any other Xeon host on the cluster (node1) and their respective Xeon Phi cards (node1-mic0).  Here's a quick example:

[user@node0-mic0 user]$ ssh node1-mic0 hostname
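To check the whole mesh rather than a single pair, the same test can be looped over every host and card. The sketch below is a hypothetical helper (the name check_ssh is ours, not part of the Intel MPI Library). It uses ssh's BatchMode option so that a host which would ask for a password is reported as a failure instead of stalling at a prompt.

```shell
# Hypothetical helper (not an Intel MPI tool): run a non-interactive
# "ssh <host> hostname" against every host given and report the result.
check_ssh() {
  rc=0
  for h in "$@"; do
    # -n: don't read stdin; BatchMode=yes: never prompt for a password.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$h" hostname >/dev/null 2>&1; then
      echo "ok: $h"
    else
      echo "UNREACHABLE: $h"
      rc=1
    fi
  done
  return "$rc"
}

# Example, run from node0-mic0 (and then again from every other host and card):
#   check_ssh node0 node1 node1-mic0
```

Run it once on each Xeon host and each coprocessor, passing all the other hostnames; an UNREACHABLE line means that pair does not yet meet the requirement above.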

Access to necessary libraries
Make sure all MPI libraries are accessible from the Xeon Phi card. There are a couple of ways to do this:

  • Set up an NFS share between the Xeon host where the Intel MPI Library is installed and the Xeon Phi coprocessor card.
  • Manually copy all Xeon Phi-specific MPI libraries to the card.  More details on which libraries to copy and where are available here.

Assuming both of those requirements have been met, you're ready to start using the Xeon Phi coprocessors in your MPI jobs.

Running natively on the Xeon Phi coprocessor

The set of steps to run on the Xeon Phi coprocessor card exclusively can be boiled down to the following:

1. Set up the environment
Use the appropriate scripts to set your runtime environment. The following assumes all Intel® Software Tools are installed in the /opt/intel directory.

# Set your compiler
[user@host] $ source /opt/intel/composer_xe_<version>/bin/ intel64

# Set your MPI environment
[user@host] $ source /opt/intel/impi/<version>/bin64/

2. Compile for the Xeon Phi coprocessor card
Use the -mmic option for the Intel Compiler to build your MPI sources for the card.

[user@host] $ mpiicc -mmic -o test_hello.MIC test.c

3. Copy the Xeon Phi executables to the card
Transfer the executable that you just created to the card for execution.

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

This step is not required if your host and card are NFS-shared. Also note that we're renaming this executable during the copy process. This helps us use the same mpirun command for both native and symmetric modes.

4. Launch the application
Simply use the mpirun command to start the executable remotely on the card. Note that if you're planning on using a Xeon Phi coprocessor in your MPI job, you have to let us know by setting the I_MPI_MIC environment variable. This is a required step.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
[user@host] $ mpirun -f mpi_hosts -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0-mic0
Hello world: rank 1 of 2 running on node0-mic0
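The contents of mpi_hosts aren't shown above. As an assumption consistent with the output (both ranks on node0-mic0), a native-mode host file can list just the coprocessor, one hostname per line, which is the plain format mpirun -f reads. A minimal sketch of preparing the environment and the host file:

```shell
# Required before any mpirun that involves a coprocessor (step 4 above).
export I_MPI_MIC=enable

# Hypothetical host file for the native run: only the card is listed,
# one hostname per line, so both ranks of "-n 2" land on node0-mic0.
printf '%s\n' node0-mic0 > mpi_hosts

# Then launch exactly as above:
#   mpirun -f mpi_hosts -n 2 ~/test_hello
```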

Running symmetrically on both the Xeon host and the Xeon Phi coprocessor

You're now trying to utilize both the Xeon hosts on your cluster, and the Xeon Phi coprocessor cards attached to them.

1. Set up the environment
This step is the same as in the native case.

2. Compile for the Xeon Phi coprocessor card and for the Xeon host
You're now going to have to compile two different sets of binaries:

# for the Xeon Phi coprocessor
[user@host] $ mpiicc -mmic -o test_hello.MIC test.c

# for the Xeon host
[user@host] $ mpiicc -o test_hello test.c

3. Copy the Xeon Phi executables to the card
Here, we still have to transfer the Xeon Phi coprocessor-compiled executables to the card.  And again, we're renaming the executable during the transfer:

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

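Note that this will not work if your $HOME directory (where the executables live) is NFS-shared between the host and the card: copying to node0-mic0:~/test_hello would overwrite the host build, ~/test_hello, with the coprocessor build. For more tips on what to do in NFS-sharing cases, check out this article.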
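When the home directories are not shared, the renaming trick can be scripted across many nodes. The hypothetical helper below (binary_for is our own name) maps a hostname to the local build that should be stored as ~/test_hello on that node, assuming coprocessor hostnames follow the <host>-mic<N> pattern used throughout this article.

```shell
# Hypothetical helper: choose which local build belongs at ~/test_hello on a node,
# assuming coprocessor hostnames look like "<host>-mic<N>" (node0-mic0, node1-mic0, ...).
binary_for() {
  case "$1" in
    *-mic[0-9]*) echo test_hello.MIC ;;   # coprocessor: the -mmic build
    *)           echo test_hello ;;       # Xeon host: the regular build
  esac
}

# Example (not run here): push the coprocessor build to every card under the common name.
#   for card in node0-mic0 node1-mic0; do
#     scp "./$(binary_for "$card")" "$card":~/test_hello
#   done
```

Because every node ends up with the right binary under the same path, a single mpirun command line works for all of them.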

4. Launch the application
Finally, you run the MPI job. The only difference here is in your hosts file: you now have to add the Xeon hosts to the list.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
[user@host] $ mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0
Hello world: rank 1 of 2 running on node0-mic0
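The host file contents aren't shown above either. With -perhost 1, mpirun places one rank per host-file entry in round-robin order, so a file that lists node0 before node0-mic0 is consistent with the rank layout in the output. The hypothetical helper below (write_symmetric_hosts, assuming one card per host named <host>-mic0) generates such a file:

```shell
# Hypothetical helper: write an mpirun host file that lists each Xeon host
# followed by its first coprocessor, assuming the "<host>-mic0" naming.
write_symmetric_hosts() {
  out=$1
  shift
  : > "$out"                          # start from an empty file
  for h in "$@"; do
    printf '%s\n%s\n' "$h" "$h-mic0" >> "$out"
  done
}

write_symmetric_hosts mpi_hosts node0
# mpi_hosts now reads:
#   node0
#   node0-mic0
# and the launch line is unchanged:
#   export I_MPI_MIC=enable
#   mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello
```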


Gergana S. (Intel)

Ivan, you have two options: either mount /opt/intel on both the Xeon host and the Xeon Phi cards (the Phi-specific libs are in a separate directory, so our runtimes will pick the correct ones), or simply copy the MKL library files manually (via scp, etc.) from the <install_dir>/mic directory on the Xeon host to /lib64 on the Xeon Phi card.


Ivan L.

I use MKL in my code. My home directory is mounted on the Xeon host as well as on the Xeon Phi coprocessor card. The directory /opt/intel is mounted only on the Xeon host, and the MKL files are not accessible from the Xeon Phi card.

How do I transfer the Xeon Phi library files to the card? If I put them in the same directory as the executable files, the MIC files will also be used on the Xeon host.

kiran s.

I have tried your procedure to run MIC executables:


[kiran@compute012 mpi_program]$ mpirun -f mpi_host -n 4 ./hello_mic
pmi_proxy: line 0: exec: pmi_proxy: not found
Ctrl-C caught... cleaning up processes
[mpiexec@compute012] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@compute012] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@compute012] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@compute012] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@compute012] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
[kiran@compute012 mpi_program]$ cat mpi_host


drMikeT

Hello Gergana,

Does Intel MPI support shared-memory intra-node communication (i.e., among MIC cores) for MPI code running on a MIC chip, as it does on regular multi-core nodes? I am only referring to MPI communication among cores of the same coprocessor.

Are PGAS environments supported between the MIC and the host processor, in the sense that a process on the host and a process on the MIC can "share" memory?


