How to run Intel® MPI on the Intel® Xeon Phi™ Coprocessor

Support for the Intel® Xeon Phi™ coprocessor (formerly code named Knights Corner) is being deprecated.


The Intel® MPI Library supports the Intel® Xeon Phi™ coprocessor in 3 major ways:

  • The offload model where all MPI ranks are run on the main Intel® Xeon® host, and the application utilizes offload directives to run on the Intel Xeon Phi coprocessor card,
  • The native model where all MPI ranks are run on the Intel Xeon Phi coprocessor card, and
  • The symmetric model where MPI ranks are run on both the Intel Xeon host and the Intel Xeon Phi coprocessor card.

This article will focus on the native and symmetric models only. If you'd like more information on the offload model, this article gives a great overview and even more details are available in the Intel® Compiler documentation.


The most important thing to remember is that we’re treating each Intel Xeon Phi coprocessor card as simply another node in a heterogeneous cluster. To that end, running an MPI job in either the native or the symmetric mode is very similar to running a regular Intel Xeon MPI job. On the flip side, some prerequisites must be fulfilled for each coprocessor card to be fully accessible via MPI.

Uniquely accessible hosts
All coprocessor cards on the system need to have a unique IP address that's accessible from the local host, other Intel Xeon hosts on the system, and other Intel Xeon Phi cards attached to those hosts.  Again, think of simply adding another node to an existing cluster.  A very simple test of this will be the ability to ssh from one Intel Xeon Phi coprocessor (let's call it node0-mic0) to its own Intel Xeon host (node0), as well as ssh to any other Intel Xeon host on the cluster (node1) and their respective Intel Xeon Phi cards (node1-mic0).  Here's a quick example:

[user@node0-mic0 user]$ ssh node1-mic0 hostname
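
The same check can be repeated for every node and card in one pass. The following is a minimal sketch rather than an official tool: the function name check_hosts and the SSH_CMD override are hypothetical, the host names are the placeholders used above, and it assumes password-less ssh is already configured.

```shell
#!/bin/sh
# check_hosts HOST... : try a non-interactive ssh to every host and report the result.
# SSH_CMD is a hypothetical override so the loop can be dry-run without a cluster.
check_hosts() {
    failed=0
    for h in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5} "$h" hostname \
                </dev/null >/dev/null 2>&1; then
            echo "ok   $h"
        else
            echo "FAIL $h"
            failed=1
        fi
    done
    # Non-zero return status if at least one host was unreachable.
    return "$failed"
}

# Example call (run it from each host and each card in turn):
#   check_hosts node0 node0-mic0 node1 node1-mic0
```

Any FAIL line points at a host or card that still needs its network or ssh setup fixed before it can take part in an MPI job.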

Access to necessary libraries
Make sure all MPI libraries are accessible from the Intel Xeon Phi card. There are a couple of ways to do this:

  • Set up an NFS share between the Intel Xeon host where the Intel MPI Library is installed, and the Intel Xeon Phi coprocessor card.
  • Manually copy all Intel Xeon Phi-specific MPI libraries to the card.  More details on which libraries to copy and where are available here.

Assuming both of those requirements have been met, you're ready to start using the Intel Xeon Phi coprocessors in your MPI jobs.

Running Natively on the Intel Xeon Phi Coprocessor

The set of steps to run on the Intel Xeon Phi coprocessor card exclusively can be boiled down to the following:

1. Set up the environment
Use the appropriate scripts to set your runtime environment. The following assumes all Intel® Software Tools are installed in the /opt/intel directory.

# Set your compiler
[user@host] $ source /opt/intel/composer_xe_<version>/bin/ intel64

# Set your MPI environment
[user@host] $ source /opt/intel/impi/<version>/bin64/

2. Compile for the Intel Xeon Phi coprocessor card
Use the -mmic option for the Intel Compiler to build your MPI sources for the card.

[user@host] $ mpiicc -mmic -o test_hello.MIC test.c

3. Copy the Intel Xeon Phi executables to the card
Transfer the executable that you just created to the card for execution.

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

This step is not required if your host and card are NFS-shared. Also note that we're renaming this executable during the copy process. This helps us use the same mpirun command for both native and symmetric modes.

4. Launch the application
Simply use the mpirun command to start the executable remotely on the card. Note that if you're planning on using an Intel Xeon Phi coprocessor in your MPI job, you have to tell the Intel MPI Library by setting the I_MPI_MIC environment variable. This is a required step.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0-mic0
[user@host] $ mpirun -f mpi_hosts -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0-mic0
Hello world: rank 1 of 2 running on node0-mic0
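
Steps 2 through 4 can be collected into one small helper. This is only a sketch under stated assumptions: the function names run and native_run and the DRY_RUN switch are hypothetical, the environment scripts from step 1 are assumed to have been sourced already, and mpi_hosts is assumed to list the card.

```shell
#!/bin/sh
# run CMD... : execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# native_run SOURCE EXE CARD NRANKS
# Builds SOURCE for the card, copies it there as EXE, and launches NRANKS ranks.
native_run() {
    src=$1
    exe=$2
    card=$3
    nranks=$4

    # Step 2: build for the coprocessor with -mmic.
    run mpiicc -mmic -o "$exe.MIC" "$src"
    # Step 3: copy and rename; a relative remote path lands in the remote home
    # directory. Not needed when host and card share the directory over NFS.
    run scp "./$exe.MIC" "$card:$exe"
    # Step 4: I_MPI_MIC must be set for any job that uses a coprocessor.
    I_MPI_MIC=enable
    export I_MPI_MIC
    run mpirun -f mpi_hosts -n "$nranks" "$HOME/$exe"
}

# Example call matching the commands above:
#   native_run test.c test_hello node0-mic0 2
```

Setting DRY_RUN=1 before the call prints the three commands without executing them, which is a cheap way to review them before touching the card.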

Running Symmetrically on Both the Intel Xeon Host and the Intel Xeon Phi Coprocessor

You're now trying to utilize both the Intel Xeon hosts on your cluster, and the Intel Xeon Phi coprocessor cards attached to them.

1. Set up the environment
This step is the same as in the native case.

2. Compile for the Intel Xeon Phi coprocessor card and for the Intel Xeon host
You're now going to have to compile two different sets of binaries:

# for the Intel Xeon Phi coprocessor
[user@host] $ mpiicc -mmic -o test_hello.MIC test.c

# for the Intel Xeon host
[user@host] $ mpiicc -o test_hello test.c

3. Copy the Intel Xeon Phi executables to the card
Here, we still have to transfer the Intel Xeon Phi coprocessor-compiled executables to the card.  And again, we're renaming the executable during the transfer:

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

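Note that this will not work if your $HOME directory (where the executables live) is NFS-shared between host and card, because the renamed card executable would then overwrite the host executable of the same name.  For more tips on what to do in NFS-sharing cases, check out this article.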

4. Launch the application
Finally, you run the MPI job.  The only difference here is the hosts file, which must now also list the Intel Xeon hosts.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0
node0-mic0
[user@host] $ mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0
Hello world: rank 1 of 2 running on node0-mic0
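
Since the hosts file is the only thing that changes between the two modes, it can help to generate it. A minimal sketch, assuming the usual one-host-name-per-line layout; the function name write_hosts is hypothetical and the host names are the placeholders used throughout this article.

```shell
#!/bin/sh
# write_hosts FILE HOST... : write one host name per line to FILE.
write_hosts() {
    file=$1
    shift
    # Refuse to write an empty hosts file.
    [ "$#" -ge 1 ] || return 1
    printf '%s\n' "$@" > "$file"
}

# Native mode: only the card is listed.
#   write_hosts mpi_hosts node0-mic0
# Symmetric mode: the Intel Xeon host first, then its card.
#   write_hosts mpi_hosts node0 node0-mic0
```

With the symmetric file and -perhost 1 -n 2, one rank is placed on each listed entry in order, which matches the output above: rank 0 on node0 and rank 1 on node0-mic0.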


gsslavov:

Ivan, your 2 options are to either mount /opt/intel across both Xeon and Xeon Phi cards (the Phi-specific libs are available in a separate directory so our runtimes will pick the correct ones), or you can simply manually copy (via scp, etc) the MKL lib files from the <install_dir>/mic directory from the Xeon host to /lib64 on the Xeon Phi card.


lirkov:

I use MKL in my code. My home directory is mounted on the Xeon host as well as on the Xeon Phi coprocessor card. The directory /opt/intel is mounted only on the Xeon host, so the MKL files are not accessible from the Xeon Phi card.

How do I transfer the Xeon Phi library files to the card? If I put them in the same directory as the executable files, the MIC files will also be used on the Xeon host.

I have tried your procedure to run mic executa


[kiran@compute012 mpi_program]$ mpirun -f mpi_host -n 4 ./hello_mic
pmi_proxy: line 0: exec: pmi_proxy: not found
Ctrl-C caught... cleaning up processes
[mpiexec@compute012] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@compute012] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@compute012] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@compute012] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@compute012] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
[kiran@compute012 mpi_program]$ cat mpi_host


miketpower:

Hello Gergana,

Does Intel MPI, for MPI code running on a MIC chip, support shared-memory intra-node communication (i.e., among MIC cores) as it does with regular multi-core nodes? I am only referring to MPI communication among cores of the same coprocessor.

Are PGAS environments supported between the MIC and the host processor, in the sense that a process on the host and a process on the MIC can "share" memory?

