Building a Native Application for Intel® Xeon Phi™ Coprocessors


Some applications are well suited for running directly on Intel® Xeon Phi™ coprocessors without offload from a host system. This is also known as running in “native mode.” The purpose of this article is to describe how to build a native application that runs directly on an Intel Xeon Phi coprocessor and its embedded Linux* operating system. A summary of the steps is below.

  1. Determine if the application is suitable for native execution.
  2. Compile the application for native execution.
  3. Build required libraries for native execution.
  4. Copy the executable and any dependencies, such as runtime libraries, to the target hardware.
  5. Mount file shares to the target hardware for accessing input data sets and saving output data sets.
  6. Connect to the target hardware via console, set up the environment, and run the application.
  7. Debug by invoking the native application from the native debugger via a debug server running on the target.

Information on how to use basic Linux* services is not included in this document. Some commands used in these instructions require root permissions.

This document assumes you are using the compilers available in the Intel® Composer XE 2013 SP 1 package or later. Licensed users of the Intel® compilers may download the most recent versions of the compiler from the Intel® Software Development Products Registration Center. Evaluation versions of the Intel® compilers may be found at the Intel® Software Evaluation Center.

Super user permissions are required to configure non-root SSH and SCP access to Intel Xeon Phi coprocessors (not covered in this document).

Suitability for Native Execution

Native execution occurs when an application runs entirely on an Intel Xeon Phi coprocessor. Building a native application is a fast way to get existing software running with minimal code changes. First, ensure that the application is suitable for native execution. Data parallelism, usage of parallel algorithms, and application scalability are criteria for targeting Intel Xeon Phi coprocessors, but not for distinguishing between the usage of offload or native mode. An application likely to benefit from the large number of cores available with native execution tends to have the following characteristics.

  • A modest memory footprint, less than the available physical memory on the device
  • Very few serial segments
  • Does not perform extensive I/O
  • A complex code structure with no well-identified hot kernels that could be offloaded without substantial data transfer overhead

Just as for offload mode, additional software optimizations, especially vectorization, are likely to be needed to achieve good performance.

Compiling a Native Application

The Intel® C++ Compiler and Intel® Fortran Compiler support cross compilation of code for Intel Xeon Phi coprocessors. The compiler options -mmic (Linux*) and /Qmic (Windows*) direct the cross compiler to generate an application that runs only on Intel Xeon Phi coprocessors. Follow the steps below to compile an application for native execution.

1. The Intel compilers rely on environment variables to function properly. First, execute the setup script to configure the Linux runtime environment. Use compilervars.csh for C shell and the corresponding Bash version of the script for Bash shell. The following example runs the script in Bash shell from the default installation location.

$ source /opt/intel/composer_xe_2013_sp1/bin/ intel64

On Windows, from the Start menu, select All Programs > Intel Parallel Studio XE 2013 > Command Prompt > Parallel Studio XE with Intel Compiler XE and select Intel 64 mode.

2. Verify that the environment is set correctly by running icc -V (C++ compiler) or ifort -V (Fortran compiler) on Linux. On Windows, use icl /QV (C++ compiler) or ifort /QV (Fortran compiler).

$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version Build 20130905
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

3. Invoke the compiler and include the -mmic or /Qmic option in the build line.

$ icc -mmic mycode.c
> ifort /Qmic mycode.f90

Note: The default optimization level for the Intel compilers is O2 when no optimization-level (On) option is specified.
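
Before uploading anything, it can be useful to confirm that a file really was cross-compiled for the coprocessor. The sketch below is one possible check, not part of the Intel tools: it reads the machine field of the ELF header and compares it with 181, the ELF machine number that binutils assigns to this architecture (EM_K1OM). The function names elf_machine and is_mic_binary are hypothetical, and the code assumes a little-endian ELF file and a POSIX od utility.

```shell
# Hypothetical helpers (not part of the Intel tools).
# elf_machine prints the e_machine value of an ELF file. The field is a
# 2-byte little-endian integer at byte offset 18 of the ELF header.
elf_machine() {
    # Read the two bytes as unsigned decimals, e.g. " 181   0"
    set -- $(od -An -tu1 -j18 -N2 "$1")
    [ $# -eq 2 ] || return 1
    echo $(( $1 + $2 * 256 ))
}

# is_mic_binary succeeds when the file's machine number is 181 (EM_K1OM),
# the value assumed here for Intel Xeon Phi (Knights Corner) objects.
is_mic_binary() {
    [ "$(elf_machine "$1")" = "181" ]
}
```

For example, running is_mic_binary a.out && echo "native build" on the host prints the message only when a.out was produced with -mmic. An object built for the host reports a different machine number, which is also why a host linker rejects such objects as being in the wrong format.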

Building Libraries

When building code to run natively on Intel Xeon Phi coprocessors, developers may need to build libraries for native execution and then link to these libraries when building their application to run in native mode. Follow the standard method for creating Linux shared and static libraries, but also include the -mmic or /Qmic option to generate code for Intel Xeon Phi coprocessors.

Shared Libraries

The steps below illustrate how to create a Linux shared object for native execution using the Intel compilers.

1. Compile the library source code. This will create mylib.o by default.

$ icc -mmic -c -fpic mylib.c
> ifort /Qmic -c -fpic mylib.f90

2. Use the compiler -shared option to create the library file from the object file(s).

$ icc -mmic -shared -o mylib.o
> ifort /Qmic -shared -o mylib.o

3. Compile and link the native application code with the native shared object.

$ icc -mmic main.c
> ifort /Qmic main.f90

Static Libraries

Developers can use xiar to create native static libraries.

To build a static library, do the following:

1. Compile the library source code. This will create mylib.o by default.

$ icc -mmic -c mylib.c
> ifort /Qmic -c mylib.f90

2. Use the archive utility to create the library file from the object file(s).

$ xiar crs libmylib.a mylib.o

3. Compile and link the native application code with the native static library.

$ icc -mmic main.c libmylib.a
> ifort /Qmic main.f90 libmylib.a 

Finding Dependencies

In addition to transferring the native application and the shared object to the coprocessor, you will also need to transfer shared objects that are required by the compiler runtime system. On Linux these files are installed to /opt/intel/composer_xe_2013_sp1.X.XXX/compiler/lib/mic/ by default. On Windows the default installation path is C:\Program Files (x86)\Intel\Composer XE 2013 SP1.XXX\compiler\lib\mic. The following libraries will typically be required for native applications. Your application may depend on other shared libraries based on your application requirements.

Linux Library                     Description
[name missing in source]          Intel® Cilk™ Plus runtime library
[name missing in source]          Intel-specific Fortran run-time library and POSIX support
libimf.so                         Math library
[name missing in source]          Support libraries for CPU dispatch, intel_fast_*, and traceback support routines
libiomp5.so                       Compatibility OpenMP* dynamic runtime library
libsvml.so                        Short vector math library
libirng.so                        Random number generator library
[name missing in source]          Intel® MPI runtime libraries
libicaf.so                        Coarray Fortran library


Using micnativeloadex

The micnativeloadex utility, when used with option -l, will list shared library dependency information. The utility uses a default path, defined by the environment variable SINK_LD_LIBRARY_PATH, to search for dependencies.

1. From a console, set the SINK_LD_LIBRARY_PATH to the location of the Intel compiler runtime libraries for Intel Xeon Phi coprocessors and to the location of any other dynamic libraries required by the application.

$ export SINK_LD_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.X.XXX/compiler/lib/mic/:/home/user1/myproject
> export SINK_LD_LIBRARY_PATH=C:\Program Files(x86)\Intel\Composer XE 2013 SP1.XXX\compiler\lib\mic:C:\Users\myname\Documents\myproject

2. Run the utility with the -l option.

$ /opt/intel/mic/bin/micnativeloadex a.out -l

3. View the list of dependencies.

Dependency information for a.out

        Full path was resolved as

        Binary was built for Intel(R) Xeon Phi(TM) Coprocessor
        (codename: Knights Corner) architecture

        SINK_LD_LIBRARY_PATH = /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/:/home/user1

        Dependencies Found:

        Dependencies Not Found Locally (but may exist already on the coprocessor):

The list will include GNU C Library (GLIBC) runtime libraries. GLIBC libraries are readily available on the coprocessor and do not need to be uploaded.
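
To see why a given library ends up under "found" or "not found locally", the search over SINK_LD_LIBRARY_PATH can be imitated on the host. The sketch below is only an approximation of that lookup, written for illustration: find_in_sink_path is a hypothetical function name, and the real utility may resolve dependencies differently.

```shell
# Hypothetical helper: look for a library file name in each directory
# listed in SINK_LD_LIBRARY_PATH (colon-separated) and print the first
# match. Returns non-zero when the library is not found locally.
find_in_sink_path() {
    lib="$1"
    old_ifs="$IFS"
    IFS=:
    for dir in $SINK_LD_LIBRARY_PATH; do
        if [ -n "$dir" ] && [ -f "$dir/$lib" ]; then
            IFS="$old_ifs"
            echo "${dir%/}/$lib"    # drop a trailing slash before joining
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}
```

For example, find_in_sink_path libiomp5.so prints the full path when the compiler's lib/mic directory is on SINK_LD_LIBRARY_PATH, and returns a non-zero status for a library that exists only on the coprocessor.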

Transferring Files

The embedded Linux operating system, running on Intel Xeon Phi coprocessors, supports communication with the host via standard networking tools. To run an application directly on an Intel Xeon Phi coprocessor, transfer the application and any dependencies using SSH and SCP.

By default, the driver installation configures a network interface and alias for each Intel Xeon Phi coprocessor so that developers can refer to a coprocessor by “name” or by static IP address. For example, the default configuration will associate the name “mic0” with the first card in the system, the name “mic1” with the second card and so forth.

The instructions below show how to transfer files to the first card in the system, mic0.

1. Connect to the card using ssh and create a folder in the /tmp directory.

$ ssh mic0 'mkdir /tmp/myname'

2. Copy the application and any dependencies (for example, the OpenMP* and Intel® Cilk™ SDK runtime libraries and any custom shared objects) to the folder you created.

$ scp ./a.out mic0:/tmp/myname
$ scp /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/ mic0:/tmp/myname

IP addresses for accessing Intel Xeon Phi coprocessors are preset to a default value in the private network range.
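
When several files have to be uploaded, the two transfer steps can be wrapped in a small script. The following is a minimal sketch, assuming password-less SSH access to the card has already been configured; upload_native and run_cmd are hypothetical names, and the DRY_RUN switch is an addition for illustration that prints the ssh and scp commands instead of running them. It uses mkdir -p so that the script can be rerun when the folder already exists.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create a folder on the card and copy files into it.
# $1 = card name (for example mic0), $2 = destination folder on the card,
# remaining arguments = files to copy (binary and shared libraries).
upload_native() {
    card="$1"
    dest="$2"
    shift 2
    run_cmd ssh "$card" "mkdir -p $dest" || return 1
    for f in "$@"; do
        run_cmd scp "$f" "$card:$dest" || return 1
    done
}
```

With DRY_RUN=1 set, upload_native mic0 /tmp/myname ./a.out ./libiomp5.so prints one ssh command followed by one scp command per file, which makes it easy to check the destination before anything is copied.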

Handling Input and Output

The local file system on an Intel Xeon Phi coprocessor is a RAM disk in the GDDR5 memory. This means that any file saved to the local file system will take memory away from the native application. A good method for handling input and output of large data sets is to mount a folder from the host file system to the coprocessor and access the data from there. Super user permissions are required to mount a folder exported from the host via the Linux Network File System (NFS). This example provides access from /mydir to the first coprocessor in the system.

1. Create or identify the host directory you want to export. Ensure its permissions are readable and writeable.

$ chmod -R 777 /mydir

2. Modify /etc/exports on the host to permit export of /mydir to card 0. Append the following text to the file /etc/exports:


3. Modify /etc/hosts.allow on the host to give card 0 access to the host. Add this line to /etc/hosts.allow:


4. Start exportfs on the host to let NFS know the files have changed.

$ /usr/sbin/exportfs -a -v

5. Restart the NFS service on the host.

$ chkconfig nfs on
$ service nfs restart

6. Use ssh to log in to the coprocessor and use vi to modify /etc/fstab to recognize the exported file system. Append the following line to the /etc/fstab file:

host:/mydir /mydir nfs rsize=8192,wsize=8192,nolock,intr 0 0

7. Create the /mydir directory on card 0 and run the mount command.

# mkdir /mydir
# mount -a

8. Verify that the folder is available on card 0.

# df -h
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     3.8G         0      3.8G   0% /dev/shm
host:/mydir             217.4G     25.2G    181.2G  12% /mydir
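
The same verification can be scripted so that an application launcher refuses to start when the share is missing. The sketch below is one way to do it, assuming an awk implementation is available where it runs and that the mount table has the usual layout of /proc/mounts (device, mount point, file system type, options). The function name is_nfs_mounted is hypothetical.

```shell
# Hypothetical helper: succeed if mount point $1 is listed with an NFS
# file system type in a mount table ($2, default /proc/mounts).
is_nfs_mounted() {
    mnt="$1"
    table="${2:-/proc/mounts}"
    # Field 2 is the mount point and field 3 the file system type.
    awk -v m="$mnt" '$2 == m && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$table"
}
```

For example, is_nfs_mounted /mydir || echo "/mydir is not mounted" reports a missing share before the application tries to read its input data.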

Running the Application

Developers can choose from two methods to run a native application.

  • Manually copy the application and its dependent files to the coprocessor, log in, and then invoke the program.
  • Use the micnativeloadex utility to automatically copy dependent files and run the program on the coprocessor.

Running the Application Manually

After copying the application and its dependencies to the coprocessor, log in directly to the card via console, set any required environment variables, and then run the application. The user can log in using ssh. Follow the steps below to connect to the first coprocessor in the system (card 0).

1. From a console, use ssh to connect to mic0.

$ ssh mic0
# ls /
bin      etc      lib      linuxrc  proc     sbin     tmp      var
dev      home     lib64    oldroot  root     sys      usr

2. Change to the directory that contains the native application.

# cd /tmp/myname

3. Ensure the files are executable and configure the runtime environment.

# chmod +x *
# export LD_LIBRARY_PATH=/tmp/myname:$LD_LIBRARY_PATH

4. Run the application.

# ./a.out
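
The manual steps above can also be issued from the host in a single non-interactive ssh call. The sketch below only builds the remote command string; remote_run_cmd is a hypothetical name, and the folder and executable are the example values used in this article. The LD_LIBRARY_PATH reference is escaped so that it is expanded on the coprocessor rather than on the host.

```shell
# Hypothetical helper: build the command line to run on the coprocessor.
# $1 = folder on the card that holds the binary and its libraries,
# $2 = name of the executable inside that folder.
remote_run_cmd() {
    dir="$1"
    exe="$2"
    # \$LD_LIBRARY_PATH stays literal here and is expanded remotely.
    echo "cd $dir && LD_LIBRARY_PATH=$dir:\$LD_LIBRARY_PATH ./$exe"
}
```

On the host, ssh mic0 "$(remote_run_cmd /tmp/myname a.out)" then changes to the folder, sets the library path for that one command, and runs the application, with its output returned to the local console.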

Running the Application with micnativeloadex

The micnativeloadex utility copies a native binary and its library dependencies to a specified coprocessor and executes it. Again, this utility uses a default path, defined by the environment variable SINK_LD_LIBRARY_PATH, to search for dependencies. By default, micnativeloadex redirects output from the application running on the coprocessor back to the local console. Note that the micnativeloadex utility is not intended for use when measuring application performance.

1. From a console, set the SINK_LD_LIBRARY_PATH to the location of the Intel compiler runtime libraries for Intel Xeon Phi coprocessors.

$ export SINK_LD_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/

2. Run the application.

$ /opt/intel/mic/bin/micnativeloadex a.out

Debugging the Application


For information specific to debugging with GDB, please refer to the article Intel Xeon Phi Product Family: The GNU* Project Debugger (GDB).

The Intel® Debugger (IDB) is deprecated in the current Intel Composer XE 2013 SP 1 release. IDB provides command-line debugging for applications that run natively on Intel Xeon Phi coprocessors. The debugging process is analogous to running a native application on the coprocessor.

  1. Compile a debug build of the native application with the option -g.
  2. Upload the debug build to the coprocessor.
  3. Launch the target debugger on the debug host, connect to the card and begin debugging the application. A debug agent that handles debug communication will be downloaded to the coprocessor automatically. Alternatively, you can start the native application directly from the Intel Debugger.

Refer to the document Debugging Intel® MIC Applications on the Command Line for detailed information on how to debug native applications with the Intel Debugger.


Intel, the Intel logo, VTune, Cilk and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
Copyright© 2012 Intel Corporation. All rights reserved.


Dailey, Whitman

I'm attempting to compile a Fortran coarray application for native MIC execution but I'm unable to locate a couple of supporting libraries on my system. I am compiling in Windows with ifort and Parallel Studio 2016 Cluster Edition Update 3.  I can compile successfully for the MIC without specifying the -coarray option (ifort /Qmic micCAF.f90 -o micCAF.mic):  This produces an executable that successfully runs a single instance on the coprocessor.  However, when I add the coarray compiler option (ifort /Qmic -coarray micCAF.f90 -o micCAF.mic) the compiler fails with:

k1om-mpss-linux-ld.exe: cannot find -licaf
k1om-mpss-linux-ld.exe: cannot find -lmpi_mt

I've been unable to find what I think is the necessary *.so file:  I've checked multiple installations of the product (2015 and 2016 versions).  Is there source code that can be compiled into the *.so file? Or some way to create the library from the libicaf.dll that is available as part of the Parallel Studio 2016 installation?



Niraj T.

Hi, I am trying to build a dependency for the framework OpenSees. When I run "make", it runs and gives this error.

[nirajt@ycn1 ParMetis]$ make
(cd METISLib ; make )
make[1]: Entering directory `/home/internal/csm/nirajt/OpenSeesMP_mi/Parallel/apps/ParMetis/METISLib'
make[1]: `../libmetis.a' is up to date.
make[1]: Leaving directory `/home/internal/csm/nirajt/OpenSeesMP_mi/Parallel/apps/ParMetis/METISLib'
(cd ParMETISLib ; make )
make[1]: Entering directory `/home/internal/csm/nirajt/OpenSeesMP_mi/Parallel/apps/ParMetis/ParMETISLib'
make[1]: `../libparmetis.a' is up to date.
make[1]: Leaving directory `/home/internal/csm/nirajt/OpenSeesMP_mi/Parallel/apps/ParMetis/ParMETISLib'
(cd Programs ; make )
make[1]: Entering directory `/home/internal/csm/nirajt/OpenSeesMP_mi/Parallel/apps/ParMetis/Programs'
/home/opt/ICS-2013.1.039-intel64/impi/  -o ../Graphs/ptest3.2.0 ptest.o io.o adaptgraph.o  -L..   -lparmetis -lmetis  -lm
ipo: warning #11010: file format not recognized for ptest.o
ipo: warning #11010: file format not recognized for io.o
ipo: warning #11010: file format not recognized for adaptgraph.o
ipo: warning #11010: file format not recognized for /opt/intel/impi/
ipo: warning #11010: file format not recognized for /opt/intel/impi/
ld: ptest.o: Relocations in generic ELF (EM: 181)
ld: ptest.o: Relocations in generic ELF (EM: 181)
ptest.o: could not read symbols: File in wrong format
make[1]: *** [../Graphs/ptest3.2.0] Error 1
make[1]: Leaving directory `/home/internal/csm/nirajt/OpenSeesMP_mi/Parallel/apps/ParMetis/Programs'
make: *** [default] Error 2

I am compiling it for Xeon Phi and using -mmic as a CFLAG. Could anyone suggest what needs to be done? I have not found a solution for this error anywhere.

The following is the build configuration of the dependency ParMetis, which I am trying to compile for Xeon Phi (MIC):

# Which compiler to use
CC = /home/opt/ICS-2013.1.039-intel64/impi/


# What optimization level to use

# Include directories for the compiler
INCDIR =  /home/opt/ICS-2013.1.039-intel64/impi/

# What options to be used by the compiler

# Which loader to use
LD = /home/opt/ICS-2013.1.039-intel64/impi/

# In which directories to look for any additional libraries

# What additional libraries to link the programs with (eg., -lmpi)
#XTRALIBS = -lefence
#XTRALIBS = -ldmalloc

# What archiving to use
AR = ar rv

# What to use for indexing the archive
#RANLIB = ranlib
RANLIB = ar -ts

VERNUM = 3.2.0

John F.

A good method for handling input and output of large data sets is to mount a folder from the host file system to the coprocessor and access the data from there.

Could you please comment on the expected bandwidth in such a setup? When I mount a directory from the host via NFS I get about 40 MB/s which is certainly not enough to process large data sets.

Hayder A.

Hi Amanda,

I would like to build a new programming model for MIC from scratch that leverages the hardware resources, so I was thinking of implementing it in native mode.

Could you please give me details or a tutorial about building applications in native mode?

Thanks in advance.

I look forward to hearing from you.


lcicca

Hi Amanda. Thanks a lot for the explanation. The online material is a lot so it's good to be redirected to the right one. Thanks again.

AmandaS (Intel)

Depending on the code, it's to be expected given that an Intel Xeon Phi coprocessor core is a lower frequency, in-order core with a smaller cache size. Also note that each coprocessor physical core has four logical cores so your code needs to scale to more than 100 threads, in addition to being vectorized.
See this PDF for a more detailed explanation:
Sorry if I keep pointing you to the online articles, but they really are helpful and they provide much more information than I can type into this little comment box. :)

lcicca

Hi Amanda. Thanks for your comment. I know my application has not been vectorized, but what I was doing was comparing non-vectorized code on Xeon with non-vectorized code on Phi, and the Phi code runs 10 times slower than the Xeon one. Both of them are running the same number of threads, so I was thinking that the code on Phi would take advantage of having more cores, but that is not actually happening. Apart from the multi-threading, is that something to be expected if you run non-vectorized code on Phi?

AmandaS (Intel)

Sounds like you still have work to do to vectorize and optimize the code. Remember you now have extensive HW resources so the application needs to take advantage of the vector units and cores. We have a set of docs devoted entirely to getting good performance on Intel Xeon Phi located here:

lcicca

Hi all,
I'm quite new to Phi programming and I'm experiencing some problems running my application on it. I have a heavily multi-threaded, parallel application written for Xeon and optimized using SSE3 and SSE4.1. As I understood from the Phi documentation, those are not supported on the Phi, so I have compiled the same application using pure C code both on Xeon (host) and natively on the Phi. The multi-threading has been developed using pthread. Compilation for the Phi (using -mmic) went smoothly, but after uploading the code to the MIC it runs 10 times slower than the same code running on Xeon (both not using any SSE). Am I missing something when compiling the code for the Phi, maybe? I have followed the very clear procedure on this web page. Thanks in advance. lcicca.

I'm using Intel C++ compiler 2013 with these options

icc -O3 -mmic -c myfile.c
