Building a Native Application for Intel® Xeon Phi™ Coprocessors

Introduction

Some applications are well suited for running directly on Intel® Xeon Phi™ coprocessors without offload from a host system. This is also known as running in “native mode.” The purpose of this article is to describe how to build a native application that runs directly on an Intel Xeon Phi coprocessor and its embedded Linux* operating system. A summary of the steps is below.

  1. Determine if the application is suitable for native execution.
  2. Compile the application for native execution.
  3. Build required libraries for native execution.
  4. Copy the executable and any dependencies, such as runtime libraries, to the target hardware.
  5. Mount file shares to the target hardware for accessing input data sets and saving output data sets.
  6. Connect to the target hardware via console, set up the environment, and run the application.
  7. Debug by invoking the native application from the native debugger via a debug server running on the target.

Information on how to use basic Linux* services is not included in this document. Some commands used in these instructions require root permissions.

This document assumes you are using the compilers available in the Intel® Composer XE 2013 SP 1 package. Licensed users of the Intel® compilers may download the most recent versions of the compiler from the Intel® Software Development Products Registration Center. Evaluation versions of the Intel® compilers may be found at the Intel® Software Evaluation Center.

Super user permissions are required to configure non-root SSH and SCP access to Intel Xeon Phi coprocessors (not covered in this document).

Suitability for Native Execution

Native execution occurs when an application runs entirely on an Intel Xeon Phi coprocessor. Building a native application is a fast way to get existing software running with minimal code changes. First, make sure that the application is suitable for native execution. Data parallelism, use of parallel algorithms, and scalability are criteria for targeting Intel Xeon Phi coprocessors in general, but they do not help you choose between offload and native mode. An application that is likely to benefit from the large number of cores available with native execution tends to have the following characteristics:

  • A modest memory footprint, less than the available physical memory on the device
  • Very few serial segments
  • Limited I/O
  • A complex code structure with no well-identified hot kernels that could be offloaded without substantial data transfer overhead
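The first two criteria above can be estimated before any porting work. The C sketch below is illustrative only: it assumes a Linux/glibc system that supports the sysconf() extension _SC_AVPHYS_PAGES, and the function names fits_in_physical_memory and amdahl_speedup are hypothetical. The second function applies Amdahl's law, S = 1 / ((1 - p) + p / N), which shows why serial segments limit the benefit of many cores.

```c
#include <unistd.h>

/* Physical memory that is currently free, in bytes, or 0 if unknown.
 * ASSUMPTION: _SC_AVPHYS_PAGES is a glibc/Linux extension to sysconf(),
 * not part of POSIX. Free memory is a conservative estimate because the
 * kernel can also reclaim cached pages. */
static unsigned long long free_physical_memory_bytes(void)
{
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return (unsigned long long)pages * (unsigned long long)page_size;
}

/* Hypothetical helper: 1 if a working set of `bytes` fits in free physical
 * memory, 0 otherwise (including when the amount cannot be determined). */
int fits_in_physical_memory(unsigned long long bytes)
{
    unsigned long long available = free_physical_memory_bytes();
    return available != 0 && bytes < available;
}

/* Hypothetical helper: Amdahl's law upper bound on speedup for a program
 * whose parallelizable fraction is `parallel_fraction` (0.0 to 1.0),
 * run on `threads` threads. */
double amdahl_speedup(double parallel_fraction, int threads)
{
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / (double)threads);
}
```

For example, amdahl_speedup(0.95, 240) is about 18.5, so a code that is 95% parallel gains less than a tenth of the ideal speedup on 240 threads (an illustrative thread count). Compiled with -mmic and run on the card, fits_in_physical_memory() should reflect the coprocessor's own free memory rather than the host's.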

Just as for offload mode, additional software optimizations, especially vectorization, are likely to be needed to achieve good performance.
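As an illustration of a loop that an auto-vectorizer handles well, here is a generic single-precision a*x + y kernel in C. It is a sketch, not code from this article; the name saxpy and its signature are assumptions. The properties noted in the comments (unit stride, no aliasing promised via restrict, a simple loop body) are what typically let the compiler use the coprocessor's 512-bit vector units, which hold 16 single-precision values.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i] for i in [0, n).
 * Written to be easy to auto-vectorize:
 *  - unit-stride access to both arrays,
 *  - `restrict` tells the compiler that x and y do not overlap,
 *  - no function calls or early exits inside the loop body. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The compiler's optimization-report options can confirm whether such a loop was vectorized; see the compiler documentation for the option names in your version.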

Compiling a Native Application

The Intel® C++ Compiler and Intel® Fortran Compiler support cross-compilation of code for Intel Xeon Phi coprocessors. The compiler option -mmic (Linux*) or /Qmic (Windows*) makes the cross compiler generate an application that runs only on Intel Xeon Phi coprocessors. Follow the steps below to compile an application for native execution.

1. The Intel compilers rely on environment variables to function properly. First, run the setup script to configure the Linux environment: use compilervars.csh for the C shell and compilervars.sh for the Bash shell. The following example runs the script in the Bash shell from the default installation location.

$ source /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh intel64

On Windows, from the Start menu, select All Programs > Intel Parallel Studio XE 2013 > Command Prompt > Parallel Studio XE with Intel Compiler XE, and select Intel 64 mode.

2. Verify that the environment is set correctly by running icc -V (C++ compiler) or ifort -V (Fortran compiler) on Linux. On Windows, use icl /QV (C++ compiler) or ifort /QV (Fortran compiler).

$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.094 Build 20130905
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

3. Invoke the compiler and include the -mmic (Linux) or /Qmic (Windows) option on the command line.

$ icc -mmic mycode.c
> ifort /Qmic mycode.f90

Note: When no -On option is specified, the Intel compilers use optimization level O2 by default.
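To confirm that a binary really came from the cross compiler, mycode.c can test a predefined macro. This sketch assumes the Intel compiler predefines __MIC__ when it generates code for the coprocessor and leaves it undefined for host builds; built_for_coprocessor and print_build_target are hypothetical helper names that you would call from your own main().

```c
#include <stdio.h>

/* Hypothetical helper: 1 if this file was compiled for the coprocessor.
 * ASSUMPTION: the Intel compiler predefines __MIC__ when generating
 * coprocessor code (-mmic); host builds leave it undefined. */
int built_for_coprocessor(void)
{
#ifdef __MIC__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print which target this binary was built for.
 * Call it from main() in mycode.c. */
void print_build_target(FILE *out)
{
    fprintf(out, "Built for: %s\n",
            built_for_coprocessor() ? "Intel Xeon Phi coprocessor (native)"
                                    : "host");
}
```

Built with icc -mmic and run on the card, print_build_target(stdout) should report the coprocessor, while the same file compiled for the host reports "host". The -mmic binary itself will not run on the host processor.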

Building Libraries

When building code to run natively on Intel Xeon Phi coprocessors, developers may need to build libraries for native execution and then link against these libraries when building the application. Follow the standard method for creating Linux shared and static libraries, but also include the -mmic or /Qmic option to generate code for Intel Xeon Phi coprocessors.

Shared Libraries

The steps below illustrate how to create a Linux shared object for native execution using the Intel compilers.

1. Compile the library source code. This will create mylib.o by default.

$ icc -mmic -c -fpic mylib.c
> ifort /Qmic -c -fpic mylib.f90

2. Use the compiler -shared option to create the library file from the object file(s).

$ icc -mmic -shared -o libmylib.so mylib.o
> ifort /Qmic -shared -o libmylib.so mylib.o

3. Compile and link the native application code with the native shared object.

$ icc -mmic main.c libmylib.so
> ifort /Qmic main.f90 libmylib.so 
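For reference, mylib.c in the steps above could be as small as the following. The function mylib_sum is a hypothetical example; the file is ordinary C, and the same source works for both the shared library steps above and the static library steps that follow, since only the build commands differ.

```c
/* mylib.c: minimal library source for the build steps.
 * mylib_sum is a hypothetical example function. */
#include <stddef.h>

/* Returns the sum of the first n elements of x; returns 0.0 when n == 0. */
double mylib_sum(const double *x, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += x[i];
    return total;
}
```

main.c would declare the prototype double mylib_sum(const double *x, size_t n); typically through a header such as mylib.h (also hypothetical).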

 

Static Libraries

Developers can use xiar to create native static libraries.

To build a static library, do the following:

1. Compile the library source code. This will create mylib.o by default.

$ icc -mmic -c mylib.c
> ifort /Qmic -c mylib.f90

2. Use the archive utility to create the library file from the object file(s).

$ xiar crs libmylib.a mylib.o

3. Compile and link the native application code with the native static library.

$ icc -mmic main.c libmylib.a
> ifort /Qmic main.f90 libmylib.a
 

Finding Dependencies

In addition to transferring the native application and any custom shared objects to the coprocessor, you will also need to transfer the shared objects required by the compiler runtime system. On Linux, these files are installed to /opt/intel/composer_xe_2013_sp1.X.XXX/compiler/lib/mic/ by default. On Windows, the default installation path is C:\Program Files (x86)\Intel\Composer XE 2013 SP1.XXX\compiler\lib\mic. The table below lists runtime libraries that native applications commonly need; which ones your application requires depends on the features it uses, and it may also depend on other shared libraries.

Linux Library       Description
libcilkrts.so.5     Intel® Cilk™ Plus runtime library
libifcoremt.so.5    Thread-safe Intel-specific Fortran run-time library
libifport.so.5      Portability and POSIX support
libimf.so           Math library
libintlc.so.5       Intel support libraries for CPU dispatch, intel_fast_*, and traceback support routines
libiomp5.so         Compatibility OpenMP* dynamic runtime library
libsvml.so          Short vector math library
libirng.so          Random number generator library
libmpi.so.4.0,      Intel® MPI runtime libraries
libmpigf.so.4.0,
libmpigc4.so.4.0
libicaf.so          Coarray Fortran library
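To check which of these libraries a native build actually loads, the program can report its own mapped shared objects at run time. The sketch below assumes a Linux /proc file system that exposes /proc/self/maps; list_shared_objects is a hypothetical name, and treating any path containing ".so" as a shared object is a simple heuristic.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print each shared object mapped into this process,
 * as listed in /proc/self/maps, and return how many were printed, or -1 if
 * the file cannot be opened. Consecutive mappings of the same file (one per
 * segment) are printed once. ASSUMPTION: any mapped path containing ".so"
 * is treated as a shared object. */
int list_shared_objects(FILE *out)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    char previous[4096] = "";
    int count = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        /* The path name, if any, is the last field and starts with '/'. */
        char *path = strchr(line, '/');
        if (path == NULL)
            continue;                          /* anonymous, [heap], [vdso], ... */
        path[strcspn(path, "\n")] = '\0';
        if (strstr(path, ".so") == NULL || strcmp(path, previous) == 0)
            continue;
        fprintf(out, "%s\n", path);
        strcpy(previous, path);                /* fits: path is shorter than line */
        count++;
    }
    fclose(maps);
    return count;
}
```

Calling list_shared_objects(stdout) from main() on the coprocessor prints each runtime library the dynamic loader found, which you can compare with the table above and with the output of micnativeloadex -l. Libraries loaded later with dlopen() appear only after they are loaded.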

 

Using micnativeloadex

When run with the -l option, the micnativeloadex utility lists the shared library dependencies of a native binary. It searches for those dependencies in the directories listed in the environment variable SINK_LD_LIBRARY_PATH.

1. From a console, set the SINK_LD_LIBRARY_PATH to the location of the Intel compiler runtime libraries for Intel Xeon Phi coprocessors and to the location of any other dynamic libraries required by the application.

$ export SINK_LD_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.X.XXX/compiler/lib/mic/:/home/user1/myproject
> set SINK_LD_LIBRARY_PATH=C:\Program Files (x86)\Intel\Composer XE 2013 SP1.XXX\compiler\lib\mic;C:\Users\myname\Documents\myproject

2. Run the utility with the -l option.

$ /opt/intel/mic/bin/micnativeloadex a.out -l

3. View the list of dependencies.

Dependency information for a.out

        Full path was resolved as
        /home/user1/a.out

        Binary was built for Intel(R) Xeon Phi(TM) Coprocessor
        (codename: Knights Corner) architecture

        SINK_LD_LIBRARY_PATH = /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/:/home/user1

        Dependencies Found:
                /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/libimf.so
                /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/libsvml.so
                /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/libintlc.so.5
                /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/libirng.so
                /home/user1/libmylib.so

        Dependencies Not Found Locally (but may exist already on the coprocessor):
                libc.so.6
                libm.so.6
                libgcc_s.so.1
                libdl.so.2 

The list may also include GNU C Library (GLIBC) and other system runtime libraries, such as libm.so, libstdc++.so, libgcc_s.so, libpthread.so, libc.so, and libdl.so. These libraries are already available on the coprocessor and do not need to be uploaded.
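The lookup that micnativeloadex performs can be pictured as a search through the colon-separated directory list in SINK_LD_LIBRARY_PATH. The C sketch below illustrates that general idea only; it is not micnativeloadex's implementation, it assumes first-match-wins ordering as with LD_LIBRARY_PATH, and find_in_search_path is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: search a colon-separated directory list (such as the
 * value of SINK_LD_LIBRARY_PATH) for `name`, in order. On the first match,
 * the full path is left in `out` and 0 is returned. Returns -1 if no
 * directory contains the file; `out` is then unspecified. Empty list entries
 * are skipped, and candidates longer than `out_size` are ignored. */
int find_in_search_path(const char *path_list, const char *name,
                        char *out, size_t out_size)
{
    const char *start = path_list;   /* may be NULL, e.g. from getenv() */

    while (start != NULL && *start != '\0') {
        const char *end = strchr(start, ':');
        size_t dir_len = end != NULL ? (size_t)(end - start) : strlen(start);

        if (dir_len > 0) {
            /* "%.*s" copies just this directory out of the list. */
            int n = snprintf(out, out_size, "%.*s/%s", (int)dir_len, start, name);
            if (n > 0 && (size_t)n < out_size && access(out, F_OK) == 0)
                return 0;
        }
        start = end != NULL ? end + 1 : NULL;
    }
    return -1;
}
```

For example, with SINK_LD_LIBRARY_PATH set as in step 1, find_in_search_path(getenv("SINK_LD_LIBRARY_PATH"), "libmylib.so", buf, sizeof buf) would return 0 and leave /home/user1/myproject/libmylib.so in buf if that file exists.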
 

Transferring Files

The embedded Linux operating system, running on Intel Xeon Phi coprocessors, supports communication with the host via standard networking tools. To run an application directly on an Intel Xeon Phi coprocessor, transfer the application and any dependencies using SSH and SCP.

By default, the driver installation configures a network interface and alias for each Intel Xeon Phi coprocessor so that developers can refer to a coprocessor by “name” or by static IP address. For example, the default configuration will associate the name “mic0” with the first card in the system, the name “mic1” with the second card and so forth.

The instructions below show how to transfer files to the first card in the system, mic0.

1. Connect to the card using ssh and create a folder in the /tmp directory.

$ ssh mic0 'mkdir /tmp/myname'

2. Copy the application and its dependencies, such as the OpenMP* and Intel® Cilk™ Plus runtime libraries and any custom shared objects, to the folder you created.

$ scp ./a.out mic0:/tmp/myname
$ scp /opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/libiomp5.so mic0:/tmp/myname

IP addresses for accessing Intel Xeon Phi coprocessors are preset to a default value in the private network range.

Handling Input and Output

The local file system on an Intel Xeon Phi coprocessor is a RAM disk in the card's GDDR5 memory, so any file saved to the local file system reduces the memory available to the native application. A good way to handle large input and output data sets is to mount a folder from the host file system on the coprocessor and access the data there. Super user permissions are required to mount a folder exported from the host via the Linux Network File System (NFS). This example makes the host directory /mydir available to the first coprocessor in the system.

1. Create or identify the host directory you want to export, and make sure it is readable and writable.

$ chmod -R 777 /mydir

2. Modify /etc/exports on the host to permit export of /mydir to card 0. Append the following text to the file /etc/exports:

/mydir 172.31.1.1/24(rw,no_root_squash)

3. Modify /etc/hosts.allow on the host to give card 0 access to the host. Add this line to /etc/hosts.allow:

ALL: 172.31.1.1

4. Run exportfs on the host so that NFS picks up the changes to /etc/exports.

$ /usr/sbin/exportfs -a -v
exporting 172.31.1.1/24:/mydir

5. Restart the NFS service on the host.

$ chkconfig nfs on
$ service nfs restart

6. Use ssh to log in to the coprocessor and use vi to modify /etc/fstab to recognize the exported file system. Append the following line to the /etc/fstab file:

host:/mydir /mydir nfs rsize=8192,wsize=8192,nolock,intr 0 0

7. Create the /mydir directory on card 0 and run the mount command.

# mkdir /mydir
# mount -a

8. Verify that the folder is available on card 0.

# df -h
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     3.8G         0      3.8G   0% /dev/shm
host:/mydir             217.4G     25.2G    181.2G  12% /mydir
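Because files on the card's local file system consume coprocessor memory, a native application can check whether an output directory is RAM-backed before writing large results. This sketch uses the Linux statfs() call; the magic numbers are the tmpfs and ramfs values from <linux/magic.h>, copied into the code, and is_ram_backed is a hypothetical name. It assumes the card's local file system reports itself as tmpfs or ramfs.

```c
#include <sys/vfs.h>   /* statfs(); Linux-specific */

/* Values of TMPFS_MAGIC and RAMFS_MAGIC from <linux/magic.h>, copied here so
 * the sketch does not require kernel headers. */
#define SKETCH_TMPFS_MAGIC 0x01021994UL
#define SKETCH_RAMFS_MAGIC 0x858458f6UL

/* Hypothetical helper: returns 1 if `path` is on a RAM-backed file system
 * (tmpfs or ramfs), 0 if it is on another file system such as an NFS mount,
 * and -1 if statfs() fails (for example, the path does not exist). */
int is_ram_backed(const char *path)
{
    struct statfs info;
    if (statfs(path, &info) != 0)
        return -1;
    /* f_type's width differs between ABIs; compare as unsigned long. */
    unsigned long type = (unsigned long)info.f_type;
    return type == SKETCH_TMPFS_MAGIC || type == SKETCH_RAMFS_MAGIC;
}
```

After the mount above, is_ram_backed("/mydir") on card 0 should return 0 (an NFS mount), while a local path such as /tmp/myname is expected to return 1.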

Running the Application

Developers can choose from two methods to run a native application.

  • Manually copy the application and its dependent files to the coprocessor, log in, and then invoke the program.
  • Use the micnativeloadex utility to automatically copy dependent files and run the program on the coprocessor.

Running the Application Manually

After copying the application and its dependencies to the coprocessor, log in to the card with ssh, set any required environment variables, and then run the application. Follow the steps below to connect to the first coprocessor in the system (card 0).

1. From a console, use ssh to connect to mic0.

$ ssh mic0
# ls /
bin      etc      lib      linuxrc  proc     sbin     tmp      var
dev      home     lib64    oldroot  root     sys      usr
# 

2. Change to the directory that contains the native application.

# cd /tmp/myname

3. Ensure the files are executable and configure the runtime environment.

# chmod +x *
# export LD_LIBRARY_PATH=/tmp/myname:$LD_LIBRARY_PATH

4. Run the application.

# ./a.out

Running the Application with micnativeloadex

The micnativeloadex utility copies a native binary and its library dependencies to a specified coprocessor and executes it. Again, this utility uses a default path, defined by the environment variable SINK_LD_LIBRARY_PATH, to search for dependencies. By default, micnativeloadex redirects output from the application running on the coprocessor back to the local console. Note that the micnativeloadex utility is not intended for use when measuring application performance.

1. From a console, set SINK_LD_LIBRARY_PATH to the location of the Intel compiler runtime libraries for Intel Xeon Phi coprocessors and of any other dynamic libraries required by the application.

$ export SINK_LD_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.0.080/compiler/lib/mic/

2. Run the application.

$ /opt/intel/mic/bin/micnativeloadex a.out

Debugging

For information specific to debugging with GDB, please refer to the article Intel Xeon Phi Product Family: The GNU* Project Debugger (GDB).

The Intel® Debugger (IDB) is deprecated in the current Intel Composer XE 2013 SP 1 release. IDB provides command-line debugging for applications that run natively on Intel Xeon Phi coprocessors. The debugging process is analogous to running a native application on the coprocessor.

  1. Compile a debug build of the native application with the option -g.
  2. Upload the debug build to the coprocessor.
  3. Launch the target debugger on the debug host, connect to the card and begin debugging the application. A debug agent that handles debug communication will be downloaded to the coprocessor automatically. Alternatively, you can start the native application directly from the Intel Debugger.

Refer to the document Debugging Intel® MIC Applications on the Command Line for detailed information on how to debug native applications with the Intel Debugger.

Additional Resources

Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel, the Intel logo, VTune, Cilk and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
Copyright© 2012 Intel Corporation. All rights reserved.

Optimization Notice

http://software.intel.com/en-us/articles/optimization-notice/

Please refer to the Optimization Notice page for more detailed information regarding performance and optimization in Intel software products.