Building a Native Application for Intel® Xeon Phi™ Coprocessors

Introduction

Some applications are well suited for running directly on Intel® Xeon Phi™ coprocessors without offload from a host system. This is also known as running in “native mode.” The purpose of this article is to describe how to build a native application that runs directly on an Intel Xeon Phi coprocessor and its embedded Linux* operating system. A summary of the steps is below.

  1. Determine if the application is suitable for native execution.
  2. Compile the application for native execution.
  3. Build required libraries for native execution.
  4. Copy the executable and any dependencies, such as runtime libraries, to the target hardware.
  5. Mount file shares to the target hardware for accessing input data sets and saving output data sets.
  6. Connect to the target hardware via console, set up the environment, and run the application.
  7. Debug by invoking the native application from the native debugger via a debug server running on the target.

Information on how to use basic Linux* services is not included in this document. Some commands used in these instructions require root permissions.

This document assumes you are using the compilers available in the Intel® Composer XE 2013 Update 2 package. Licensed users of the Intel® compilers may download the most recent versions of the compiler from the Intel® Software Development Products Registration Center. Evaluation versions of the Intel® compilers may be found at the Intel® Software Evaluation Center.

Super user permissions are required to configure non-root SSH and SCP access to Intel Xeon Phi coprocessors (not covered in this document).

Suitability for Native Execution

Native execution occurs when an application runs entirely on an Intel Xeon Phi coprocessor. Building a native application is a fast way to get existing software running with minimal code changes. First, ensure that the application is suitable for native execution. Data parallelism, usage of parallel algorithms, and application scalability are criteria for targeting Intel Xeon Phi coprocessors, but not for distinguishing between the usage of offload or native mode. An application likely to benefit from the large number of cores available with native execution tends to have the following characteristics.

  • A modest memory footprint, less than the available physical memory on the device
  • Very few serial segments
  • No extensive I/O
  • A complex code structure with no well-identified hot kernels that could be offloaded without substantial data transfer overhead
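
The weight placed on serial segments can be illustrated with Amdahl's law. It is not part of the guidance above and is shown only as a rough sketch. Here p is the fraction of run time that can be parallelized and N is the number of cores. Both symbols, and the values p = 0.95 and N = 60 used in the example, are assumptions chosen for illustration.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
S(60)\Big|_{p = 0.95} = \frac{1}{0.05 + 0.95/60} \approx 15.2
```

Under these assumptions, a 5% serial portion limits the speedup to about 15x on 60 cores. The model also ignores the fact that a single coprocessor core is typically slower than a host core, so the practical penalty for serial code is likely to be larger.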

Just as for offload mode, additional software optimizations, especially vectorization, are likely to be needed to achieve good performance.

Compiling a Native Application

The Intel® C++ Compiler and Intel® Fortran Compiler support cross compilation of code for Intel Xeon Phi coprocessors. The compiler option -mmic directs the cross compiler to generate an executable (a.out by default) that runs only on Intel Xeon Phi coprocessors. Follow the steps below to compile an application for native execution.

1. The Intel compilers rely on environment variables to function properly. First, execute the setup script to configure the Linux runtime environment. Use compilervars.csh for C shell and use compilervars.sh for Bash shell. The following example runs the script in Bash shell from the default installation location.

$ source /opt/intel/composer_xe_2013/bin/compilervars.sh intel64

2. Verify that the environment is set correctly by running icc -V (C compiler), icpc -V (C++ compiler), or ifort -V (Fortran compiler).

$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.0.146 Build 20130121
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

3. Invoke the compiler and include the -mmic option in the build line.

$ icc -mmic mycode.c
$ icpc -mmic mycode.cpp
$ ifort -mmic mycode.f90

Note: The default optimization level for the Intel compilers is -O2 when no -O option is specified. For more information on supported compiler options, refer to the Intel® Compiler XE 13.0 Update 2 User and Reference Guides located in /opt/intel/composer_xe_2013/Documentation/en_US/.

Building Libraries

When building code to run natively on Intel Xeon Phi coprocessors, developers may need to build libraries for native execution and then link to these libraries when building the application. Follow the standard method for creating Linux shared and static libraries, but also include the -mmic option to generate code for Intel Xeon Phi coprocessors.

Shared Libraries

The steps below illustrate how to create a Linux shared object for native execution using the Intel compilers.

1. Compile the library source code. This will create mylib.o by default.

$ icc -mmic -c -fpic mylib.c
$ icpc -mmic -c -fpic mylib.cpp
$ ifort -mmic -c -fpic mylib.f90

2. Use the compiler -shared option to create the library file from the object file(s).

$ icc -mmic -shared -o libmylib.so mylib.o
$ icpc -mmic -shared -o libmylib.so mylib.o
$ ifort -mmic -shared -o libmylib.so mylib.o

3. Compile and link the native application code with the native shared object.

$ icc -mmic main.c libmylib.so
$ icpc -mmic main.cpp libmylib.so
$ ifort -mmic main.f90 libmylib.so

In addition to transferring the native application and the shared object to the coprocessor, you will also need to transfer shared objects that are required by the compiler runtime system. These files are installed to /opt/intel/composer_xe_2013.2.146/compiler/lib/mic/ by default. The following libraries will typically be required for native applications. Your application may depend on other shared libraries based on your application requirements.

Library              Description
libcilkrts.so.5      Intel® Cilk™ Plus runtime library
libifcoremt.so.5     Thread-safe Intel-specific Fortran run-time library
libifport.so.5       Portability and POSIX support
libimf.so            Math library
libintlc.so.5        Intel support libraries for CPU dispatch, intel_fast_*, and traceback support routines
libiomp5.so          Compatibility OpenMP* dynamic runtime library
libsvml.so           Short vector math library
libirng.so           Random number generator library
libmpi.so.4.0,       Intel® MPI runtime libraries
libmpigf.so.4.0,
libmpigc4.so.4.0

To determine which Intel runtime objects to upload to the card, run the following command:

$ readelf -d libmylib.so | grep -i NEED
 0x0000000000000001 (NEEDED)             Shared library: [libimf.so]
 0x0000000000000001 (NEEDED)             Shared library: [libsvml.so]
 0x0000000000000001 (NEEDED)             Shared library: [libirng.so]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libintlc.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x000000006ffffffe (VERNEED)            0x3a8
 0x000000006fffffff (VERNEEDNUM)         1

The list also includes GNU runtime libraries, such as the GNU C Library (GLIBC) components libm.so, libpthread.so, libc.so, and libdl.so, as well as libstdc++.so and libgcc_s.so. These libraries are readily available on the coprocessor and do not need to be uploaded.
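
The filtering described above can be sketched as a small shell helper. The function name list_upload_libs is hypothetical, and the exclusion list simply mirrors the GNU runtime libraries named in this section; adjust it if your coprocessor image differs.

```shell
#!/bin/sh
# list_upload_libs (hypothetical helper): read `readelf -d` output on stdin
# and print the NEEDED libraries that are not GNU runtime libraries.
list_upload_libs() {
    # Keep only (NEEDED) entries and extract the name between [ and ].
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p' |
    # Drop libraries this article lists as already present on the coprocessor.
    grep -v -E '^(libm|libstdc\+\+|libgcc_s|libpthread|libc|libdl)\.so'
}
```

A typical invocation would be `readelf -d libmylib.so | list_upload_libs`, run once for each native executable and shared object you plan to transfer.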

Static Libraries

Developers can use xiar to create native static libraries.

To build a static library, do the following:

1. Compile the library source code. This will create mylib.o by default.

$ icc -mmic -c mylib.c
$ icpc -mmic -c mylib.cpp
$ ifort -mmic -c mylib.f90

2. Use the archive utility to create the library file from the object file(s).

$ xiar crs libmylib.a mylib.o

3. Compile and link the native application code with the native static library.

$ icc -mmic main.c libmylib.a
$ icpc -mmic main.cpp libmylib.a
$ ifort -mmic main.f90 libmylib.a

Transferring Files

The embedded Linux operating system, running on Intel Xeon Phi coprocessors, supports communication with the host via standard networking tools. To run an application directly on an Intel Xeon Phi coprocessor, transfer the application and any dependencies using SSH and SCP.

By default, the driver installation configures a network interface and alias for each Intel Xeon Phi coprocessor so that developers can refer to a coprocessor by “name” or by static IP address. For example, the default configuration will associate the name “mic0” with the first card in the system, the name “mic1” with the second card and so forth.

The instructions below show how to transfer files to the first card in the system, mic0.

1. Connect to the card using ssh and create a folder in the /tmp directory.

$ ssh mic0 'mkdir /tmp/myname'

2. Copy the application and any dependencies, e.g., OpenMP* and Intel® Cilk™ Plus runtime libraries and any custom shared objects, to the folder you created.

$ scp ./a.out mic0:/tmp/myname
$ scp /opt/intel/composer_xe_2013.2.146/compiler/lib/mic/libiomp5.so mic0:/tmp/myname

IP addresses for accessing Intel Xeon Phi coprocessors are preset to a default value in the private network range.
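
When several files must be transferred, the two steps above can be generated rather than typed. The following is a minimal sketch, not a tool shipped with the product: print_copy_cmds is a hypothetical helper that only prints the ssh and scp commands so they can be reviewed before being run. It uses mkdir -p so that an existing folder is not an error, and it assumes paths without spaces.

```shell
#!/bin/sh
# print_copy_cmds CARD DIR FILE... (hypothetical helper)
# Print, but do not run, the commands that create DIR on CARD
# and copy each FILE into it.
print_copy_cmds() {
    card=$1
    dir=$2
    shift 2
    # Step 1: create the target folder on the coprocessor.
    printf "ssh %s 'mkdir -p %s'\n" "$card" "$dir"
    # Step 2: one scp command per file to transfer.
    for f in "$@"; do
        printf 'scp %s %s:%s\n' "$f" "$card" "$dir"
    done
}
```

For example, `print_copy_cmds mic0 /tmp/myname ./a.out /opt/intel/composer_xe_2013.2.146/compiler/lib/mic/libiomp5.so` prints one ssh line and two scp lines. After checking them, the same output can be piped to sh to perform the transfer.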

Handling Input and Output

The local file system on an Intel Xeon Phi coprocessor is a RAM disk that resides in GDDR5 memory. This means that any file saved to the local file system will take memory away from the native application. A good method for handling input and output of large data sets is to mount a folder from the host file system to the coprocessor and access the data from there. Super user permissions are required to mount a folder exported from the host via the Linux Network File System (NFS). The example below gives the first coprocessor in the system access to the host directory /mydir.

1. Create or identify the host directory you want to export. Ensure that it is readable and writable.

$ chmod -R 777 /mydir

2. Modify /etc/exports on the host to permit export of /mydir to card 0. Append the following text to the file /etc/exports:

/mydir 172.31.1.1/24(rw,no_root_squash)

3. Modify /etc/hosts.allow on the host to give card 0 access to the host. Add this line to /etc/hosts.allow:

ALL: 172.31.1.1

4. Run exportfs on the host so that NFS picks up the changes to /etc/exports.

$ /usr/sbin/exportfs -a -v
exporting 172.31.1.1/24:/mydir

5. Restart the NFS service on the host.

$ chkconfig nfs on
$ service nfs restart

6. Use ssh to log in to the coprocessor and use vi to modify /etc/fstab to recognize the exported file system. Append the following line to the /etc/fstab file:

host:/mydir /mydir nfs rsize=8192,wsize=8192,nolock,intr 0 0

7. Create the /mydir directory on card 0 and run the mount command.

# mkdir /mydir
# mount -a

8. Verify that the folder is available on card 0.

# df -h
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     3.8G         0      3.8G   0% /dev/shm
host:/mydir             217.4G     25.2G    181.2G  12% /mydir
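
Steps 2, 3, and 6 above append lines to configuration files, and repeating the procedure would add duplicate entries. One way to avoid that is sketched below. The helper name append_once is hypothetical, and the sketch assumes a grep that supports the -q, -x, and -F options.

```shell
#!/bin/sh
# append_once FILE LINE (hypothetical helper)
# Append LINE to FILE only if FILE does not already contain an identical line.
append_once() {
    file=$1
    line=$2
    # -q: no output, -x: match the whole line, -F: treat LINE as a fixed string.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Run as root on the host, `append_once /etc/exports '/mydir 172.31.1.1/24(rw,no_root_squash)'` and `append_once /etc/hosts.allow 'ALL: 172.31.1.1'` would then be safe to repeat.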

Running the Application

Developers can choose from two methods to run a native application.

  • Manually copy the application and its dependent files to the coprocessor, log in, and then invoke the program.
  • Use the micnativeloadex utility to automatically copy dependent files and run the program on the coprocessor.

Running the Application Manually

After copying the application and its dependencies to the coprocessor, log in directly to the card via console, set any required environment variables, and then run the application. The user can log in using ssh. Follow the steps below to connect to the first coprocessor in the system (card 0).

1. From a console, use ssh to connect to mic0.

$ ssh mic0
# ls /
bin      etc      lib      linuxrc  proc     sbin     tmp      var
dev      home     lib64    oldroot  root     sys      usr
# 

2. Change to the directory that contains the native application.

# cd /tmp/myname

3. Ensure the files are executable and configure the runtime environment.

# chmod +x *
# export LD_LIBRARY_PATH=/tmp/myname:$LD_LIBRARY_PATH

4. Run the application.

# ./a.out
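
Steps 3 and 4 can also be combined so that the library path applies to a single command instead of the whole login session. The wrapper below is a sketch under that assumption; with_lib_path is a hypothetical name. It also avoids leaving a trailing colon in LD_LIBRARY_PATH when the variable was previously unset, since an empty entry is treated by the dynamic loader as the current directory.

```shell
#!/bin/sh
# with_lib_path DIR COMMAND [ARGS...] (hypothetical helper)
# Run COMMAND with DIR placed at the front of LD_LIBRARY_PATH.
with_lib_path() {
    dir=$1
    shift
    # ${VAR:+:$VAR} expands to ":old_value" only when VAR is set and non-empty.
    LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

On the coprocessor, the application from the steps above would then be started with `with_lib_path /tmp/myname ./a.out`.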

Using micnativeloadex

The micnativeloadex utility copies a native binary and its library dependencies to a specified coprocessor and executes it. The utility uses a default path, defined by the environment variable SINK_LD_LIBRARY_PATH, to search for dependencies. By default, micnativeloadex redirects output from the application running on the coprocessor back to the local console.

1. From a console, set the SINK_LD_LIBRARY_PATH to the location of the Intel compiler runtime libraries for Intel Xeon Phi coprocessors.

$ export SINK_LD_LIBRARY_PATH=/opt/intel/composer_xe_2013.2.146/compiler/lib/mic/

2. Run the application.

$ /opt/intel/mic/bin/micnativeloadex a.out

Debugging

The Intel® Debugger provides command-line debugging for applications that run natively on Intel Xeon Phi coprocessors. The debugging process is analogous to running a native application on the coprocessor.

  1. Compile a debug build of the native application with the option -g.
  2. Upload the debug build to the coprocessor.
  3. Launch the target debugger on the debug host, connect to the card and begin debugging the application. A debug agent that handles debug communication will be downloaded to the coprocessor automatically. Alternatively, you can start the native application directly from the Intel Debugger.

Refer to the document Debugging Intel® MIC Applications on the Command Line for detailed information on how to debug native applications with the Intel Debugger.

Additionally, a preview installation of a native-only GDB* debugger is available at http://software.intel.com/en-us/forums/showthread.php?t=105443.

A Simple Example

Download the file cube_charge.tar from http://software.intel.com/sites/default/files/cube_charge.tar and extract it.

This example application performs a three-dimensional integral to calculate the electrostatic potential due to a uniform charge distribution over a cube at a series of space points that are read in from a data file. Use the macro NP to specify the number of data points to be calculated.

Change to the example project directory.

1. Open a terminal and set up the compiler environment.

$ source /opt/intel/composer_xe_2013/bin/compilervars.sh intel64

2. Verify that the compiler environment is set.

$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.0.146 Build 20130121
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

3. Build the example. Note the use of -mmic and -openmp.

$ ifort -DNP=120 -mmic -openmp -openmp-report1 -ipo -vec-report1 -fpp cube_charge.f90 threed_int.f90 twod_int.f90 trap_int.f90 func.f90

The output looks like this:

ipo: remark #11000: performing multi-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_ifortqS9Myh.o
cube_charge.f90(118): (col. 7) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
cube_charge.f90(110): (col. 7) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
trap_int.f90(35): (col. 22) remark: LOOP WAS VECTORIZED.

4. Transfer the program and any dependencies from the host to the coprocessor:

$ ssh mic0 'mkdir /tmp/myname'
$ scp ./a.out mic0:/tmp/myname
$ scp /opt/intel/composer_xe_2013.2.146/compiler/lib/mic/libiomp5.so mic0:/tmp/myname

5. Use SSH to connect to the coprocessor. Change to the directory that contains the executable, ensure the files can be executed, set the environment, and run the executable:

$ ssh mic0
# cd /tmp/myname && chmod +x *
# export LD_LIBRARY_PATH=/tmp/myname:$LD_LIBRARY_PATH
# ./a.out

6. As an alternative to steps 4 and 5, use the micnativeloadex utility to transfer dependencies from the host to the coprocessor and run the executable:

$ export SINK_LD_LIBRARY_PATH=/opt/intel/composer_xe_2013.2.146/compiler/lib/mic/
$ /opt/intel/mic/bin/micnativeloadex a.out

About the Author

Amanda Sharp is a Technical Consulting Engineer with the Intel Compiler team, focusing on software optimization. She provides technical support and training to software developers in High Performance Computing. Amanda has a B.S. in Computer Science from Portland State University.

Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel, the Intel logo, VTune, Cilk and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
Copyright© 2012 Intel Corporation. All rights reserved.

Optimization Notice

http://software.intel.com/en-us/articles/optimization-notice/

Refer to our Optimization Notice for more information regarding performance and optimization choices in Intel software products.