Build Application

Before you start identifying hotspots in your native Intel® Xeon Phi™ coprocessor application, do the following:

  1. Get software tools.

  2. Build the application with full optimizations on the host.

  3. Create a performance baseline.

Get Software Tools

You need the following tools to try these tutorial steps yourself using the matrix sample application:

  • Intel® VTune™ Amplifier, including sample applications

  • sampling driver, set up during the VTune Amplifier installation


    If the VTune Amplifier installer could not install the driver, you cannot run the analysis and a warning message is displayed. See the online help for instructions on installing the driver manually.

  • Intel Manycore Platform Software Stack. See Release Notes for more information.

  • zip file extraction utility

  • Intel C++ compiler installed on the host. See Release Notes for more information.

Acquire Intel VTune Amplifier

If you do not already have access to the VTune Amplifier, you can download an evaluation copy from

Install and Set Up VTune Amplifier Sample Applications

  1. Copy the sample .zip file from the <install-dir>\samples\<locale>\C++\ directory to a writable directory or share on your system. The default installation path is C:\Program Files (x86)\Intel\VTune Amplifier XE <version>\ (on certain systems, the directory name is Program Files instead of Program Files (x86)).
  2. Extract the sample from the .zip file.
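
The extraction step can also be scripted. The following is a minimal Python sketch and is not part of the VTune Amplifier samples. The function name extract_sample and both path arguments are hypothetical placeholders; substitute the actual sample .zip file and a writable directory on your system.

```python
import zipfile
from pathlib import Path


def extract_sample(zip_path, dest_dir):
    """Extract a sample .zip archive into a writable directory.

    Both arguments are placeholders chosen for illustration.
    Returns the sorted list of archive member names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())
```

The returned member list is a quick way to confirm that the source files and the batch scripts were unpacked where you expect them.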


  • Samples are non-deterministic. Your screens may vary from the screen captures shown throughout this tutorial.
  • Samples are designed only to illustrate the VTune Amplifier features; they do not represent best practices for creating code.

Build the Target

Prerequisite: When using an offload or cross compiler, manually install the binary utilities (Binutils) included in the Intel Xeon Phi installation zip package. For installation instructions, refer to the Intel Compiler documentation.

Build the target on the host with full optimizations, as recommended for performance analysis. For the native application analysis used in this tutorial, you must copy the binary to the Intel Xeon Phi coprocessor. For offload applications, no copying is required.

To communicate with the Intel Xeon Phi coprocessor cards, you may use any of the following mechanisms:

  • Mount an NFS share. See the NFS Mounting a Host Export topic in the Intel Manycore Platform Software Stack (MPSS) help for details.
  • Use existing SSH tools.

This sample uses the PuTTY* utility and assumes that ordinary user account access has not yet been established on the coprocessor (hence the references to "root"). See the Intel MPSS installation document for details on creating user accounts on the coprocessor.

  1. From the Start menu, select All Programs > Intel Parallel Studio XE 2013 > Command Prompt > Parallel Studio XE with Intel Compiler XE and select a mode that corresponds to your system and version of the Microsoft* Visual Studio IDE installed, for example: Intel 64 Visual Studio 2012 mode.

    This command sets up the Intel compiler environment variables and opens a command prompt.

  2. Change the directory to the location where you extracted the sample code (for this example, assume that location is C:\samples\matrix\windows_mic).

  3. Build the sample code with the Intel C++ compiler, copy the matrix.mic file to the card via the pscp command and set the execution permissions via plink.

    To automate these steps, run the buildmatrix.bat script file, which contains all of these commands.

    icl /Qmic -lpthread -g -O3 -debug inline-debug-info -vec-report3 -o matrix.mic ..\src\matrix.mic  ..\src\matrix.c ..\src\multiply.c ..\src\util.c -D_LINUX
    pscp -l root -i <path_to_private_key_file>  C:\samples\matrix\windows_mic\matrix.mic <target_system>:/tmp
    plink -l root -i <path_to_private_key_file> root@<target_system>  chmod "+x" /tmp/matrix.mic


    Edit the buildmatrix.bat file to add a path to the private key file on your system.

    The matrix application is built as matrix.mic and copied to the /tmp directory on the selected card.
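
If you prefer to parameterize the copy and permission steps rather than edit the batch file by hand, the pscp and plink invocations shown above can be assembled programmatically. The following is a minimal Python sketch, not part of the sample. The function name deploy_commands and its parameters are hypothetical; the options themselves are copied from the commands above, and the code only builds the argument lists without executing anything.

```python
def deploy_commands(key_file, target, local_binary, remote_dir="/tmp"):
    """Build the pscp and plink argument lists used to deploy a native binary.

    The options mirror the tutorial's commands; nothing is executed here.
    key_file, target, and local_binary are placeholders supplied by the caller.
    """
    # Take the file name from a Windows or POSIX style path.
    remote_name = local_binary.replace("\\", "/").rsplit("/", 1)[-1]
    # Copy the binary to the coprocessor as root.
    copy_cmd = ["pscp", "-l", "root", "-i", key_file,
                local_binary, f"{target}:{remote_dir}"]
    # Mark the copied binary as executable.
    chmod_cmd = ["plink", "-l", "root", "-i", key_file,
                 f"root@{target}", "chmod", "+x", f"{remote_dir}/{remote_name}"]
    return [copy_cmd, chmod_cmd]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from a prompt where pscp and plink are on the PATH, so a failed copy stops the sequence before the permission change is attempted.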

Create a Performance Baseline

  1. Run the application on the coprocessor using plink and record the results to establish a performance baseline:

    plink -l root -i <path_to_private_key_file> root@<target_system>  /tmp/matrix.mic

    In this tutorial's scenario, the command that runs the matrix application is in the runmatrix.bat file. Edit this file to add the path to the private key file on your system.


    Before you start the application, minimize the amount of other software running on your computer to get more accurate results.

  2. Note the execution time displayed at the bottom of the output. For the matrix.mic executable in the figure above, the execution time is 24.988 seconds. Use this metric as a baseline against which you will compare subsequent runs of the application.


    Run the application several times, noting the execution time for each run, and use the average time. This helps to minimize skewed results due to transient system activity.
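
The averaging described above can be done with a few lines of Python once you have noted the times. This is a minimal sketch: the function name summarize_runs is hypothetical, the timings in the example are invented placeholders rather than measured results, and the standard deviation is an extra not called for by the tutorial, included only to show how much the runs vary.

```python
from statistics import mean, stdev


def summarize_runs(times_seconds):
    """Return (average, sample standard deviation) for recorded run times."""
    if not times_seconds:
        raise ValueError("record at least one run")
    average = mean(times_seconds)
    # A single run has no spread to report.
    spread = stdev(times_seconds) if len(times_seconds) > 1 else 0.0
    return average, spread


# Hypothetical timings, in seconds, noted from three runs.
recorded = [25.0, 24.5, 25.5]
average, spread = summarize_runs(recorded)
print(f"baseline: {average:.3f} s (std dev {spread:.3f} s over {len(recorded)} runs)")
```

A large spread relative to the average suggests that transient system activity is still affecting the runs, in which case more runs or a quieter system may give a more reliable baseline.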


Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804
