Build Application

Before you start identifying hotspots in your native Intel® Xeon Phi™ coprocessor application, do the following:

  1. Get software tools.

  2. Build the application with full optimizations on the host.

  3. Create a performance baseline.

Get Software Tools

You need the following tools to try these tutorial steps yourself using the matrix sample application:

  • Intel® VTune™ Amplifier, including sample applications

  • Sampling driver, set up during the VTune Amplifier installation


    If the VTune Amplifier was not able to install the driver, you cannot run the analysis and a warning message is displayed. See the online help for instructions on how to install the driver manually.

  • Intel® Manycore Platform Software Stack (Intel® MPSS). See Release Notes for more information.

  • zip file extraction utility

  • Intel® C++ Compiler installed on the host. See Release Notes for more information.

Acquire Intel VTune Amplifier

If you do not already have access to the VTune Amplifier, you can download an evaluation copy from

Install and Set Up VTune Amplifier Sample Applications

  1. Copy the file from the <install-dir>\samples\<locale>\C++\ directory to a writable directory or share on your system.


    The default installation path (<install-dir>) for the VTune Amplifier XE is [Program Files]\IntelSWTools\VTune Amplifier XE <version>. For the VTune Amplifier for Systems, the default <install-dir> is [Program Files]\IntelSWTools\system_studio_<version>\VTune Amplifier for Systems.

  2. Extract the sample from the .zip file.


  • Samples are non-deterministic. Your screens may vary from the screen captures shown throughout this tutorial.
  • Samples are designed only to illustrate the VTune Amplifier features; they do not represent best practices for creating code.

Build the Target

Prerequisite: When using an offload or cross compiler, make sure to manually install the binary utilities (Binutils) included in the Intel Xeon Phi installation .zip package. For installation instructions, refer to the Intel C++ Compiler documentation.

Build the target on the host with full optimizations, as recommended for performance analysis. For the native application analysis used in this tutorial, you need to copy the binary to the Intel Xeon Phi coprocessor. For offload applications, no copying is required.

To communicate with the Intel Xeon Phi coprocessor cards, you may use any of the following mechanisms:

  • Mount an NFS share. See the NFS Mounting a Host Export topic in the Intel Manycore Platform Software Stack help for details.
  • Use existing SSH tools.

This sample uses the PuTTY* utility and assumes that ordinary user account access has not yet been established on the coprocessor (hence the references to "root"). See the Intel MPSS installation document for details on creating user accounts on the coprocessor.

  1. From the Start menu, select All Programs > Intel Parallel Studio XE version > Command Prompt > Parallel Studio XE with Intel Compiler XE and select a mode that corresponds to your system and version of the Microsoft* Visual Studio IDE installed, for example: Intel 64 Visual Studio 2013 mode.

    This command sets up the Intel C++ Compiler environment variables and opens a command prompt.

  2. Change the directory to the location where you extracted the sample code (for this example, assume that location is C:\samples\matrix\windows_mic).

  3. Build the sample code with the Intel C++ Compiler, copy the matrix.mic file to the card with the pscp command, and set the execution permissions with plink.

    To automate these steps, run the buildmatrix.bat script file, which contains all of these commands.

    icl /Qmic -lpthread -g -O3 -debug inline-debug-info -vec-report3 -o matrix.mic ..\src\matrix.c ..\src\multiply.c ..\src\util.c -D_LINUX
    pscp -l root -i <path_to_private_key_file>  C:\samples\matrix\windows_mic\matrix.mic <target_system>:/tmp
    plink -l root -i <path_to_private_key_file> root@<target_system>  chmod "+x" /tmp/matrix.mic


    Before running it, edit the buildmatrix.bat file to add the path to the private key file on your system.

    The matrix application is built as matrix.mic and copied to the /tmp directory on the selected card.
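
The three commands in step 3 can also be assembled programmatically, which makes it easier to vary the key file or target card. The following is a minimal Python sketch, not the contents of buildmatrix.bat. The function name build_commands, the key file path, and the target name mic0 are hypothetical, and the source list assumes that only the three .c files are compiled. The script prints the commands and does not run them.

```python
import ntpath
import subprocess


def build_commands(sample_dir, key_file, target, remote_dir="/tmp"):
    """Return the compile, copy, and chmod commands as argument lists.

    All parameter names are illustrative; the flags mirror the commands
    shown in step 3 of this tutorial.
    """
    binary = "matrix.mic"
    # Assumption: the target is built from these three C sources only.
    sources = [r"..\src\matrix.c", r"..\src\multiply.c", r"..\src\util.c"]
    compile_cmd = ["icl", "/Qmic", "-lpthread", "-g", "-O3",
                   "-debug", "inline-debug-info", "-vec-report3",
                   "-o", binary, *sources, "-D_LINUX"]
    # ntpath joins with Windows separators regardless of the host OS.
    local_binary = ntpath.join(sample_dir, binary)
    copy_cmd = ["pscp", "-l", "root", "-i", key_file,
                local_binary, f"{target}:{remote_dir}"]
    chmod_cmd = ["plink", "-l", "root", "-i", key_file,
                 f"root@{target}", "chmod", "+x", f"{remote_dir}/{binary}"]
    return [compile_cmd, copy_cmd, chmod_cmd]


if __name__ == "__main__":
    # Hypothetical values; substitute your own key file and card name.
    commands = build_commands(r"C:\samples\matrix\windows_mic",
                              r"C:\keys\mic_key.ppk", "mic0")
    for cmd in commands:
        # Print only; pass each list to subprocess.run(cmd, check=True)
        # from the sample directory to execute it.
        print(subprocess.list2cmdline(cmd))
```

Keeping each command as a list of arguments avoids quoting problems when a path contains spaces.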

Create a Performance Baseline

  1. Run the application on the coprocessor using plink and record the results to establish a performance baseline:

    plink -l root -i <path_to_private_key_file> root@<target_system>  /tmp/matrix.mic

    In this tutorial's scenario, the command that runs the matrix application is included in the runmatrix.bat file. Edit this file to add the path to the private key file on your system.


    Before you start the application, minimize the amount of other software running on your computer to get more accurate results.

  2. Note the execution time displayed at the bottom. For the matrix.mic executable in the figure above, the execution time is 24.988 seconds. Use this metric as a baseline against which you will compare subsequent runs of the application.


    Run the application several times, noting the execution time for each run, and use the average time. This helps to minimize skewed results due to transient system activity.
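
Averaging the recorded times is simple arithmetic, but a small helper keeps the baseline reproducible. The following Python sketch is one way to do it, assuming you have noted the execution time of each run by hand. The function name summarize_runs and the sample timings are hypothetical and are not measurements from this tutorial.

```python
from statistics import mean, stdev


def summarize_runs(times_s):
    """Return (average, spread) for a list of execution times in seconds."""
    if not times_s:
        raise ValueError("need at least one recorded run")
    avg = mean(times_s)
    # The sample standard deviation needs at least two values;
    # report 0.0 when only one run was recorded.
    spread = stdev(times_s) if len(times_s) > 1 else 0.0
    return avg, spread


if __name__ == "__main__":
    # Hypothetical timings; replace with the values your own runs report.
    runs = [24.9, 25.1, 25.3]
    avg, spread = summarize_runs(runs)
    print(f"baseline: {avg:.3f} s (stdev {spread:.3f} s, {len(runs)} runs)")
```

A large spread relative to the average suggests that transient system activity is still affecting the runs, so the baseline may need more samples.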

Key Terms

For more complete information about compiler optimizations, see our Optimization Notice.