Get Software Tools
You need the following tools to try these tutorial steps yourself using the
matrix sample application:
- Intel® VTune™ Amplifier, including sample applications
- Sampling driver, set up during the VTune Amplifier installation
  If the VTune Amplifier could not install the driver, you cannot run the analysis and a warning message appears. See the online help for instructions on installing the driver manually.
- Intel Manycore Platform Software Stack. See Release Notes for more information.
- .zip file extraction utility
- Intel C++ compiler installed on the host. See Release Notes for more information.
Acquire Intel VTune Amplifier
If you do not already have access to the VTune Amplifier, you can download an evaluation copy from http://software.intel.com/en-us/articles/intel-software-evaluation-center/.
Install and Set Up VTune Amplifier Sample Applications
- Copy the matrix_vtune_amp_xe.zip file from the <install-dir>\samples\<locale>\C++\ directory to a writable directory or share on your system. The default installation path is C:\Program Files (x86)\IntelSWTools\VTune Amplifier XE 2016\ (on certain systems, instead of Program Files (x86) the directory name is
Extract the sample from the
Build the Target
Prerequisite: When using an offload or cross compiler, manually install the binary utilities (Binutils) included in the Intel Xeon Phi installation .zip package. For installation instructions, refer to the Intel Compiler documentation.
Build the target on the host with full optimizations, which is recommended for performance analysis. For native application analysis used in this tutorial, you need to copy the binary to the Intel Xeon Phi coprocessor. For offload applications, no copying is required.
To communicate with the Intel Xeon Phi coprocessor cards, you may use any of the following mechanisms:
- Mount an NFS share. See the NFS Mounting a Host Export topic in the Intel Manycore Platform Software Stack (MPSS) help for details.
- Use existing SSH tools.
This sample uses the PuTTY* utility and assumes that ordinary user account access has not yet been established on the coprocessor (thus the references to "root"). See the Intel MPSS installation document for details on creating user accounts on the coprocessor.
From the Start menu, select All Programs > Intel Parallel Studio XE 2015 > Command Prompt > Parallel Studio XE with Intel Compiler XE and select a mode that corresponds to your system and version of the Microsoft* Visual Studio IDE installed, for example: Intel 64 Visual Studio 2012 mode.
This command sets up the Intel compiler environment variables and opens a command prompt.
Change the directory to the location where you extracted the sample code (for this example, assume that location is C:\samples\matrix\windows_mic).
Build the sample code with the Intel C++ compiler, copy the matrix.mic file to the card with the pscp command, and set its execute permission, as the commands below show. For automation, run the buildmatrix.bat script file, which includes all these commands.
icl /Qmic -lpthread -g -O3 -debug inline-debug-info -vec-report3 -o matrix.mic ..\src\matrix.mic ..\src\matrix.c ..\src\multiply.c ..\src\util.c -D_LINUX
pscp -l root -i <path_to_private_key_file> C:\samples\matrix\windows_mic\matrix.mic <target_system>:/tmp
plink -l root -i <path_to_private_key_file> root@<target_system> chmod "+x" /tmp/matrix.mic
Edit the buildmatrix.bat file to add a path to the private key file on your system.
The matrix application is built as matrix.mic and copied to the /tmp directory on the selected card.
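The copy and permission steps can also be driven from a script other than a .bat file. The following is a minimal sketch in Python, assuming PuTTY's pscp and plink are on the PATH. It only assembles the same argument lists shown in the commands above. The function name deploy_commands, the host name "mic0", and the key path are hypothetical and are not part of the sample.

```python
import ntpath


def deploy_commands(key_path, target, local_binary):
    """Return the pscp and plink argument lists that stage a binary in /tmp.

    key_path, target, and local_binary are placeholders supplied by the caller.
    """
    remote_dir = "/tmp"
    # ntpath parses Windows-style paths regardless of the OS running this script.
    remote_binary = f"{remote_dir}/{ntpath.basename(local_binary)}"
    copy_cmd = ["pscp", "-l", "root", "-i", key_path,
                local_binary, f"{target}:{remote_dir}"]
    chmod_cmd = ["plink", "-l", "root", "-i", key_path,
                 f"root@{target}", "chmod", "+x", remote_binary]
    return [copy_cmd, chmod_cmd]


if __name__ == "__main__":
    # "mic0" and the key path are hypothetical values for illustration only.
    for cmd in deploy_commands(r"C:\keys\id.ppk", "mic0",
                               r"C:\samples\matrix\windows_mic\matrix.mic"):
        print(" ".join(cmd))
```

To execute the commands rather than print them, each list could be passed to subprocess.run(cmd, check=True), which stops at the first failing step.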
Create a Performance Baseline
Run the application on the coprocessor using plink and record the results to establish a performance baseline:
plink -l root -i <path_to_private_key_file> root@<target_system> /tmp/matrix.mic
In this tutorial's scenario, the command running the matrix application is added to the runmatrix.bat script. Edit this .bat file to add a path to the private key file on your system.
Before you start the application, minimize the amount of other software running on your computer to get more accurate results.
Note the execution time displayed at the bottom. For the matrix.mic executable in the figure above, the execution time is 24.988 seconds. Use this metric as a baseline against which you will compare subsequent runs of the application.
Run the application several times, noting the execution time for each run, and use the average time. This helps to minimize skewed results due to transient system activity.
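As a small illustration of averaging several runs, the sketch below takes execution times that you have noted by hand and reduces them to a single baseline, plus a rough indicator of how much the runs disagree. The function names are hypothetical. Of the sample values, only 24.988 comes from this tutorial; the other two are made-up placeholders.

```python
from statistics import mean, stdev


def baseline_seconds(run_times):
    """Average several execution times (in seconds) into one baseline."""
    if not run_times:
        raise ValueError("need at least one run time")
    return mean(run_times)


def relative_spread(run_times):
    """Sample standard deviation as a fraction of the mean (0.0 for one run)."""
    if len(run_times) < 2:
        return 0.0
    return stdev(run_times) / mean(run_times)


if __name__ == "__main__":
    # 24.988 is the time quoted in the tutorial; the other values are placeholders.
    times = [24.988, 25.112, 24.900]
    print(f"baseline: {baseline_seconds(times):.3f} s")
    print(f"relative spread: {relative_spread(times):.2%}")
```

A large relative spread suggests that transient system activity is still affecting the measurements, in which case more runs or a quieter system may give a more reliable baseline.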
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804