Data collection on Intel® Xeon Phi™ coprocessor requires Intel VTune™ Amplifier XE.
To profile an application that runs on the Intel® Xeon Phi™ coprocessor in native mode, start the Intel VTune Amplifier analysis from your host system through a secure shell (SSH) connection to the coprocessor. To profile an Intel Xeon Phi coprocessor application that does not run in native mode, launch the application directly on the host from VTune Amplifier. In both cases VTune Amplifier collects performance data on the target Intel Xeon Phi coprocessor, but you control the collection from the host.
You can enhance this performance analysis by enabling ITT API data collection. For example, consider instrumenting your application code with Task API calls or collecting OpenMP* frames. This requires exporting additional environment variables with the ssh command line, or with a script that launches your application on the Intel Xeon Phi coprocessor.
The exact settings for the application and the environment depend on the application type:
- Native Intel Xeon Phi coprocessor applications run directly on the coprocessor but are launched from the host.
- Offloaded applications run on the host but use the compiler offload feature to submit work to the Intel Xeon Phi coprocessor.
Profiling a Native Intel Xeon Phi Coprocessor Application
Normally, a native Intel Xeon Phi coprocessor application is launched as:
[host]$ ssh <mic target> myApp
<mic target> is the alias of the Intel Xeon Phi coprocessor card or the card IP address. For example, to launch the Hotspots analysis of the myApp application on the card with the alias mic0 via SSH, enter the following command line:
[host]$ amplxe-cl -c knc-hotspots -- ssh mic0 /home/user/myApp
/home/user/myApp is the path to the application on the media mounted to the Intel Xeon Phi coprocessor file system.
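The native launch command above can be assembled from the card alias and the application path. The following is a minimal sketch: `build_native_cmd` is a hypothetical helper, not part of VTune Amplifier, and it only prints the command line so that you can inspect it before running it.

```shell
#!/bin/sh
# Sketch: assemble the native-mode Hotspots command line shown above.
# build_native_cmd is a hypothetical helper, not part of VTune Amplifier.

build_native_cmd() {
    card=${1:-mic0}               # card alias or card IP address
    app=${2:-/home/user/myApp}    # path to the application as seen from the card
    # Everything after "--" is the command that VTune Amplifier launches on the host.
    printf 'amplxe-cl -c knc-hotspots -- ssh %s %s\n' "$card" "$app"
}

# Print the command for inspection only; nothing is launched here.
build_native_cmd mic0 /home/user/myApp
```

The defaults match the example in the text; pass a different alias or IP address as the first argument to target another card.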
To enable the ITT API collection, export environment variables using one of the following options:
- If you chose the default installation flow with the libittnotify library installed to the coprocessor (/usr/lib64/libittnotify.so exists on your card), set the following environment variable for the application by passing it to the card via your ssh command or your launch script:
[host]$ amplxe-cl -c knc-hotspots -- ssh mic0 KMP_FOR_TPROFILE=1 /home/user/myApp
- If you use VTune Amplifier XE 2013 Update 15 or older, or if you set up the coprocessor card for VTune Amplifier analysis manually and libittnotify.so is not available in the standard search locations for .so loading on the coprocessor card (such as /usr/lib64), set the following environment variables:
INTEL_LIBITTNOTIFY64=$MIC_INTEL_LIBITTNOTIFY64 INTEL_JIT_PROFILER64=$MIC_INTEL_JIT_PROFILER64 INTEL_ITTNOTIFY_CONFIG=$MIC_INTEL_ITTNOTIFY_CONFIG
In this case the environment variables must be exported in a single script with the ssh command launching the application. For example, to profile with the Task API enabled, enter:
[host]$ amplxe-cl -c knc-hotspots -knob enable-user-tasks=true -- /home/user/run.sh
where run.sh is a script launched by VTune Amplifier on the host. The script contains the following ssh command launching the application:
[host]$ cat /home/user/run.sh
#!/bin/sh
ssh mic0 INTEL_LIBITTNOTIFY64=$MIC_INTEL_LIBITTNOTIFY64 \
INTEL_JIT_PROFILER64=$MIC_INTEL_JIT_PROFILER64 \
INTEL_ITTNOTIFY_CONFIG=$MIC_INTEL_ITTNOTIFY_CONFIG \
/home/user/myApp
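The run.sh script above forwards three host-side variables to the card. If any of them is empty in the environment where run.sh executes, the ssh command passes an empty value to the application. The following is a minimal sketch of a check that could precede the ssh line. It assumes the three MIC_INTEL_* variables are expected to be defined when the script runs, and `missing_itt_vars` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of a pre-flight check that could precede the ssh line in run.sh.
# missing_itt_vars is a hypothetical helper; the variable names are the
# ones that run.sh forwards to the card.

missing_itt_vars() {
    missing=""
    for name in MIC_INTEL_LIBITTNOTIFY64 MIC_INTEL_JIT_PROFILER64 MIC_INTEL_ITTNOTIFY_CONFIG; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    # Drop the leading space and print the space-separated list (empty if none).
    printf '%s\n' "${missing# }"
}

unset_vars=$(missing_itt_vars)
if [ -n "$unset_vars" ]; then
    echo "not set on the host: $unset_vars" >&2
else
    echo "all ITT variables are set"
fi
```

In a real run.sh you would likely exit with a nonzero status in the first branch, so that the collection stops instead of running without ITT data.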
To enable OpenMP frame analysis with a compiler version earlier than Intel® Compiler 14.0 (a component of the Intel Composer XE 2013 SP1), export the following environment variable from the host:
Profiling an Offloaded Application
To launch the Hotspots analysis for an offloaded application on the Intel Xeon Phi coprocessor, enter:
[host]$ amplxe-cl -c knc-hotspots -knob target-cards mic0 -- /home/user/myOffloadApp
-knob target-cards mic0 is the option specifying the mic0 card to compute the offloaded part of the application.
If you use VTune Amplifier XE 2013 Update 15 or older, set the environment variable AMPLXE_COI_DEBUG_SUPPORT=TRUE to enable performance analysis for offload applications. By default, it is set to FALSE to reduce the overhead of running offload applications.
To enable the ITT API collection, export the following environment variables from the host:
[host]$ export MIC_ENV_PREFIX=MIC
With this prefix set, host environment variables whose names begin with MIC_ are passed to the coprocessor with the prefix removed.
To enable OpenMP frame analysis with a compiler version earlier than Intel® Compiler 14.0 (a component of the Intel Composer XE 2013 SP1), you need to export an additional environment variable:
[host]$ export MIC_KMP_FORKJOIN_FRAMES=1
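The offload settings above can be combined into a single launch line. The following is a minimal sketch: `offload_profile_cmd` is a hypothetical helper, and its third argument is an assumed yes/no switch for the older-compiler case. It only prints the command, using `env` to set the variables for that one invocation instead of exporting them into the whole session.

```shell
#!/bin/sh
# Sketch: print an offload Hotspots command with the ITT-related
# environment described above. offload_profile_cmd is a hypothetical helper.

offload_profile_cmd() {
    card=$1                  # card that computes the offloaded part, e.g. mic0
    app=$2                   # host path to the offload application
    old_compiler=${3:-no}    # "yes" for a compiler earlier than Intel Compiler 14.0

    env_vars="MIC_ENV_PREFIX=MIC"
    if [ "$old_compiler" = yes ]; then
        # Extra variable needed for OpenMP frame analysis with older compilers.
        env_vars="$env_vars MIC_KMP_FORKJOIN_FRAMES=1"
    fi
    printf 'env %s amplxe-cl -c knc-hotspots -knob target-cards %s -- %s\n' \
        "$env_vars" "$card" "$app"
}

# Print the command for inspection only; nothing is launched here.
offload_profile_cmd mic0 /home/user/myOffloadApp yes
```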
ITT API collection on the Intel Xeon Phi coprocessor uses a temporary directory on the card. By default, /tmp is used. To specify a different directory, set MIC_TMPDIR to the temporary directory of your choice. This variable should be visible to the amplxe-gui process that launches the collection.
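If amplxe-gui is started from a shell, one way to make the variable visible to it is to export it in that shell first, because a plain assignment is not inherited by child processes. The following is a minimal sketch: the directory /home/user/itt_tmp is a hypothetical example, and a child `sh` stands in for amplxe-gui to show what a process started from this shell would see.

```shell
#!/bin/sh
# Sketch: make MIC_TMPDIR visible to processes started from this shell.
# /home/user/itt_tmp is a hypothetical directory on the card.
MIC_TMPDIR=/home/user/itt_tmp
export MIC_TMPDIR    # without export, child processes do not inherit the value

# Stand-in for amplxe-gui: any child process started from here sees the value.
child_view=$(sh -c 'printf "%s" "$MIC_TMPDIR"')
echo "child process sees MIC_TMPDIR=$child_view"
```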