ITT API Collection on the Intel® Xeon Phi™ Coprocessor

Note

Data collection on the Intel® Xeon Phi™ coprocessor requires Intel VTune™ Amplifier XE.

To profile an application that runs on the Intel® Xeon Phi™ coprocessor in native mode, start the Intel VTune Amplifier analysis from your host system through a secure shell (SSH) connection to the coprocessor. To profile an Intel Xeon Phi coprocessor application that does not run in native mode, launch the application on the host directly from VTune Amplifier. In both cases VTune Amplifier collects performance data on the target Intel Xeon Phi coprocessor, but you control the collection from the host.

You can enhance this performance analysis by enabling ITT API data collection, for example by instrumenting your application code with Task API calls or by collecting OpenMP* frames. This requires exporting additional environment variables, either on the ssh command line or in a script that launches your application on the Intel Xeon Phi coprocessor.

The exact settings for the application and the environment depend on the application type:

  • Native Intel Xeon Phi coprocessor applications run directly on the coprocessor but are launched from the host.
  • Offloaded applications run on the host but use the compiler offload feature to submit work to the Intel Xeon Phi coprocessor.

Profiling a Native Intel Xeon Phi Coprocessor Application

Normally, a native Intel Xeon Phi coprocessor application is launched as:

[host]$ ssh <mic target> myApp

where <mic target> is the alias name of the Intel Xeon Phi coprocessor card or the card IP address. For example, to launch the Hotspots analysis of the myApp application on the card with the alias mic0 via SSH, enter the following command line:

[host]$ amplxe-cl -c knc-hotspots -- ssh mic0 /home/user/myApp

where /home/user/myApp is the path to the application on the media mounted to the Intel Xeon Phi coprocessor file system.

To enable the ITT API collection, export the following environment variables, using one of the following options:

  • If you chose the default installation flow, which installs the libittnotify library on the coprocessor (/usr/lib64/libittnotify.so exists on your card), set the following environment variable for the application by passing it to the card through your ssh command or your launch script:

    KMP_FOR_TPROFILE=1

    [host]$ amplxe-cl -c knc-hotspots -- ssh mic0 KMP_FOR_TPROFILE=1 /home/user/myApp

  • If you use VTune Amplifier XE 2013 Update 15 or older, or if you set up the coprocessor card for VTune Amplifier analysis manually and libittnotify.so is not available in the standard search locations for .so loading on the coprocessor card (such as /usr/lib64), set the following environment variables:
    INTEL_LIBITTNOTIFY64=$MIC_INTEL_LIBITTNOTIFY64
    INTEL_JIT_PROFILER64=$MIC_INTEL_JIT_PROFILER64
    INTEL_ITTNOTIFY_CONFIG=$MIC_INTEL_ITTNOTIFY_CONFIG
    
    In this case, export the environment variables in the same script that contains the ssh command launching the application. For example, to profile with the Task API, enter:
    [host]$ amplxe-cl -c knc-hotspots -knob enable-user-tasks=true -- /home/user/run.sh
    
    where run.sh is a script launched by VTune Amplifier on the host. The script contains the following ssh command launching the application:
    [host]$ cat /home/user/run.sh 
    #!/bin/sh
    ssh mic0 INTEL_LIBITTNOTIFY64=$MIC_INTEL_LIBITTNOTIFY64 \
    INTEL_JIT_PROFILER64=$MIC_INTEL_JIT_PROFILER64 \
    INTEL_ITTNOTIFY_CONFIG=$MIC_INTEL_ITTNOTIFY_CONFIG \
    /home/user/myApp
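
The launch script above can be made a little more defensive. The following is a minimal sketch, not part of the product: build_native_cmd is a hypothetical helper that assembles the same ssh command line and refuses to continue if any of the three MIC_INTEL_* values is empty on the host. It assumes those variables are present in the environment of the script when VTune Amplifier launches it and that their values contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not a VTune Amplifier command): print the ssh command
# line that run.sh would execute, or fail if a required value is missing.
build_native_cmd() {
    card=$1   # card alias or IP address, for example mic0
    app=$2    # path to the application on the card
    cmd="ssh $card"
    for name in INTEL_LIBITTNOTIFY64 INTEL_JIT_PROFILER64 INTEL_ITTNOTIFY_CONFIG; do
        # Read the host-side MIC_<name> variable that holds the card-side value.
        eval "value=\${MIC_${name}:-}"
        if [ -z "$value" ]; then
            echo "error: MIC_${name} is empty on the host" >&2
            return 1
        fi
        cmd="$cmd $name=$value"
    done
    echo "$cmd $app"
}

# Possible use inside run.sh (assumes no whitespace in the values):
#   cmd=$(build_native_cmd mic0 /home/user/myApp) || exit 1
#   exec $cmd
```

Printing the command before executing it also gives you a quick way to check which values actually reach the card.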
    

To enable OpenMP frame analysis with a compiler version earlier than Intel® Compiler 14.0 (a component of the Intel Composer XE 2013 SP1), export the following environment variable from the host:

KMP_FORKJOIN_FRAMES=1

Profiling an Offloaded Application

To launch the Hotspots analysis for an offloaded application on the Intel Xeon Phi coprocessor, enter:

[host]$ amplxe-cl -c knc-hotspots -knob target-cards mic0 -- /home/user/myOffloadApp

where -knob target-cards mic0 is the option that selects the mic0 card to run the offloaded part of the myOffloadApp application.

If you use VTune Amplifier XE 2013 Update 15 or older, set the environment variable AMPLXE_COI_DEBUG_SUPPORT=TRUE to enable performance analysis for offload applications. By default, it is set to FALSE to reduce the overhead of running offload applications.

To enable the ITT API collection, export the following environment variable from the host:

[host]$ export MIC_ENV_PREFIX=MIC

With this prefix set, host environment variables whose names start with MIC_ are propagated to the coprocessor with the prefix removed.

To enable OpenMP frame analysis with a compiler version earlier than Intel® Compiler 14.0 (a component of the Intel Composer XE 2013 SP1), you need to export an additional environment variable:

[host]$ export MIC_KMP_FORKJOIN_FRAMES=1
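
The two exports above can be grouped in a small host-side helper. This is a minimal sketch under stated assumptions: setup_offload_itt_env and its legacy-openmp argument are hypothetical names chosen for illustration; only the exported variables come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: export the host-side variables for ITT API collection
# in an offload run. Pass "legacy-openmp" when building with a compiler
# earlier than Intel Compiler 14.0 to also enable OpenMP frame analysis.
setup_offload_itt_env() {
    export MIC_ENV_PREFIX=MIC
    if [ "${1:-}" = "legacy-openmp" ]; then
        export MIC_KMP_FORKJOIN_FRAMES=1
    fi
}
```

After sourcing the helper and calling it in the same shell, launch the collection as usual, for example with amplxe-cl -c knc-hotspots -knob target-cards mic0 -- /home/user/myOffloadApp, so that the amplxe-cl process inherits the exported variables.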

Note

ITT API collection on the Intel Xeon Phi coprocessor uses a temporary directory on the card. By default, /tmp is used. To specify a different directory, set MIC_ENV_PREFIX to MIC and MIC_TMPDIR to the temporary directory of your choice. Both variables must be visible to the amplxe-cl or amplxe-gui process that launches the collection.
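
As an illustration of the note above, the following sketch sets both variables in the shell that later starts amplxe-cl or amplxe-gui. The helper name use_card_tmpdir and the example directory are hypothetical, and the absolute-path check is only an assumed sanity check, not a documented requirement.

```shell
#!/bin/sh
# Hypothetical helper: choose the temporary directory used on the card.
use_card_tmpdir() {
    dir=$1
    case $dir in
        /*) ;;  # looks like an absolute path on the card
        *)  echo "error: expected an absolute path on the card" >&2
            return 1 ;;
    esac
    export MIC_ENV_PREFIX=MIC
    export MIC_TMPDIR=$dir
}

# Example call (the directory name is an assumption):
#   use_card_tmpdir /home/user/itt_tmp
```

Because the variables are exported, any amplxe-cl or amplxe-gui process started from the same shell afterwards can see them.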
