User Guide

Analyze DPC++, OpenCL™, and OpenMP* Target Applications with Offload Advisor

If your application is written in DPC++, OpenCL™, or OpenMP* with the omp target pragma (for C++) or the omp target directive (for Fortran), and it already contains code offloaded to a target device, you can analyze it with Offload Advisor and model the potential performance gain from offloading to a different target device. For example, if your application offloads code to an integrated GPU, you can model its performance on a discrete GPU.
To do this, use CPU offload profiling. In this approach, code is temporarily offloaded to the CPU so that application performance can be projected on different hardware. You can use this approach to profile your code with Advisor or to make performance projections with Offload Advisor. You do not need to enable GPU profiling when using the CPU offload feature.

For OpenMP*

  1. Disable offloading or set the offload target to the CPU. To do this, use environment variables:
    • Set the default device with OMP_DEFAULT_DEVICE, which controls the device number used in device constructs. Non-negative integer values are accepted.
    • Set the execution mode with the following environment variables:
      • OMP_TARGET_OFFLOAD=MANDATORY and LIBOMPTARGET_DEVICETYPE=CPU offload the target region code to run on a CPU.
      • OMP_TARGET_OFFLOAD=DISABLED disables code offloading; the code runs natively on the CPU.
    For example, you can make sure that the kernels are offloaded to the CPU with the following commands (a matching target region is sketched at the end of this section):
    export OMP_TARGET_OFFLOAD=MANDATORY
    export LIBOMPTARGET_DEVICETYPE=CPU
    By default, the environment variables are set to OMP_TARGET_OFFLOAD=MANDATORY and LIBOMPTARGET_DEVICETYPE=GPU, so OpenMP target applications are offloaded to a GPU.
  2. Set the INTEL_JIT_BACKWARD_COMPATIBILITY environment variable to 1.
  3. When collecting performance metrics and modeling performance with Offload Advisor, use the --jit option. The --jit option automatically enables --assume-hide-taxes for performance modeling, which hides all invocation taxes except the first one in the report. For details, see Manage Invocation Taxes.
You can then execute the application as usual.
For sample commands to run performance modeling, see Collect Performance Metrics.
For additional information on environment variables, see Get Started Using the OpenMP* Offload to GPU Feature.
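For reference, below is a minimal sketch of an OpenMP target region that the environment variables above redirect to the CPU. The vector-add kernel and the variable names are illustrative, not part of the Offload Advisor workflow; only the environment, not the source code, decides where the region runs. With the Intel® oneAPI compilers, such code is typically built with icpx -fiopenmp -fopenmp-targets=spir64.

    #include <cstdio>

    int main() {
        const int n = 1024;
        float a[1024], b[1024];
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // With OMP_TARGET_OFFLOAD=MANDATORY and LIBOMPTARGET_DEVICETYPE=CPU set,
        // this target region is "offloaded" to the CPU instead of a GPU.
        #pragma omp target teams distribute parallel for map(tofrom: a[0:n]) map(to: b[0:n])
        for (int i = 0; i < n; ++i)
            a[i] += b[i];

        printf("a[0] = %f\n", a[0]);
        return 0;
    }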

For DPC++

  1. Specify the CPU as the target device in the application source code by using sycl::host_selector (see the sketch at the end of this section).
  2. Set the INTEL_JIT_BACKWARD_COMPATIBILITY environment variable to 1.
  3. When collecting performance metrics and modeling performance with Offload Advisor, use the --jit option. The --jit option automatically enables --assume-hide-taxes for performance modeling, which hides all invocation taxes except the first one in the report. For details, see Manage Invocation Taxes.
You can then execute the application as usual.
For sample commands to run performance modeling, see Collect Performance Metrics.
For additional information about SYCL, see https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf.
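For reference, below is a minimal sketch of step 1 for DPC++: a kernel submitted to a queue constructed with the host selector so that it runs on the CPU. The vector-add kernel and the variable names are illustrative. Under the SYCL 1.2.1 headers the full namespace is cl::sycl, which corresponds to the sycl::host_selector named in step 1.

    #include <CL/sycl.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        const size_t n = 1024;
        std::vector<float> a(n, 1.0f), b(n, 2.0f);

        // host_selector runs kernels on the host CPU instead of a GPU.
        cl::sycl::queue q{cl::sycl::host_selector{}};
        {
            cl::sycl::buffer<float, 1> buf_a(a.data(), cl::sycl::range<1>(n));
            cl::sycl::buffer<float, 1> buf_b(b.data(), cl::sycl::range<1>(n));
            q.submit([&](cl::sycl::handler& h) {
                auto acc_a = buf_a.get_access<cl::sycl::access::mode::read_write>(h);
                auto acc_b = buf_b.get_access<cl::sycl::access::mode::read>(h);
                h.parallel_for<class add_kernel>(cl::sycl::range<1>(n),
                                                 [=](cl::sycl::id<1> i) { acc_a[i] += acc_b[i]; });
            });
        } // buffers are destroyed here, so results are copied back to a and b

        std::cout << "a[0] = " << a[0] << std::endl;
        return 0;
    }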

For OpenCL™

  1. Set the INTEL_JIT_BACKWARD_COMPATIBILITY environment variable to 1.
  2. Configure your OpenCL™ code to be offloaded to a CPU, for example by requesting a CL_DEVICE_TYPE_CPU device (see the sketch at the end of this section). Refer to the OpenCL documentation at https://www.khronos.org/registry/OpenCL/ for specific instructions.
  3. When collecting performance metrics and modeling performance with Offload Advisor, use the --jit option. The --jit option automatically enables --assume-hide-taxes for performance modeling, which hides all invocation taxes except the first one in the report. For details, see Manage Invocation Taxes.
You can then execute the application as usual.
For sample commands to run performance modeling, see Collect Performance Metrics.
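As an illustration of step 2, below is a minimal sketch of OpenCL host code that requests a CPU device (CL_DEVICE_TYPE_CPU) so that kernels run on the CPU. Error handling, program compilation, and the kernel itself are omitted; only the device-selection step is the point here.

    #define CL_TARGET_OPENCL_VERSION 220
    #include <CL/cl.h>
    #include <cstdio>

    int main() {
        cl_platform_id platform;
        clGetPlatformIDs(1, &platform, nullptr);

        // Request a CPU device instead of a GPU so the kernels run on the CPU.
        cl_device_id device;
        cl_int err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, nullptr);
        if (err != CL_SUCCESS) {
            std::printf("No OpenCL CPU device found on the first platform.\n");
            return 1;
        }

        cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
        cl_command_queue queue =
            clCreateCommandQueueWithProperties(context, device, nullptr, &err);

        // ... build the program, create kernels, and enqueue work as usual ...

        clReleaseCommandQueue(queue);
        clReleaseContext(context);
        return 0;
    }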
