Programming Guide


C/C++ or Fortran with OpenMP* Offload Programming Model

The Intel® oneAPI DPC++/C++ Compiler and the Intel® Fortran Compiler (Beta) enable software developers to use OpenMP* directives to offload work to Intel accelerators and improve application performance.
This section describes how to use OpenMP directives to target computations to the accelerator. Developers unfamiliar with OpenMP directives can find basic usage information in the OpenMP Support sections of the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference or the Intel® Fortran Compiler for oneAPI Developer Guide and Reference.
OpenMP is not supported for FPGA devices.

Basic OpenMP Target Construct

The OpenMP target construct transfers control from the host to the target device, and variables are mapped between the host and the target device. By default, the host thread waits until the offloaded computations are complete. Other OpenMP tasks can still execute asynchronously on the host; use the nowait clause to specify that the encountering thread does not wait for the target region to complete.
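A minimal sketch of the nowait clause follows. The function name scale_on_device_sum_on_host and its computation are hypothetical: the device scales x while the host sums a separate buffer h, and the taskwait directive waits for the deferred target task before x is read on the host. Whether the device and host work actually overlap depends on the OpenMP runtime.

```c
/* Hypothetical example: scale x on the device while the host sums h. */
double scale_on_device_sum_on_host(float *x, int n, float s,
                                   const double *h, int m) {
    double host_sum = 0.0;

    /* nowait makes the target region a deferred target task, so the
       encountering thread is allowed to continue without waiting. */
    #pragma omp target map(tofrom: x[0:n]) nowait
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] *= s;

    /* Host work that does not touch x. */
    for (int j = 0; j < m; j++)
        host_sum += h[j];

    /* Wait for the target task; only after this is x valid on the host. */
    #pragma omp taskwait
    return host_sum;
}
```

Without OpenMP enabled the pragmas are ignored and the same code runs sequentially on the host with identical results.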
C/C++
The C++ code snippet below targets a SAXPY computation to the accelerator.
#pragma omp target map(tofrom:fa), map(to:fb,a)
#pragma omp parallel for firstprivate(a)
for (k = 0; k < FLOPS_ARRAY_SIZE; k++)
    fa[k] = a * fa[k] + fb[k];
Array fa is mapped both to and from the accelerator because fa is both input to and output from the calculation. Array fb and the variable a are required as input to the calculation and are not modified, so there is no need to copy them out. The variable FLOPS_ARRAY_SIZE is implicitly mapped to the accelerator (a scalar referenced in a target region without a map clause is firstprivate by default). The loop index k is implicitly private according to the OpenMP specification.
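The snippet above maps fa and fb by name, which transfers whole arrays only when they are declared as arrays. When the data arrives through pointers, as in the hypothetical function saxpy_offload below, array sections such as fa[0:n] tell the runtime how many elements to transfer. This is a sketch of the same computation under that assumption, not the original sample.

```c
/* Hypothetical SAXPY-style update fa[k] = a * fa[k] + fb[k] on the accelerator. */
void saxpy_offload(float *fa, const float *fb, float a, int n) {
    /* fa is read and written: copy in and back out.
       fb is only read: copy in only.
       The scalars a and n are firstprivate in the target region by default. */
    #pragma omp target map(tofrom: fa[0:n]) map(to: fb[0:n])
    #pragma omp parallel for
    for (int k = 0; k < n; k++)
        fa[k] = a * fa[k] + fb[k];
}
```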
Fortran
This Fortran code snippet targets a matrix multiply to the accelerator.
!$omp target map(to: a, b) map(tofrom: c)
!$omp parallel do private(j,i,k)
do j=1,n
   do i=1,n
      do k=1,n
         c(i,j) = c(i,j) + a(i,k) * b(k,j)
      enddo
   enddo
enddo
!$omp end parallel do
!$omp end target
Arrays a and b are mapped to the accelerator, while array c is both input to and output from the accelerator. The variable n is implicitly mapped to the accelerator (as firstprivate, the default for scalars). The private clause is optional because loop indices are automatically private according to the OpenMP specification.
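For comparison, a C counterpart of the Fortran matrix multiply might look like the hypothetical matmul_offload below. It assumes n x n matrices stored as flat row-major arrays, so each map clause needs the explicit section [0:n*n]; collapse(2) is an optional choice that distributes the i and j loops together.

```c
/* Hypothetical row-major C version of the Fortran example: c += a * b. */
void matmul_offload(const double *a, const double *b, double *c, int n) {
    /* a and b are inputs only; c is both input and output. */
    #pragma omp target map(to: a[0:n*n], b[0:n*n]) map(tofrom: c[0:n*n])
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            /* Accumulate in a local, then store once. */
            double sum = c[i * n + j];
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```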

Map Variables

The target data directive maps variables to the accelerator, and those variables remain on the device for the extent of the target data region. Because the data does not have to be transferred again for each target region inside it, this optimizes data sharing between the host and the accelerator and is useful when the same variables are used across multiple target regions.
C/C++
#pragma omp target data [clause[[,] clause],…]
    structured-block
Fortran
!$omp target data [clause[[,] clause],…]
    structured-block
!$omp end target data
Clauses
The clauses can be one or more of the following. See TARGET DATA for more information.
  • DEVICE (integer-expression)
  • IF ([TARGET DATA:] scalar-logical-expression)
  • IS_DEVICE_PTR (list)
  • MAP ([[map-type-modifier[,]] map-type: ] list)
    Map type can be one or more of the following:
    • alloc
    • to
    • from
    • tofrom
    • delete
    • release
  • SUBDEVICE ([integer-constant ,] integer-expression [ : integer-expression [ : integer-expression]])
  • USE_DEVICE_ADDR (list): available only in ifx
  • USE_DEVICE_PTR (ptr-list)
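As a sketch of how several map types combine in one target data region, the hypothetical function two_stage_transform below copies in to the device once (to), copies out back only when the region ends (from), and uses tmp as device scratch space that is never transferred (alloc), shared by two target regions. The function name and the computation out[i] = 2*in[i] + 1 are illustrative assumptions.

```c
#include <stdlib.h>

/* Hypothetical: out[i] = 2*in[i] + 1, computed in two device passes.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int two_stage_transform(const float *in, float *out, int n) {
    /* Host storage that backs the alloc mapping (also used if the
       code runs on the host). */
    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    #pragma omp target data map(to: in[0:n]) map(from: out[0:n]) map(alloc: tmp[0:n])
    {
        /* Sections are already present, so these clauses reuse the
           device copies instead of transferring data again. */
        #pragma omp target map(to: in[0:n]) map(alloc: tmp[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            tmp[i] = 2.0f * in[i];

        #pragma omp target map(alloc: tmp[0:n]) map(from: out[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            out[i] = tmp[i] + 1.0f;
    } /* out is copied back to the host here. */

    free(tmp);
    return 0;
}
```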
Use the target update directive to synchronize an original variable on the host with the corresponding variable on the device.
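A minimal sketch of target update inside a target data region follows. The hypothetical function two_passes_with_update runs two device passes over x; between them, target update from refreshes the host copy so the host can read it, and target update to pushes a host-side change back to the device.

```c
/* Hypothetical: returns the host-side sum of x observed between two
   device passes; x holds the final device results on return. */
double two_passes_with_update(double *x, int n) {
    double mid_sum = 0.0;

    /* x stays on the device for the whole region and is copied back
       to the host when the region ends (tofrom). */
    #pragma omp target data map(tofrom: x[0:n])
    {
        /* First device pass; x[0:n] is already present, so no transfer. */
        #pragma omp target map(tofrom: x[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            x[i] += 1.0;

        /* Copy device values to the host before the region ends. */
        #pragma omp target update from(x[0:n])
        for (int i = 0; i < n; i++)
            mid_sum += x[i];

        /* Change one element on the host and push it to the device. */
        x[0] = 100.0;
        #pragma omp target update to(x[0:1])

        /* Second device pass sees the updated x[0]. */
        #pragma omp target map(tofrom: x[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            x[i] *= 2.0;
    }
    return mid_sum;
}
```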

Compile to Use OMP TARGET

The following example commands illustrate how to compile an application using OpenMP target.
C/C++
  • Linux:
    icx -fiopenmp -fopenmp-targets=spir64 code.c
  • Windows (you can use icx or icpx):
    icx /Qiopenmp /Qopenmp-targets=spir64 code.c
Fortran
  • Linux:
    ifx -fiopenmp -fopenmp-targets=spir64 code.f90
  • Windows:
    ifx /Qiopenmp /Qopenmp-targets=spir64 code.f90
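To check that a binary built with the commands above actually offloads, a small probe such as the hypothetical target_region_on_device below can help. It calls the standard omp_is_initial_device routine inside a target region; that routine returns true when the code runs on the host, for example when no device is available and execution falls back to the host. The #ifdef _OPENMP guard lets the same file compile without OpenMP enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical probe: returns 1 if the target region ran on a device,
   0 if it ran on the host (or OpenMP is not enabled). */
int target_region_on_device(void) {
    int on_device = 0;
#ifdef _OPENMP
    #pragma omp target map(from: on_device)
    on_device = !omp_is_initial_device();
#endif
    return on_device;
}
```

Setting the OpenMP 5.0 environment variable OMP_TARGET_OFFLOAD=MANDATORY makes the program report an error instead of silently falling back to the host when offloading is not possible.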

Additional OpenMP Offload Resources

  • Intel offers code samples that demonstrate using OpenMP directives to target accelerators at https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming. Specific samples include:
    • Matrix Multiplication is a simple program that multiplies two large matrices and verifies the results. The program is implemented in two ways: with DPC++ and with OpenMP.
    • The ISO3DFD sample (Three-Dimensional Finite-Difference Wave Propagation in Isotropic Media) uses a three-dimensional stencil to simulate a wave propagating in a 3D isotropic medium. The sample shows some of the common challenges and techniques involved in achieving good performance when targeting OpenMP accelerator devices in more complex applications.
    • openmp_reduction is a simple program that calculates pi. This program is implemented using C++ and OpenMP for CPUs and accelerators based on Intel® Architecture.
  • Get Started with OpenMP* Offload Feature provides details on using Intel's compilers with OpenMP offload, including lists of supported options and example code.
  • LLVM/OpenMP Runtimes describes the distinct types of runtimes available and can be helpful when debugging OpenMP offload.
  • openmp.org has an examples document: https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf. Chapter 4 of the examples document focuses on accelerator devices and the target construct.
  • Using OpenMP - the Next Step is a good OpenMP reference book. Chapter 6 covers OpenMP support for heterogeneous systems. For additional information on this book, see https://www.openmp.org/tech/using-openmp-next-step.

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.