This is early documentation on using the OpenMP* 4.5/5.0 TARGET feature to offload computation to Intel Integrated Graphics.
Requirements for Offload to Intel GEN Graphics
At this time, support for the OpenMP 4.5/5.0 TARGET offload feature requires the Intel® C++ Compiler next generation code generator. This C++ compiler is currently in BETA. To obtain it, you will need to download and install the Intel® oneAPI Base Toolkit AND the Intel® oneAPI HPC Toolkit. GO HERE for these BETA packages. The Intel® C++ Compiler next generation code generator that supports OpenMP TARGET features is provided ONLY in the Intel® oneAPI HPC Toolkit.
The Intel® C/C++ Compiler in the Intel® Parallel Studio XE packages DOES NOT SUPPORT OpenMP TARGET features. You will need the Intel® oneAPI Base Toolkit AND the Intel® oneAPI HPC Toolkit.
At this time, the GPU devices supported for OpenMP TARGET are Intel Integrated Graphics (Gen 9 through Gen 11). GO HERE for the Getting Started Guide for Linux.
Once the drivers are installed, use these options (described below):
-qnextgen -fiopenmp -fopenmp-targets=spir64
Fortran Status: 2021.beta03 and beyond in Q1 2020
The Intel® Fortran Compiler, Next Generation Code Generation, is in an early pre-Alpha phase. It is not ready for general use or testing at this time. At best, F77 and most F90 features may (or may not) work. Language features beyond the F90 Standard will not work and WILL cause the compiler to crash and/or generate non-functional code. Only some simple cases of OpenMP 4.5 are working, mostly for F77-style arrays and arguments. To check that your code conforms to the F90 Standard or older, compile with:
ifort -qnextgen -what -stand f90 -warn errors <rest of options> <sources/objects>
The -qnextgen option selects the Intel® Fortran Compiler Next Generation Code Generator. Without this option, the default is to use the existing production Intel® Fortran Compiler, Classic.
The -stand f90 -warn errors option pair flags all language features beyond Fortran 90 and prints a warning that is treated as an error, which prevents compilation of the offending source file.
We hope to have the Intel® Fortran Compiler, Next Generation Code Generation, ready for general Beta testing towards the end of calendar year 2020. Please check the Fortran Release Notes for up-to-date information.
Fortran users should continue to use the Intel Fortran Compiler, Classic (that is, without the -qnextgen option that invokes the new NextGen compiler) in the Intel® Parallel Studio XE Editions OR the Intel® Fortran Compiler in the Intel® oneAPI HPC Toolkit until the NextGen Fortran compiler is ready for test. At this time we are NOT taking issues against the NextGen Fortran compiler. We know it is unstable; that is expected at this pre-Alpha phase. We look forward to beginning initial beta testing towards the end of calendar year 2020.
Compiling for OpenMP 5.0 TARGET features
Use the following compiler options with ICC/ICPC:
icc -qnextgen -fiopenmp -fopenmp-targets=spir64 [other options] myOmpProgram.c
or
icpc -qnextgen -fiopenmp -fopenmp-targets=spir64 [other options] myOmpProgram.cpp
Next, to get diagnostic messages:
then simply run your program.
Use option -fiopenmp to invoke OpenMP.
DO NOT USE the legacy compatibility OpenMP options -qopenmp or -fopenmp. With these options the compiler does not use the newer parsing of OpenMP pragmas and features (such as TARGET and MAP, from the offload features of OpenMP 4.5 and 5.0), nor our new OpenMP Runtime that includes OpenMP 4.5/5.0 offload feature support. These older options are considered compatibility options for our Classic compiler. ICC NextGen still accepts them, but they select our older non-offload OpenMP implementation and the older Classic OpenMP Runtime.
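As an illustration of what a source file compiled with these options might contain, here is a minimal sketch of a single TARGET region. The function name saxpy_offload and the choice of kernel are hypothetical and not taken from this document; only standard OpenMP 4.5 directives and clauses are used.

```c
#include <stddef.h>

/* Hypothetical example kernel (name and operation are illustrative only):
 * computes y[i] = a * x[i] + y[i] inside an OpenMP TARGET region. */
void saxpy_offload(float a, const float *x, float *y, int n)
{
    /* map(to: ...)     copies x to the device before the region runs.
     * map(tofrom: ...) copies y to the device and back to the host afterwards.
     * If the pragma is ignored (OpenMP disabled), the loop simply runs on the host. */
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling saxpy_offload from main() and building the file with the icc command line shown above should, assuming a working GPU driver, run the loop on the integrated graphics device; without an available device the OpenMP runtime normally falls back to the host.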
Debugging the Offload
To test whether your code was successfully offloaded, set the environment variable LIBOMPTARGET_PROFILE:
This environment variable directs the OpenMP Runtime to output offload statistics for each offload after program termination. An example:
LIBOMPTARGET_PROFILE:
-- DATA-READ: 45.905 msec
-- DATA-WRITE: 0.020 msec
-- EXEC-__omp_offloading_38_dc60d697_main_l33: 71.707 msec
-- EXEC-__omp_offloading_38_dc60d697_main_l48: 15.776 msec
This output tells us the time spent (DATA-WRITE) sending data TO the GPU, and the time spent (DATA-READ) receiving data FROM the GPU. This example shows two offload TARGET regions (EXEC), one in main() at line 33 (the _l33 suffix) and another in main() at line 48 (the _l48 suffix), along with the time spent in each of these kernels.
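To make the relationship between source code and these statistics more concrete, the following sketch contains two TARGET regions with explicit MAP clauses. The function scale_then_sum and its contents are hypothetical; the assumption here is that each TARGET region is reported as its own EXEC entry and that the mapped data accounts for the DATA-WRITE and DATA-READ times.

```c
/* Hypothetical helper with two TARGET regions (names are illustrative only). */
double scale_then_sum(double *v, int n, double factor)
{
    double sum = 0.0;

    /* Region 1: v is sent to the device and copied back after scaling. */
    #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i) {
        v[i] *= factor;
    }

    /* Region 2: v is only sent to the device; the scalar sum is copied back. */
    #pragma omp target teams distribute parallel for map(to: v[0:n]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += v[i];
    }

    return sum;
}
```

A program that calls scale_then_sum once would be expected to produce two EXEC lines, similar in shape to the LIBOMPTARGET_PROFILE output shown above.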
If you do NOT see this output then the OpenMP runtime had an issue finding your GPU driver device. To further debug, set environment variable LIBOMPTARGET_DEBUG=5 and re-run your application. You will need to examine your OpenCL driver installation for issues (beyond the scope of this simple document).
export LIBOMPTARGET_DEBUG=5
./a.out
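As a complement to the environment variables, a program can also ask the OpenMP runtime whether a TARGET region actually ran on a device. The sketch below uses the standard routine omp_is_initial_device(); the wrapper name ran_on_device is hypothetical, and this check is a suggestion rather than part of the procedure described in this document.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if a TARGET region executed on an offload
 * device, or 0 if it fell back to the host (or OpenMP is disabled). */
int ran_on_device(void)
{
    int on_host = 1;  /* assume host until the target region says otherwise */
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        /* Nonzero when the region is executing on the host (initial) device. */
        on_host = omp_is_initial_device();
    }
#endif
    return !on_host;
}
```

If ran_on_device() returns 0 in a build made with -fiopenmp -fopenmp-targets=spir64, the region most likely fell back to the host, which would be consistent with the runtime failing to find the GPU.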
Targeting Device CPU or GPU
To ensure that you are targeting the GPU for offload, use the following environment variable (available in oneAPI releases AFTER beta03):
Supported types are
Make sure to remove/unset any reference to OMP_DEFAULT_DEVICE=1 as that will break offload.
Another environment variable, OMP_DEFAULT_DEVICE, can set the default device. CURRENTLY only 1 GPU device can be targeted, and its device number is 0:
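The same device selection can also be expressed in source code. The sketch below queries the number of devices with the standard routine omp_get_num_devices() and pins one TARGET region to device 0 with the device clause. The helper names are hypothetical, and this is an alternative shown for illustration rather than the method this document prescribes.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of offload devices visible to the OpenMP runtime
 * (reported as 0 when OpenMP is disabled). */
int offload_device_count(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Hypothetical helper: fills out[i] = i * i in a TARGET region
 * that explicitly requests device number 0. */
void squares_on_device0(int *out, int n)
{
    #pragma omp target teams distribute parallel for device(0) map(from: out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = i * i;
    }
}
```

Checking that offload_device_count() is at least 1 before calling squares_on_device0 is a reasonable guard, since with no devices available the region would normally run on the host instead.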
Our OpenMP TARGET feature is currently in beta for ICC NextGen. We will add functionality and features as the beta period progresses. Check your Release Notes for the latest information on OpenMP feature support.