Programming Guide


C/C++ OpenMP* and DPC++ Composability

The oneAPI programming model provides a unified compiler based on LLVM/Clang with support for OpenMP* offload. This integration lets OpenMP constructs be used either to parallelize host-side applications or to offload work to a target device. Both the Intel® oneAPI DPC++/C++ Compiler, available with the Intel® oneAPI Base Toolkit, and the Intel® C++ Compiler Classic, available with the Intel® oneAPI HPC Toolkit or the Intel® oneAPI IoT Toolkit, support OpenMP and DPC++ composability with a set of restrictions. A single application can offload execution to available devices using OpenMP target regions or DPC++/SYCL constructs in different parts of the code, such as different functions or code segments.
OpenMP and DPC++ offloading constructs may be used in separate files, in the same file, or in the same function with some restrictions. OpenMP and DPC++ offloading code can be bundled together in executable files, in static libraries, in dynamic libraries, or in various combinations.
DPC++ relies on the TBB runtime when executing device code on the CPU; hence, using both OpenMP and DPC++ on a CPU can oversubscribe threads. Performance analysis of workloads executing on the system can help determine whether this is occurring.
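One way to reduce the risk of oversubscription is to cap the number of OpenMP host threads so that some hardware threads remain free for DPC++ CPU kernels. The following is a minimal sketch, not a recommended policy: the helper names ompThreadBudget and limitOpenMPThreads and the reservedForSycl parameter are hypothetical, and the sketch does not control how many threads the DPC++ CPU runtime itself uses.

```cpp
#include <algorithm>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of OpenMP host threads to use when
// `reservedForSycl` hardware threads are set aside for DPC++ CPU kernels.
unsigned ompThreadBudget(unsigned hardwareThreads, unsigned reservedForSycl) {
  if (hardwareThreads <= reservedForSycl)
    return 1u; // always leave OpenMP at least one thread
  return hardwareThreads - reservedForSycl;
}

// Hypothetical helper: applies the budget to the OpenMP runtime (when the
// code is compiled with OpenMP enabled) and returns the value chosen.
unsigned limitOpenMPThreads(unsigned reservedForSycl) {
  // hardware_concurrency() may return 0 when the count is unknown.
  unsigned hw = std::max(1u, std::thread::hardware_concurrency());
  unsigned budget = ompThreadBudget(hw, reservedForSycl);
#ifdef _OPENMP
  omp_set_num_threads(static_cast<int>(budget));
#endif
  return budget;
}
```

Calling limitOpenMPThreads(4) before the first OpenMP parallel region would, under these assumptions, leave four hardware threads unused by OpenMP. Whether that split helps should be confirmed by profiling the actual workload.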


There are some restrictions to be considered when mixing OpenMP and DPC++/SYCL constructs in the same application.
  • OpenMP directives cannot be used inside DPC++/SYCL kernels that run on the device. Similarly, DPC++/SYCL code cannot be used inside OpenMP target regions. However, it is possible to use SYCL constructs within OpenMP code that runs on the host CPU.
  • The OpenMP and DPC++/SYCL device parts of the program cannot have cross dependencies. For example, a function defined in the SYCL part of the device code cannot be called from OpenMP code that runs on the device, and vice versa. The OpenMP and SYCL device parts are linked independently and form separate binaries that become part of the fat binary generated by the compiler.
  • Direct interaction between the OpenMP and SYCL runtime libraries is not supported at this time. For example, a device memory object created by the OpenMP API is not accessible to DPC++ code. That is, using a device memory object created by OpenMP in DPC++/SYCL code results in unspecified execution behavior.
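Because device memory created by one runtime cannot be used by the other, data shared between an OpenMP offload stage and a SYCL stage can be passed through host memory instead. The sketch below illustrates this under stated assumptions: the function names fillSquares and addOne are hypothetical, the SYCL code follows the cl::sycl style of this guide's example, and a plain host loop is used as a fallback when the file is compiled without SYCL support so that the sketch still builds with an ordinary C++ compiler.

```cpp
#include <cstddef>
#include <vector>
#if defined(SYCL_LANGUAGE_VERSION) || defined(CL_SYCL_LANGUAGE_VERSION)
#define SKETCH_HAS_SYCL 1
#include <CL/sycl.hpp>
#endif

// Stage 1 (OpenMP offload): fill Data with squares. map(from) copies the
// result back to host memory when the target region ends.
void fillSquares(std::vector<float> &Data) {
  float *P = Data.data();
  const std::size_t N = Data.size();
#pragma omp target teams distribute parallel for map(from : P[0 : N])
  for (std::size_t I = 0; I < N; ++I)
    P[I] = static_cast<float>(I) * static_cast<float>(I);
}

// Stage 2 (SYCL): add one to every element. The buffer is built from the
// host copy, never from a device pointer owned by the OpenMP runtime.
void addOne(std::vector<float> &Data) {
#ifdef SKETCH_HAS_SYCL
  cl::sycl::range<1> R(Data.size());
  cl::sycl::buffer<float, 1> Buf(Data.data(), R);
  cl::sycl::queue().submit([&](cl::sycl::handler &Cgh) {
    auto Acc =
        Buf.template get_access<cl::sycl::access::mode::read_write>(Cgh);
    Cgh.parallel_for<class AddOne>(
        R, [=](cl::sycl::id<1> I) { Acc[I] += 1.0f; });
  });
  // Buf is destroyed on return, which writes the result back to Data.
#else
  // Fallback so the sketch compiles without SYCL support.
  for (float &X : Data)
    X += 1.0f;
#endif
}
```

Calling fillSquares(V) followed by addOne(V) on a std::vector<float> V keeps each runtime working on its own device allocation; the only shared storage is the host vector.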


The following code snippet uses DPC++/SYCL and OpenMP offloading constructs in the same application.
#include <CL/sycl.hpp>
#include <array>
#include <iostream>

// Approximates pi by midpoint integration of 4 / (1 + x^2) over [0, 1],
// offloaded with OpenMP.
float computePi(unsigned N) {
  // Initialized and mapped tofrom so the reduction starts from zero.
  float Pi = 0.0f;
#pragma omp target map(tofrom : Pi)
#pragma omp parallel for reduction(+ : Pi)
  for (unsigned I = 0; I < N; ++I) {
    float T = (I + 0.5f) / N;
    Pi += 4.0f / (1.0 + T * T);
  }
  return Pi / N;
}

// Fills A[0..N-1] with 0, 1, 2, ... using a DPC++/SYCL kernel.
void iota(float *A, unsigned N) {
  cl::sycl::range<1> R(N);
  cl::sycl::buffer<float, 1> AB(A, R);
  cl::sycl::queue().submit([&](cl::sycl::handler &cgh) {
    auto AA = AB.template get_access<cl::sycl::access::mode::write>(cgh);
    cgh.parallel_for<class Iota>(R, [=](cl::sycl::id<1> I) {
      AA[I] = I;
    });
  });
}

int main() {
  std::array<float, 1024u> Vec;
  float Pi;

  // Run the SYCL and OpenMP offload parts from separate host threads.
#pragma omp parallel sections
  {
#pragma omp section
    iota(Vec.data(), Vec.size());
#pragma omp section
    Pi = computePi(8192u);
  }

  std::cout << "Vec[512] = " << Vec[512] << std::endl;
  std::cout << "Pi = " << Pi << std::endl;
  return 0;
}
The following command is used to compile the example code:
icpx -fsycl -fiopenmp -fopenmp-targets=spir64 offloadOmp_dpcpp.cpp
  • -fsycl enables DPC++.
  • -fiopenmp -fopenmp-targets=spir64 enables OpenMP* offload.
The following shows the program output from the example code.
./a.out
Vec[512] = 512
Pi = 3.14159
If the code does not contain OpenMP offload, but only normal OpenMP code, use the following command, which omits the -fopenmp-targets option:
icpx -fsycl -fiopenmp omp_dpcpp.cpp
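For illustration, "normal" OpenMP code here means host-side parallelism with no target directive. The sketch below is a host-only variant of the computePi function from the example above; the name computePiHost is hypothetical. Since nothing is offloaded through OpenMP, no OpenMP offload target needs to be named at compile time.

```cpp
// Host-only variant of computePi: the loop is parallelized across host
// threads and contains no OpenMP target region.
float computePiHost(unsigned N) {
  float Pi = 0.0f; // the reduction accumulates into this initial value
#pragma omp parallel for reduction(+ : Pi)
  for (unsigned I = 0; I < N; ++I) {
    float T = (I + 0.5f) / N;
    Pi += 4.0f / (1.0f + T * T);
  }
  return Pi / N;
}
```

A call such as computePiHost(8192u) returns an approximation of pi and can sit in the same file as DPC++/SYCL code like the iota function shown earlier.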
