Intel® oneAPI DPC++/C++ Compiler Release Notes

Version: 2021.2   Published: 11/02/2020   Last Updated: 05/03/2021

This document provides a summary of new and changed product features and includes notes about features and problems not described in the product documentation.

Where to Find the Release

Download the toolkit from the Web Configurator, and then follow the installation instructions to install it.

2021.2.0 Release

  • Support for the Alder Lake and Sapphire Rapids ISAs. The following compiler options were added:
    • -mavxvnni
    • -mcldemote
    • -mhreset
    • -mptwrite
    • -mserialize
    • -mwaitpkg
    • -march=alderlake, -xalderlake
    • -march=sapphirerapids, -xsapphirerapids
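To confirm that an option such as -mwaitpkg or -march=sapphirerapids actually enabled an ISA extension in a given build, you can test predefined feature macros. The sketch below assumes that icx/icpx, being LLVM-based, define the same feature macros as clang (for example __WAITPKG__ for -mwaitpkg); verify this with your compiler, for example by dumping its predefined macros with -dM -E, before relying on it. The helper name is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists which of the new ISA options were enabled when
// this translation unit was compiled. It relies on clang-style feature macros,
// which is an assumption for icx/icpx. These macros describe compile-time
// targeting only, not whether the CPU running the program supports the ISA.
std::vector<std::string> enabled_isa_extensions() {
  std::vector<std::string> found;
#ifdef __AVXVNNI__
  found.push_back("avxvnni");    // -mavxvnni
#endif
#ifdef __CLDEMOTE__
  found.push_back("cldemote");   // -mcldemote
#endif
#ifdef __HRESET__
  found.push_back("hreset");     // -mhreset
#endif
#ifdef __PTWRITE__
  found.push_back("ptwrite");    // -mptwrite
#endif
#ifdef __SERIALIZE__
  found.push_back("serialize");  // -mserialize
#endif
#ifdef __WAITPKG__
  found.push_back("waitpkg");    // -mwaitpkg
#endif
  return found;
}
```

If the macro assumption holds, building this with, for example, icpx -mwaitpkg should make "waitpkg" appear in the returned list, while a build without any of these options returns an empty list.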

  • CMake support for icx and icpx, with the compiler ID IntelLLVM, starting with CMake version 3.20.0.

New Features in DPC++

  • Added support for the SYCL 2020 features device_has(), aspects, math array, and global work offset in kernel enqueue. A complete list of supported SYCL 2020 features and DPC++ extensions can be found here.
  • Added support for the DPC++ extension for the pinned memory property.
  • Added support for the experimental Explicit SIMD (ESIMD) extension with the Level Zero runtime. Also added support on Windows hosts.
  • Partial support for #pragma vector aligned/unaligned.
  • Added an auto mode for the device code split feature; auto is now the default mode.
  • Compiler IDE integration support for Microsoft* Visual Studio 16.9.
  • Fast math is enabled by default (i.e., -fp-model=fast), which means the compiler can make various out-of-box optimizations for floating-point math (float or double). With this optimization enabled, you might observe different bitwise results when compared to results from the oneAPI 2021.1 release or GCC.
  • Added support for Algorithmic C data types (ac_int, ac_fixed, ac_fixed_math, hls_float, hls_float_math, and ac_complex).
  • Added support for targeting multiple homogeneous FPGA devices with the same or different device codes.
  • Added support for viewing loop bottlenecks using the Bottlenecks viewer in the FPGA optimization report.
  • Added support for [[intel::scheduler_target_fmax_mhz(N)]] kernel attribute.
  • Added support for fp contract and fp reassociate pragmas to handle kernel’s arithmetic and floating-point operations at a finer granularity.
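The bitwise differences mentioned for the new default -fp-model=fast come from optimizations such as reassociation: floating-point addition is not associative, so regrouping a sum can change its result. The plain C++ sketch below (the function names are illustrative, not part of any Intel API) shows the effect under strict IEEE evaluation. If your results depend on a specific evaluation order, a stricter model such as -fp-model=precise may be worth evaluating.

```cpp
// Two groupings of the same three-term sum. Under strict IEEE semantics the
// compiler must keep the grouping as written; a fast floating-point model is
// allowed to regroup, which is one source of bitwise differences.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// With a = 1e20, b = -1e20, c = 1.0:
//   sum_left:  (1e20 - 1e20) + 1.0 == 1.0
//   sum_right: 1e20 + (-1e20 + 1.0) == 0.0, because 1.0 is far below the
//   spacing between adjacent doubles near 1e20 and is rounded away.
```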

New features for OpenMP offload

  • Bug fixes and performance improvements in this release.

Changes to Existing Features

  • The Intel compiler changed the oneMKL implicit link option to -qmkl to avoid a conflict with the LLVM option -mkl (see the LLVM documentation).
     

Bug Fixes

  • Fixed an issue with the FPGA-specific flag -reuse-exe=. It is now supported on both Windows and Linux systems. 
  • Fixed link warnings that were observed when compiling for FPGA and creating device code archive on Windows.
  • Fixed issues in the Bottleneck Viewer in the FPGA optimization report.
  • Fixed the aocl diagnose command error related to ICD diagnostics. 

Known Issues and Limitations

  • A bug in the YUM/DNF/APT/Zypper packages of oneAPI 2021.1 Gold (initial release) prevents upgrades. More details on this can be found here.
  • USM support for implicit migration of shared allocations between device and host is currently implemented in software, using access-violation mechanisms (e.g., SIGSEGV) to identify accesses from the host. Undefined behavior may occur if applications rely on similar access-violation mechanisms, or if they use system calls to access shared-memory allocations before the GPU driver has migrated them to the host.
  • Specialization constants with a size less than 8 bytes are not supported on the Level Zero backend.
  • Invoking GPU offload code from a global object destructor in DPC++ leads to undefined behavior.
  • User-defined reduction (UDR) is not currently supported in SIMD and will be enabled in a future release.
  • #pragma float_control directives at file scope do not take effect correctly for statement blocks nested within class definitions. The same issue exists for #pragma clang fp.
  • A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, because the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
    • is_endian_little 
    • global_mem_size 
    • local_mem_size 
    • max_constant_buffer_size
    • max_mem_alloc_size
    • vendor
    • name
    • is_available
  • When compiling for FPGA, if you declare kernel names in an unnamed namespace, the kernel name does not display properly in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer, Kernel Memory Viewer, and Schedule Viewer. To work around this issue, declare kernel names globally. 
  • When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently. 
  • When compiling for FPGA and using a read-only accessor for a very wide struct, the compile-time to RTL (prior to the FPGA hardware image creation stage) can be large. As a workaround to address this long compile time, use a read-write accessor instead. 
  • When compiling for FPGA, if you declare very long kernel names, the compiler errors out. As a workaround, keep your kernel names shorter than 260 characters. 
  • On Windows, the FPGA emulator can silently fail by running out of memory. As a workaround, to catch this error, write your kernel code using the try-catch syntax. 
  • When compiling for FPGA, the compiler may sometimes ignore the ivdep attribute when it contains a pointer that is declared outside the scope where the attribute is applied. For example:
    int *p = ...
    // enter new scope
    {
      [[intel::ivdep(p)]] 
      for (int i = 0; i < N; i++) {
        // accesses to p
      }
    }

    In this example, the compiler might still find dependences on accesses to p in the loop despite the application of the ivdep attribute. To work around this limitation, declare the pointer (intended to be used in the ivdep attribute) within the same scope where the attribute is applied, if possible. This limitation does not result in functional errors on correctly written code but may affect performance on the generated hardware.

  • When compiling for FPGA, you cannot use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 FX FPGA. Compilation may succeed but the compiled binary might fail at runtime. There is no workaround available for this issue currently.

  • The software stack for Intel® PAC with Intel Arria® 10 GX FPGA and that for Intel® FPGA PAC D5005 are not compatible with each other on the same machine. If you have installed one of them already on a system, you must first uninstall it by running the aocl uninstall command before installing the other.

  • Compilation for the FPGA emulator might fail if you have also installed a GPU platform (oneAPI-specific). You might see the Error: Compiler Error: OpenCL kernel compile/link FAILED error message. To work around this error and achieve a successful FPGA emulator compilation, use one of the following solutions:

    • Solution 1: Add the -Xsfast-emulator flag to every dpcpp command when targeting the emulation flow.

    • Solution 2: Run the command for your OS:

      • On Ubuntu 18.04: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6

      • On Ubuntu 20.04: export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6

      • On SLES 15: export LD_PRELOAD=/usr/lib64/libstdc++.so.6

      • On CentOS 8.x/RHEL 8.1: export LD_PRELOAD=/lib64/libstdc++.so.6

     NOTE: Intel® recommends Solution 2 when working with code samples, because modifying the dpcpp command in each code sample's CMake file is inconvenient.

  • When you perform FPGA compile and link stages with a single dpcpp command (for example, dpcpp -fintelfpga -Xshardware src/kernel.cpp), if the source code is not located in the current directory, you might observe that the source code browser is missing in the generated FPGA optimization reports. To work around this issue, compile and link the executable in separate stages, as follows: 
    dpcpp -fintelfpga -Xshardware -c src/kernel.cpp -o kernel.o
    dpcpp -fintelfpga -Xshardware kernel.o

  • When compiling for FPGA, the debug support on Windows is not available when using device-side libraries. To avoid this issue, do not run a debugger on the emulator platform on Windows.

  • In the FPGA optimization report, the Loop Viewer (Alpha) currently handles only loops with 100 or fewer iterations. For designs with loops of more than 100 iterations, the optimization reports hang. There is no known workaround for this issue.

  • When compiling for FPGA or GPU, you might see the error clang: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory. To work around this issue, you must install the required compatibility library by executing one of the following OS-specific commands:

    • On Ubuntu 20.04: sudo apt install -y libncurses5 libncurses5-dev libncursesw5-dev

    • On RHEL/CentOS 8: sudo yum install ncurses-compat-libs

    • On SUSE 15: sudo zypper install libcurses5 ncurses5-devel

  • When you perform FPGA compilation for the emulator platform on a Linux-based OS, you may encounter the following error:
    Error: Compiler Error: OpenCL kernel compile/link FAILED
    dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)

    To work around this issue, you must either remove or rename the libstdc++.so.6 file by using one of the following commands:

    • To remove the file: 
      rm -f /opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/linux64/lib/dspba/linux64/libstdc++.so.6

    • To rename the file: 
      mv /opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/linux64/lib/dspba/linux64/libstdc++.so.6 /opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/linux64/lib/dspba/linux64/libstdc++.so.6.bak

NOTE: The libstdc++.so.6 file is not required by the Intel® oneAPI DPC++/C++ Compiler, so you can delete it even if you are not affected by this error.
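For the ivdep scope limitation listed among the known issues above, the suggested workaround (declaring the pointer in the same scope where the attribute is applied) can be sketched as follows. This is a plain C++ function for illustration with a hypothetical name; in a real DPC++ FPGA design the loop would sit inside a kernel and the pointer would come from, for example, an accessor or a USM allocation, which is not shown here.

```cpp
#include <cstddef>

// Workaround sketch: the pointer named in [[intel::ivdep(...)]] is declared in
// the same scope as the attribute. Compilers that do not recognize the
// attribute ignore it (typically with a warning), so the loop still runs as
// ordinary C++ outside an Intel FPGA compile.
void scale_in_place(float *data, std::size_t n, float factor) {
  {
    float *p = data;  // declared in the same scope where ivdep is applied
    [[intel::ivdep(p)]]
    for (std::size_t i = 0; i < n; i++) {
      p[i] *= factor;  // accesses to p
    }
  }
}
```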

2021.1.2 Patch Release

  • 2021.1.2 is a PATCH release. It is not a full compiler; it relies on updating an existing one and is intended to be installed over an existing oneAPI Base Toolkit 2021.1.1 installation.
  • This patch release fixes the known issue that caused ICX OpenMP offload to hang with the latest Level Zero driver. This patch is also recommended for DPCPP to work with the latest Level Zero driver. It is designed and tested to work with the following drivers:
    • Windows: Go here and select either the Win 10 DCH driver 27.20.100.9030 or the Xe MAX driver 27.20.100.9039. Please update to one of these drivers if you plan to use the DPCPP or ICX OpenMP offload compilers in this patch.
    • Linux: Go here. This patch compiler is designed and tested to work with driver release 20201209. Please update to this driver if you use DPCPP or ICX OpenMP offload.
  • When installing a patch release, install the latest patches for all the compilers you use (Intel Fortran Compiler, Intel DPC++/C++ Compiler, Intel C++ Compiler Classic).
  • The Intel® CPU Runtime for OpenCL™ Applications must also be reinstalled. You can download the Intel® CPU Runtime for OpenCL™ Applications for Windows from here. For Linux, the package is distributed through APT and YUM; follow the instructions on Installing Intel® oneAPI Toolkits via Linux* Package Managers to set up the repository and install the package "intel-oneapi-runtime-opencl".

2021.1.1 Major Release

Key Features in DPC++

  • Compliance with DPC++ 1.0 specification
  • Support of Ahead-Of-Time (AOT) compilation.
  • Experimental Explicit SIMD programming support
  • To align with SYCL standard evolution, CL_SYCL_LANGUAGE_VERSION is replaced with SYCL_LANGUAGE_VERSION.
  • Integration with Visual Studio* 2017 & 2019, plus Eclipse* on Linux.
  • Applications that use std:: math functions in kernel code must be compiled with the -fsycl-device-lib= option, which accepts the arguments libc, libm-fp32, libm-fp64, and all.
  • Detailed information and available intrinsics can be found in the Interactive Intrinsics Guide.
  • Added support for installing the Intel® FPGA Add-On for oneAPI Base Toolkit via Linux package managers (YUM, APT, and Zypper).
  • Added support for targeting multiple FPGA platforms.
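Because CL_SYCL_LANGUAGE_VERSION is replaced with SYCL_LANGUAGE_VERSION, code that must build with both older and newer compilers can test for either name. The sketch below is plain C++ preprocessor logic; the constant and function names are hypothetical, and the fallback branch assumes an older compiler that still defines only CL_SYCL_LANGUAGE_VERSION.

```cpp
// Prefer the new macro name and fall back to the old one. When neither is
// defined, this translation unit is not being compiled as SYCL.
#if defined(SYCL_LANGUAGE_VERSION)
constexpr long kSyclLanguageVersion = SYCL_LANGUAGE_VERSION;
#elif defined(CL_SYCL_LANGUAGE_VERSION)
constexpr long kSyclLanguageVersion = CL_SYCL_LANGUAGE_VERSION;
#else
constexpr long kSyclLanguageVersion = 0;  // not compiling as SYCL
#endif

constexpr bool compiled_as_sycl() { return kSyclLanguageVersion != 0; }
```

Code that previously tested only CL_SYCL_LANGUAGE_VERSION can switch to compiled_as_sycl() or kSyclLanguageVersion without caring which of the two names the compiler provides.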

Key Features in OpenMP offload

  • OpenMP 4.5 and OpenMP 5.1 subset support 
  • OpenMP offloading support for multiple GPUs 
  • OpenMP Offloading opt-report
  • OpenMP and DPC++ composability 
  • Support for Intel USM allocation API extensions 
  • Support for Intel extensions of invoking MKL for GPU execution 
  • Inline v-ISA support in OpenMP Offloading Region 

Known Issues and Limitations

  • OpenMP offload may not work on Level Zero with the initial oneAPI release and certain drivers: the application may hang when it reaches a TARGET directive. Use Ctrl+C to abort. To work around this issue, use the OpenCL plugin for offload by setting this environment variable:
    export LIBOMPTARGET_PLUGIN=OPENCL
    The issue has been fixed in the oneAPI DPC++/C++ Compiler 2021.1.2 Patch Release.
  • The credist.txt file for the DPC++/C++ compiler is available online only for the Gold release and will be included in the compiler packages in a future release.
  • When subgroup algorithms are used in a loop with a conditional statement ("if", for example), the results on CPU may be incorrect.
  • Using scalbn() in OpenMP target code is causing a runtime failure. The workaround is to replace scalbn() with ldexp(). The problem will be fixed in a future release.
  • icx compiler does not support linking library archives using the -l option for libraries that contain target offload code. More details and workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.
  • Attempt to use Link Time Optimization (LTO) is causing a linker failure. To successfully link, make sure you have the recommended versions of binutils for your OS listed at Intel® oneAPI DPC++/C++ Compiler and Intel® oneAPI DPC++ Library System Requirements
  • User-defined functions with the same name and signature (an exact match of arguments; the return type does not matter) as an OpenCL C built-in function can lead to undefined behavior. More details about this issue can be found at Known Issue: User-defined Functions with the Same Signature as OpenCL C built-in functions.
  • The DPC++ runtime library follows the Semantic Versioning scheme: MAJOR.MINOR.PATCH. A MAJOR version change indicates a breaking change (version X is backward incompatible with version X-1), while a MINOR version change is non-breaking. After a MAJOR version change, the workaround is to rebuild the application.
  • The format of the object files produced by the compiler can change between versions. The workaround is to rebuild the application.
  • Using cl::sycl::program API to refer to a kernel defined in another translation unit leads to undefined behavior.
  • Employing a read sampler for the image accessor may result in sporadic issues with the Level Zero plugin/backend.
  • Printing internal defines is not supported on Windows.
  • Group algorithms for MUL/AND/OR/XOR cannot be enabled for group scope due to SPIR-V limitations, and are not yet enabled for sub-group scope because the SPIR-V version is not automatically raised from 1.1 to 1.3.
  • Dead Argument Elimination for ESIMD cannot be run since the pointers to SPIR kernel functions are saved in !genx.kernels metadata.
  • Devices returned by passing the same filters to the filter_selector may not compare equal.
  • On Windows, the DPC++ compiler enforces use of the dynamic C++ runtime for applications linked with the SYCL library by:
    • linking with msvcrt[d].dll when the -fsycl switch is used.
    • emitting an error on attempts to compile a program with the static C++ runtime using -fsycl and /MT or /MTd.
      This protects you from complicated runtime errors caused by C++ objects crossing the sycl[d].dll boundary, which different versions of the C++ runtime on the application and sycl[d].dll sides do not always handle properly.
  • A runtime exception like the following may occur on Windows when the application is compiled in Debug mode. The workaround is to compile with /Od on the command line or add /Od to Linker > General > Pass additional options to device compilers in the IDE.
    Unhandled exception at 0x00007FF930ED7247 (igc64.dll) in gamma-correction.exe: 0xC0000005: Access violation reading location 0x0000027455B6C000.
  • Read the whitepaper on Challenges, tips, and known issues when debugging heterogeneous programs using DPC++ or OpenMP offload.
  • A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, because the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
    • is_endian_little 
    • global_mem_size 
    • local_mem_size 
    • max_constant_buffer_size
    • max_mem_alloc_size
    • vendor
    • name
    • is_available
  • When compiling for FPGA, if you declare kernel names in an unnamed namespace, the kernel name does not display properly in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer, Kernel Memory Viewer, and Schedule Viewer. To work around this issue, declare kernel names globally. 
  • When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently. 
  • The FPGA command aocl diagnose might report the ICD diagnostics FAILED error. You can safely ignore this error: if you installed the Intel® oneAPI Base Toolkit as directed, the ICD is also installed correctly. If you have installed the Intel® FPGA Add-on for oneAPI Base Toolkit package successfully, you should not observe any compile or FPGA hardware run failures due to this error.
  • The Bottleneck Viewer in the FPGA optimization report appears blank without any data reported. To work around this issue, refer to the Loop Analysis report Details pane to identify bottlenecks.
  • When compiling for FPGA, if you declare very long kernel names, the compiler errors out. As a workaround, keep your kernel names shorter than 260 characters. 
  • When compiling for FPGA and creating device code archive on Windows, you might see the following link warnings:
    warning LNK4078: multiple '__CLANG_OFFLOAD_BUNDLE_SIZE__syc' sections found with different attributes (40100800)
    warning LNK4078: multiple '__CLANG_OFFLOAD_BUNDLE__sycl-fpg' sections found with different attributes (40100800)
    You can safely ignore these warnings.
  • The FPGA-specific flag -reuse-exe= is not supported on Windows. Refer to the fast_recompile FPGA tutorial for an example on how to separate host and device code to minimize compile time when you change only the host code. 

  • When compiling for FPGA and using a read-only accessor for a very wide struct, the compile-time to RTL (prior to the FPGA hardware image creation stage) can be large. As a workaround to address this long compile time, use a read-write accessor instead. 

  • When running a design compiled for the FPGA emulator, you might encounter the OpenCL API failed error message, with the OpenCL API returning the error -5 (CL_OUT_OF_RESOURCES). To work around this issue, increase the amount of memory that the emulator runtime is permitted to allocate (the default value is 512 KB) by using one of the following commands, setting the variable to an integer followed by KB for kilobytes (for example, 1024 KB) or MB for megabytes (for example, 32 MB):

    • On Linux: export CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=

    • On Windows: set CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=

  • When the FPGA emulator runs out of memory on Windows (as described in the previous issue), the OpenCL API failed error message might not get generated sometimes, that is, the emulator run can silently fail. As a workaround, to ensure that this out-of-memory error is caught, write your kernel code using the try-catch syntax.

  • When compiling for FPGA, the compiler may sometimes ignore the ivdep attribute when it contains a pointer that is declared outside the scope where the attribute is applied. For example:
    int *p = ...
    // enter new scope
    {
      [[intel::ivdep(p)]] 
      for (int i = 0; i < N; i++) {
        // accesses to p
      }
    }

    In this example, the compiler might still find dependences on accesses to p in the loop despite the application of the ivdep attribute. To work around this limitation, declare the pointer (intended to be used in the ivdep attribute) within the same scope where the attribute is applied, if possible. This limitation does not result in functional errors on correctly written code but may affect performance on the generated hardware. 

  • When compiling for FPGA, you cannot use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 FX FPGA. Compilation may succeed but the compiled binary might fail at runtime. There is no workaround available for this issue currently.

  • The software stack for Intel® PAC with Intel Arria® 10 GX FPGA and that for Intel® FPGA PAC D5005 are not compatible with each other on the same machine. If you have installed one of them already on a system, you must first uninstall it by running the aocl uninstall command before installing the other.

  • Compilation for the FPGA emulator might fail if you have also installed a GPU platform (oneAPI-specific). You might see the Error: Compiler Error: OpenCL kernel compile/link FAILED error message. To work around this error and achieve a successful FPGA emulator compilation, use one of the following solutions:

    • Solution 1: Add the -Xsfast-emulator flag to every dpcpp command when targeting the emulation flow.

    • Solution 2: Run the command for your OS:

      • On Ubuntu 18.04: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6

      • On Ubuntu 20.04: export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6

      • On SLES 15: export LD_PRELOAD=/usr/lib64/libstdc++.so.6

      • On CentOS 8.x/RHEL 8.1: export LD_PRELOAD=/lib64/libstdc++.so.6

     NOTE: Intel® recommends Solution 2 when working with code samples, because modifying the dpcpp command in each code sample's CMake file is inconvenient.
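For the scalbn() failure in OpenMP target code noted above, the ldexp() workaround is safe because ldexp(x, e) computes x * 2^e while scalbn(x, e) computes x * FLT_RADIX^e, and the two are equivalent when FLT_RADIX is 2. The sketch below uses a hypothetical function name; the offload flags you would use (for example, icpx -fiopenmp -fopenmp-targets=spir64) are an assumption about your build, and without OpenMP offload flags the pragma is ignored and the loop runs on the host.

```cpp
#include <cfloat>
#include <cmath>

// ldexp and scalbn agree only when the floating-point radix is 2.
static_assert(FLT_RADIX == 2, "ldexp and scalbn differ when FLT_RADIX != 2");

// Scales each element by 2^e inside an OpenMP target region, calling
// std::ldexp where the original code would have called std::scalbn.
void scale_by_pow2(double *x, int n, int e) {
#pragma omp target teams distribute parallel for map(tofrom: x[0:n])
  for (int i = 0; i < n; ++i) {
    x[i] = std::ldexp(x[i], e);  // workaround: instead of std::scalbn(x[i], e)
  }
}
```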
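The Semantic Versioning rule stated for the DPC++ runtime library (a MAJOR change is breaking, a MINOR change is not) can be expressed as a small check. The sketch below is hypothetical: the type and function names are illustrative, and how an application obtains the version it was built against and the version of the installed runtime is not shown and is left as an assumption.

```cpp
#include <cstdio>
#include <string>

struct SemVer {
  int major = 0;
  int minor = 0;
  int patch = 0;
};

// Parses "MAJOR.MINOR.PATCH"; returns false if the string is malformed or
// has trailing characters.
bool parse_semver(const std::string &text, SemVer &out) {
  int consumed = 0;
  if (std::sscanf(text.c_str(), "%d.%d.%d%n", &out.major, &out.minor,
                  &out.patch, &consumed) != 3) {
    return false;
  }
  return consumed == static_cast<int>(text.size());
}

// Per the rule above, only a MAJOR version difference is breaking, so only
// that case calls for rebuilding the application.
bool needs_rebuild(const SemVer &built_against, const SemVer &runtime) {
  return built_against.major != runtime.major;
}
```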

Support Deprecated

-mkl compiler option replaced with -qmkl

On Linux, the -mkl compiler option is deprecated and may be removed in a future release; its replacement will be -qmkl. This compiler option tells the compiler to link to certain libraries in the Intel® oneAPI Math Kernel Library.

Additional Documentation

Notices and Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.