Intel® oneAPI DPC++/C++ Compiler Release Notes

Version: 2021.1   Last Updated: 01/12/2021

This document provides a summary of new and changed product features and includes notes about features and problems not described in the product documentation.

Where to Find the Release

Follow the steps in the Web Configurator to download the toolkit, then follow the installation instructions to install it.

oneAPI DPC++/C++ Compiler 2021.1.2 Patch Release

  • Version 2021.1.2 is a PATCH release. It is not a full compiler; it updates an existing installation and is intended to be installed over an existing oneAPI Base Toolkit 2021.1.1 installation.
  • This patch release fixes the known issue that causes ICX OpenMP offload to hang with the latest Level 0 driver. This patch is also recommended for DPCPP to work with the latest Level 0 driver. This patch is designed and tested to work with the following drivers:
    • Windows: Select either the Win 10 DCH driver 27.20.100.9030 or the Xe MAX driver 27.20.100.9039. Please update to one of these drivers if you plan to use the DPCPP or ICX OpenMP offload compilers in this patch.
    • Linux: This patch compiler is designed and tested to work with driver release 20201209. Please update to this driver if you use DPCPP or ICX OpenMP offload.
  • When installing a patch release, users should install the latest patches for all the compilers that they use (Intel Fortran Compiler, Intel DPC++/C++ Compiler, Intel C++ Compiler Classic).
  • Intel® CPU Runtime for OpenCL™ Applications must also be re-installed. You can download the Intel® CPU Runtime for OpenCL™ Applications for Windows from here. For Linux, the package is distributed through APT and YUM; follow the instructions in Installing Intel® oneAPI Toolkits via Linux* Package Managers to set up the repository and install the package "intel-oneapi-runtime-opencl".

oneAPI DPC++/C++ Compiler 2021.1.1 Major Release

Key Features in DPC++

  • Compliance with DPC++ 1.0 specification
  • Support of Ahead-Of-Time (AOT) compilation.
  • Experimental Explicit SIMD programming support
  • To align with SYCL standard evolution, CL_SYCL_LANGUAGE_VERSION is replaced with SYCL_LANGUAGE_VERSION.
  • Integration with Visual Studio* 2017 & 2019, plus Eclipse* on Linux 
  • Applications that use std::* math functions in kernel code need to be compiled with the option -fsycl-device-lib=<value>, which accepts the arguments libc, libm-fp32, libm-fp64, and all.
  • Detailed information and available intrinsics can be found in the Interactive Intrinsics Guide.
  • Added support for installing the Intel® FPGA Add-On for oneAPI Base Toolkit via Linux package managers (YUM, APT, and Zypper).
  • Added support for targeting multiple FPGA platforms.
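
Source code that must build with compilers from both before and after the rename of CL_SYCL_LANGUAGE_VERSION to SYCL_LANGUAGE_VERSION can test for either macro. The following is a minimal sketch rather than code from the product documentation; the function name sycl_version_macro_in_use is hypothetical. When the compiler is not in SYCL mode, neither macro is expected to be defined and the function reports "none".

```cpp
#include <string>

// Hypothetical helper: reports which SYCL language version macro,
// if any, is defined by the compiler that built this translation unit.
std::string sycl_version_macro_in_use() {
#if defined(SYCL_LANGUAGE_VERSION)
  // New macro name, aligned with the evolving SYCL standard.
  return "SYCL_LANGUAGE_VERSION";
#elif defined(CL_SYCL_LANGUAGE_VERSION)
  // Old macro name used before the rename.
  return "CL_SYCL_LANGUAGE_VERSION";
#else
  // Not compiled in SYCL mode.
  return "none";
#endif
}
```

Calling sycl_version_macro_in_use() from main() and printing the result shows which macro the compiler in use defines.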

Key Features in OpenMP offload

  • OpenMP 4.5 and OpenMP 5.1 subset support 
  • OpenMP offloading support for multiple GPUs 
  • OpenMP Offloading opt-report
  • OpenMP and DPC++ composability 
  • Support for Intel USM allocation API extensions 
  • Support for Intel extensions of invoking MKL for GPU execution 
  • Inline v-ISA support in OpenMP Offloading Region 

Known Issues and Limitations

  • OpenMP offload may not work on Level Zero with the initial release of oneAPI and certain drivers. The application may hang when it reaches a TARGET directive; use Ctrl+C to abort. To work around this issue, use the OpenCL driver for offload by setting this environment variable:
    export LIBOMPTARGET_PLUGIN=OPENCL
    The issue has been fixed in the oneAPI DPC++/C++ Compiler 2021.1.2 Patch Release.
  • The credist.txt file for the DPC++/C++ compiler is available online only for the gold release and will be part of the compiler packages in a future release.
  • When subgroup algorithms are used in a loop with a conditional statement ("if", for example), the results on CPU may be incorrect.
  • Using scalbn() in OpenMP target code causes a runtime failure. The workaround is to replace scalbn() with ldexp(). The problem will be fixed in a future release.
  • The icx compiler does not support linking library archives using the -l option for libraries that contain target offload code. More details and a workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.
  • Attempting to use Link Time Optimization (LTO) can cause a linker failure. To link successfully, make sure you have the recommended version of binutils for your OS, as listed in Intel® oneAPI DPC++/C++ Compiler and Intel® oneAPI DPC++ Library System Requirements.
  • A user-defined function with the same name and signature as an OpenCL C built-in function (an exact match of arguments; the return type does not matter) can lead to undefined behavior. More details about this issue can be found at Known Issue: User-defined Functions with the Same Signature as OpenCL C built-in functions.
  • The DPC++ runtime library follows the Semantic Versioning scheme: MAJOR.MINOR.PATCH. A change in the MAJOR version indicates a breaking change (version X is backward incompatible with version X-1). A change in the MINOR version indicates a non-breaking change. After a breaking change, the workaround is to rebuild the application.
  • The format of the object files produced by the compiler can change between versions. The workaround is to rebuild the application.
  • Using cl::sycl::program API to refer to a kernel defined in another translation unit leads to undefined behavior.
  • Employing a read sampler for the image accessor may result in sporadic issues with the Level Zero plugin/backend.
  • Printing internal defines is not supported on Windows.
  • Group algorithms for MUL/AND/OR/XOR cannot be enabled for group scope due to SPIR-V limitations, and are not yet enabled for sub-group scope because the SPIR-V version is not automatically raised from 1.1 to 1.3.
  • Dead Argument Elimination for ESIMD cannot be run since the pointers to SPIR kernel functions are saved in !genx.kernels metadata.
  • Devices returned by passing the same filters to the filter_selector may not compare equal.
  • On Windows, the DPC++ compiler enforces use of the dynamic C++ runtime for applications linked with the SYCL library by:
    • linking with msvcrt[d].dll when the -fsycl switch is used.
    • emitting an error on attempts to compile a program with the static C++ RT using -fsycl and /MT or /MTd.
      This protects you from complicated runtime errors caused by C++ objects that cross the sycl[d].dll boundary and are not always handled properly by the different versions of the C++ RT used on the application and sycl[d].dll sides.
  • A runtime exception like the following may occur on Windows when the application is compiled in Debug mode. The workaround is to compile with /Od on the command line or add /Od to Linker>General>Pass additional options to device compilers in the IDE.
    Unhandled exception at 0x00007FF930ED7247 (igc64.dll) in gamma-correction.exe: 0xC0000005: Access violation reading location 0x0000027455B6C000.
  • Read the whitepaper on Challenges, tips, and known issues when debugging heterogeneous programs using DPC++ or OpenMP offload.
  • A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, because the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
    • is_endian_little 
    • global_mem_size 
    • local_mem_size 
    • max_constant_buffer_size
    • max_mem_alloc_size
    • vendor
    • name
    • is_available
  • When compiling for FPGA, if you declare kernel names in an unnamed namespace, the kernel name does not display properly in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer, Kernel Memory Viewer, and Schedule Viewer. To work around this issue, declare kernel names globally. 
  • When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently. 
  • The FPGA command aocl diagnose might report the ICD diagnostics FAILED error. You can safely ignore this error because if you installed the Intel® oneAPI Base Toolkit as directed, it means that the ICD is also installed correctly. If you have installed the Intel® FPGA Add-on for oneAPI Base Toolkit package successfully, you should not observe any compile or FPGA hardware run failures due to this error.
  • The Bottleneck Viewer in the FPGA optimization report appears blank without any data reported. To work around this issue, refer to the Loop Analysis report Details pane to identify bottlenecks.
  • When compiling for FPGA, if you declare very long kernel names, the compiler errors out. As a workaround, keep your kernel names shorter than 260 characters. 
  • When compiling for FPGA and creating device code archive on Windows, you might see the following link warnings:
    warning LNK4078: multiple '__CLANG_OFFLOAD_BUNDLE_SIZE__syc' sections found with different attributes (40100800)
    warning LNK4078: multiple '__CLANG_OFFLOAD_BUNDLE__sycl-fpg' sections found with different attributes (40100800)
    You can safely ignore these warnings.
  • The FPGA-specific flag -reuse-exe=<exe> is not supported on Windows. Refer to the fast_recompile FPGA tutorial for an example on how to separate host and device code to minimize compile time when you change only the host code. 

  • When compiling for FPGA and using a read-only accessor for a very wide struct, the compile-time to RTL (prior to the FPGA hardware image creation stage) can be large. As a workaround to address this long compile time, use a read-write accessor instead. 

  • When running a design compiled for the FPGA emulator, you might encounter the OpenCL API failed error message, where the OpenCL API returns the -5 (CL_OUT_OF_RESOURCES) error. To work around this issue, increase the amount of memory that the emulator runtime is permitted to allocate (the default value is 512 KB) using the following commands, where <size> is an integer followed by KB for kilobytes (for example, 1024 KB) or MB for megabytes (for example, 32 MB):

    • On Linux: export CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>

    • On Windows: set CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>

  • When the FPGA emulator runs out of memory on Windows (as described in the previous issue), the OpenCL API failed error message might not be generated; that is, the emulator run can fail silently. As a workaround, to ensure that this out-of-memory error is caught, write your kernel code using the try-catch syntax.

  • When compiling for FPGA, the compiler may sometimes ignore the ivdep attribute when it contains a pointer that is declared outside the scope where the attribute is applied. For example:
    int *p = ...
    // enter new scope
    {
      [[intel::ivdep(p)]] 
      for (int i = 0; i < N; i++) {
        // accesses to p
      }
    }

    In this example, the compiler might still find dependences on accesses to p in the loop despite the application of the ivdep attribute. To work around this limitation, declare the pointer (intended to be used in the ivdep attribute) within the same scope where the attribute is applied, if possible. This limitation does not result in functional errors on correctly written code but may affect performance on the generated hardware. 
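
The workaround described above can be sketched as follows. This is a minimal illustration rather than code from the product documentation: the function double_in_place and the macro FPGA_IVDEP are hypothetical names. The sketch assumes that the predefined macro __INTEL_LLVM_COMPILER identifies the Intel compiler, so that other compilers build the same source without the attribute.

```cpp
#include <cstddef>

// Assumption: __INTEL_LLVM_COMPILER is defined by the Intel oneAPI
// DPC++/C++ compiler. Other compilers see an empty macro, so the
// sketch still builds, but without the attribute.
#if defined(__INTEL_LLVM_COMPILER)
#define FPGA_IVDEP(ptr) [[intel::ivdep(ptr)]]
#else
#define FPGA_IVDEP(ptr)
#endif

// Hypothetical loop body: doubles each element of `data` in place.
void double_in_place(int* data, std::size_t n) {
  {
    // Declare the pointer in the same scope as the attribute that
    // names it, instead of reusing a pointer from an outer scope.
    int* p = data;
    FPGA_IVDEP(p)
    for (std::size_t i = 0; i < n; ++i) {
      p[i] = p[i] * 2;
    }
  }
}
```

The only change from the failing pattern is where p is declared; the loop itself is unchanged.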

  • When compiling for FPGA, you cannot use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 GX FPGA. Compilation may succeed but the compiled binary might fail at runtime. There is no workaround available for this issue currently.

  • The software stack for Intel® PAC with Intel Arria® 10 GX FPGA and that for Intel® FPGA PAC D5005 are not compatible with each other on the same machine. If you have installed one of them already on a system, you must first uninstall it by running the aocl uninstall command before installing the other.

  • Compilation for the FPGA emulator might fail if you have also installed a GPU platform (oneAPI-specific). You might see the Error: Compiler Error: OpenCL kernel compile/link FAILED error message. To work around this error and achieve a successful FPGA emulator compilation, use one of the following solutions:

    • Solution 1: Add the -Xsfast-emulator flag to every dpcpp command when targeting the emulation flow.

    • Solution 2: Execute the command for your OS from the following list:

      • On Ubuntu 18.04: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6

      • On Ubuntu 20.04: export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6

      • On SLES 15: export LD_PRELOAD=/usr/lib64/libstdc++.so.6

      • On CentOS 8.x: export LD_PRELOAD=/lib64/libstdc++.so.6

      • On RHEL 8.1: export LD_PRELOAD=/lib64/libstdc++.so.6 

    NOTE: Intel® recommends Solution 2 when working with code samples as it is inconvenient to modify the dpcpp command in the code sample CMake file.
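
For the scalbn() issue in OpenMP target code listed earlier in this section, the replacement can look like the following minimal sketch. The function scale_by_power_of_two is a hypothetical name used only for illustration. The sketch relies on ldexp(x, n) and scalbn(x, n) returning the same value when FLT_RADIX is 2, which it checks at compile time. When OpenMP offload is not enabled, the pragma is ignored and the loop runs on the host.

```cpp
#include <cfloat>
#include <cmath>
#include <vector>

// ldexp(x, n) computes x * 2^n; scalbn(x, n) computes x * FLT_RADIX^n.
// The two agree only when FLT_RADIX is 2, so verify that at compile time.
static_assert(FLT_RADIX == 2,
              "ldexp can stand in for scalbn only with radix-2 floating point");

// Hypothetical helper: scales every element of `values` by 2^exponent
// inside an OpenMP target region and returns the scaled copy.
std::vector<double> scale_by_power_of_two(std::vector<double> values,
                                          int exponent) {
  double* data = values.data();
  const int count = static_cast<int>(values.size());
#pragma omp target teams distribute parallel for map(tofrom: data[0:count])
  for (int i = 0; i < count; ++i) {
    // Workaround: this call was scalbn(data[i], exponent).
    data[i] = std::ldexp(data[i], exponent);
  }
  return values;
}
```

Because only the function name changes at the call site, the call can be switched back to scalbn() once the fix is available.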

Support Deprecated

-mkl compiler option replaced with -qmkl

The -mkl compiler option on Linux is deprecated and may be removed in a future release; its replacement will be -qmkl. This compiler option tells the compiler to link to certain libraries in the Intel® oneAPI Math Kernel Library.

Notices and Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets, and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.