Intel® oneAPI DPC++/C++ Compiler Release Notes (Beta)

Last updated: 08/06/2020

This document provides a summary of new and changed product features and includes notes about features and problems not described in the product documentation.

 

Where to Find the Release

Download the toolkit from the Web Configurator, then follow the installation instructions to install it.

OpenMP offload is available as "Intel® oneAPI DPC++/C++ Compiler Pro" in Intel® oneAPI HPC Toolkit.

New in 2021.1-beta10

New Features

Improvements

 

Bug Fixes

Known Issues

2021.1-beta09

Intel® oneAPI C++ Compiler (ICX)

New Features

  • Correction to reported problems
  • Improved OpenMP Offloading for C/C++ and Fortran
  • Improved Multi-GPU support with OpenMP
  • Improved OpenMP and DPC++ Composability
  • Added Intel managed USM API extensions (host/shared/device)

Known issues

  • Visual Studio IDE integration support for ICX is not available with this release on Windows.
  • ICX does not support linking library archives using the -l option for libraries that contain target offload code. More details and a workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.

  • When using the /std:c++latest option with icx in a Windows environment, versions 16.3 through 16.5 of MS VS may be problematic due to issues in the MS VS standard header files resulting from the introduction of support for the C++20 standard Concepts feature.
    A common error message is:

    MSVC/14.24.28314/include\concepts(33,20): error: expected ')
    __is_same(_Ty1 _Ty2)

Intel® oneAPI DPC++ Compiler

New Features

  • Implemented the following extensions in DPC++
  • Implemented aspects feature from the SYCL 2020 provisional Specification.
  • Added the following new compiler options
    • -f[no-]sycl-early-optimizations: enables or disables DPC++ early optimizations before generation of SPIR-V code. This option is enabled by default.
    • -f[no-]sycl-id-queries-fit-in-int: tells the compiler to assume that SYCL ID queries fit within MAX_INT. This option is disabled by default.
  • Added sycl-ls utility to list all the platforms and devices available through the plugins
  • Added support for the following in FPGAs:
    • I/O pipes
    • no_global_work_offset attribute in parallel_for kernels
    • FPGA loop attributes: loop_coalesce, speculated_iterations, disable_loop_pipelining, and max_interleaving
    • force_pow2_depth memory attribute.
    • mem_channel buffer property on FPGAs.
    • Shared Counter Profiling.
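The -fsycl-id-queries-fit-in-int option listed above depends on an assumption about launch sizes that the application has to uphold. The following plain C++ sketch shows one way a host program might check a launch range before enabling the option. The helper name ids_fit_in_int is hypothetical and is not part of DPC++. The sketch also assumes that each dimension and the linearized global size must fit in a signed int, which the release note does not spell out.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Hypothetical helper (not a DPC++ API): returns true when every dimension of
// a 3-D global range, and the product of all dimensions, fit in a signed int.
inline bool ids_fit_in_int(const std::array<std::size_t, 3>& global_range) {
  const std::size_t limit =
      static_cast<std::size_t>(std::numeric_limits<int>::max());
  std::size_t linear = 1;
  for (std::size_t dim : global_range) {
    if (dim > limit) return false;
    // Overflow-safe check that linear * dim stays within the limit.
    if (dim != 0 && linear > limit / dim) return false;
    linear *= dim;
  }
  return true;
}
```

A 1024 x 1024 x 1 range passes this check, while a 65536 x 65536 x 1 range does not, because its linearized size exceeds the largest int value.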

Improvements

  • Added support for C array as a kernel parameter.
  • Added support for SYCL kernel inheritance and nested arrays.
  • Improved diagnostics for incorrect usage of APIs.
  • Added a diagnostic on attempt to use const static data members that are not const-initialized
  • The fallback implementation of standard library functions is now linked to the device code only if such functions are used in kernels
  • Added a diagnostic on attempt to capture this as a kernel parameter
  • Added [[intel::reqd_sub_group_size()]] attribute as a replacement for [[cl::reqd_sub_group_size()]], which is now deprecated
  • The sycl::usm_allocator has been improved. It now has equality operators and can be used with std::allocate_shared. Usage with device allocations is now disallowed.
  • Added support for lambda functions passed to reductions
  • Enabled the standard optimization pipeline for the device code by default. The new -fno-sycl-std-optimizations compiler flag can be used to disable these optimizations at compile time
  • Added support for braced-init-list or a number as range for sycl::queue::parallel_for family functions
  • Added 64-bit type support for the load and store methods of sycl::intel::sub_group
  • Finished implementation of the Host task with interop capabilities extension
  • Added support for specialization constants in Level 0 Backend
  • Improved performance of the SYCL graph cleanup [c099e47]
  • Added support for TriviallyCopyable types to the sycl::intel::sub_group::shuffle
  • Exceptions thrown in a host task will now be returned as asynchronous exceptions
  • Fixed sycl::buffer constructor which takes a contiguous container to enable copy back on destruction.
  • Added support for user-defined sub-group reductions
  • The sycl::backend::level0 has been renamed to sycl::backend::level_zero
  • Extended sycl::broadcast to support TriviallyCopyable types
  • Implemented get_native and make_* functions for Level Zero, which allow querying native handles of SYCL objects and creating SYCL objects from a native handle for platform, device, queue, and program. The feature is described in the SYCL 2020 provisional specification
  • Added support for sycl::intel::atomic_ref from SYCL_INTEL_extended_atomics extension
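Two items in the list above (sub_group::shuffle and sycl::broadcast) extend support to TriviallyCopyable types. As a minimal sketch in standard C++, with no SYCL headers involved, a user-defined type can be checked against that requirement with std::is_trivially_copyable. The Pixel and Label types are hypothetical and serve only as illustration.

```cpp
#include <string>
#include <type_traits>

// Hypothetical user type: plain data members only, so it is trivially copyable.
struct Pixel {
  float r, g, b;
};

// Hypothetical user type: std::string has a non-trivial copy constructor,
// so Label is not trivially copyable.
struct Label {
  std::string text;
};

static_assert(std::is_trivially_copyable<Pixel>::value,
              "Pixel satisfies the TriviallyCopyable requirement");
static_assert(!std::is_trivially_copyable<Label>::value,
              "Label does not satisfy the TriviallyCopyable requirement");
```

Placing such a static_assert next to the type definition reports a violation at compile time on the host, before any device code is built.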

Bug Fixes

  • Fixed the issue with empty input for -foffload-static-lib option
  • Fixed a problem with template instantiation during integration header generation
  • Fixed a problem which could happen when using command lines with a large number of files
  • Fixed a crash when a kernel object field is an array of structures
  • Fixed an issue which could prevent the use of structures with constant-sized arrays as kernel parameters
  • Fixed a bug in the pass for lowering hierarchical parallelism code (SYCLLowerWGScope). The transformation generated code in which work items hit the barrier in a loop a different number of times, which is illegal
  • Fixed a crash on an attempt to use objects of sycl::experimental::spec_constant in a struct
  • Fixed a memory leak of sycl::event objects that happened when using USM-specific sycl::queue methods
  • Fixed a race which could happen when submitting the same kernel from multiple threads
  • Fixed a memory leak of queue and context handles, which happened when backend is not OpenCL
  • Fixed an endless loop in sycl::intel::reduction for data types that do not have fast atomics when the local size is 1
  • Fixed a compilation error which happened when using sycl::interop_handle::get_native_mem method with an object of sycl::accessor created for host target
  • Fixed sycl::device::get_info<cl::sycl::info::device::sub_group_sizes> which was returning incorrect data
  • Fixed a warning message that was being emitted for all FPGA hardware compiles.
  • Fixed an error where select FPGA tutorial designs were crashing on Windows machines.
  • Fixed a crash that was occurring when using the Intel FPGA Emulation Platform on Windows systems.
  • Fixed a warning message about lsb_release that was being displayed when you run the setvars.sh script.

Known Issues

  • DPC++ compiler does not support linking library archives using the -l option for libraries that contain target offload code. More details and workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.
  • Using the _ExtInt data type with the DPC++ FPGA Emulator leads to a runtime failure.
  • On a Linux system, the compiler fails to generate the FPGA optimization report when there is a space in the default Intel® oneAPI Base Toolkit installation directory path (for example, /opt/<directory_name_with_space>/) and emits the "Error: Unable to rewrite SYCL IR file" error. To work around this issue, install the toolkit to a directory that has no space in its path.
  • There is a known runtime error when using code with an unused array of pointers. As a workaround, comment out any unused array of pointers.
  • The -g option temporarily suppresses -fsycl-early-optimizations because of known issues, which are expected to be addressed in the next release. To enable early optimizations together with -g, pass both options: -g -fsycl-early-optimizations.
  • Kernels that use sub-groups can fail on CPU with the error message "Subgroup calls in scalar kernel or non-inlined subroutine can't be resolved!". This is observed when compiling with the -O0 flag or other options that disable vectorization. The workaround is to avoid using those flags.
  • The SYCL library doesn't guarantee stable API/ABI, so applications compiled with an older version of the SYCL library may not work with a new one. The workaround is to rebuild the application.
  • Using cl::sycl::program API to refer to a kernel defined in another translation unit leads to undefined behavior
  • When running a design compiled for the FPGA emulator, you may encounter the following error: OpenCL API failed. OpenCL API returns: -5 (CL_OUT_OF_RESOURCES) -5 (CL_OUT_OF_RESOURCES). To work around this issue, increase the amount of memory that the emulator runtime is permitted to allocate beyond the default of 512 KB. In the following commands, <size> is an integer followed by KB for kilobytes (for example, 1024KB) or MB for megabytes (for example, 32MB):
    • On Linux: export CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>
    • On Windows: set CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>
  • On a Windows system, the FPGA library creation flow may fail when generating the emulation library. This failure occurs in the following situations:
    • When the Intel® Quartus Prime software is installed and available on the PATH environment variable.
    • When compiling an FPGA library from an OpenCL source.
      To work around this issue, temporarily remove all paths to the Intel® Quartus Prime software from your PATH environment variable and recompile the library.
  • Use of the ii, max_concurrency, or ivdep FPGA loop attributes in conjunction with any other loop attribute (other than with each other) for the same loop may result in a compiler crash. There is no workaround for this issue currently.
  • In the FPGA emulation flow, when using an HLS or OpenCL source library, the "Error: Unable to rewrite SYCL IR file" error could occur. There is no workaround available for this issue currently.
  • A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, since the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
    • is_endian_little
    • global_mem_size
    • local_mem_size
    • max_constant_buffer_size
    • max_mem_alloc_size
    • vendor
    • name
    • is_available
  • When compiling for FPGA, if you declare kernel names locally, the kernel name is displayed as const::kernel_name in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer (beta), Kernel Memory Viewer, and Schedule Viewer (alpha). To work around this issue, declare kernel names globally.
  • When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
  • When compiling for FPGA and using a device static library by either passing the library archive directly or using the -foffload-static-lib=<library archive> option, FPGA optimization reports do not display the source code of your library. To work around this issue, use objects instead of static library archives.
  • Use of the ivdep loop attribute can cause the compiler to assert in limited circumstances. This assert occurs when the ivdep attribute is applied to loops like those in the cases below.
    As a workaround, remove the ivdep attribute marked in each of the following cases. This removal should be safe to perform since the marked ivdep attributes have no functional effect.
    Case 1:
    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for (;;) { 
     // no array accesses
     ...
    }

    Case 2:

    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for(;;) {
     // no array accesses
     ...
     [[intelfpga::ivdep]]
     for(;;) {
       // some array accesses
       ...
     }
    }
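The CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE workaround in the list above expects a value of the form "integer followed by KB or MB". As a rough sketch, a build or launch script written in C++ could validate that shape before exporting the variable. The helper is_valid_mem_size is hypothetical and not part of any Intel tool. Accepting only the uppercase units KB and MB is an assumption based on the examples in the note.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any Intel tool): checks that a value has
// the documented "<integer>KB" or "<integer>MB" shape, e.g. "1024KB" or "32MB".
inline bool is_valid_mem_size(const std::string& value) {
  if (value.size() < 3) return false;  // at least one digit plus a two-letter unit
  const std::string unit = value.substr(value.size() - 2);
  if (unit != "KB" && unit != "MB") return false;
  // Everything before the unit must be a decimal digit.
  for (std::size_t i = 0; i + 2 < value.size(); ++i) {
    if (!std::isdigit(static_cast<unsigned char>(value[i]))) return false;
  }
  return true;
}
```

With this check, "1024KB" and "32MB" are accepted, while "512" (no unit) and "12 KB" (embedded space) are rejected.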

 

2021.1-beta08

New Features

Improvements

  • Compiler driver name changed from dpcpp-cl.exe to dpcpp.exe on Windows*.
  • Improved handling of linker inputs for static lib processing.
  • The pragma spelling for SYCL-specific attributes, except for cl::reqd_work_group_size, is now rejected.
  • Added template parameter support for cl::intel_reqd_sub_group_size attribute.
  • Added support for struct members and pointers in intelfpga::ivdep attribute.
  • Dependency files are no longer generated by default when compiling using -fsycl -fintelfpga options.
  • Added support for USM vars and placeholder accessors passed to reduction version of sycl::handler::parallel_for.
  • Added support for sycl::intel::sub_group::load/store overloads that take sycl::multi_ptr with sycl::access::address_space::local_space.
  • Added a cache for PI plugins, so that subsequent calls for sycl::device creation are cheaper.
  • A SYCL program will be aborted now if program linking is requested when using L0 plugin. This is done because L0 doesn't support program linking.
  • Improved sycl::stream class implementation on the device side in order to reduce local memory consumption.

Bug Fixes

  • Fixed device code compile options passing which could lead to CL_INVALID_COMPILER_OPTIONS error.
  • Fixed a problem with creating a queue for FPGA device as a global inline variable.
  • Fixed an issue with functions marked as SYCL_EXTERNAL not participating in attribute propagation and conflicting attributes checking.
  • Fixed an issue which could lead to problems when a kernel name contains a CVR qualified type.
  • Fixed file processing when using -fsycl-link, now the generated object file can be linked by a non-SYCL enabled compiler/linker.
  • Fixed errors that happened when using sycl::handler::copy with const void*, void* or a sycl::accessor for a type with const qualifier.
  • Fixed an issue with copying memory to itself during sycl::buffer copyback.
  • Fixed a possible deadlock that could happen when simultaneously submitting and waiting for kernels from multiple threads on Windows.
  • Fixed a problem that caused a device with a negative score to still be selected.
  • Fixed a memory leak which happened when using sycl::program::get_kernel.
  • Fixed errors which happened when using half or double types in the reduction version of sycl::handler::parallel_for.
  • Several fixes to the reduction version of sycl::handler::parallel_for:
    • Enabled operator*, operator+, operator|, operator&, operator^= for corresponding transparent functors used in reduction.
    • Fixed the case when reduction object is passed as an R-value.
    • Allowed identity-less constructors for reductions with transparent functors.
    • Replaced some auto declarations with Reduction::result_type and added intermediate assignments/casts to avoid type ambiguities caused by using the sycl::half type, which may also be caused by custom/user types.
    • Fixed compile-time known identity values for MIN and MAX reductions.
  • Fixed an error that was occurring when the sys_check.sh script was run after upgrading the FPGA Interface Manager (FIM) version for Intel® PAC with Intel® Arria® 10 GX FPGA.
  • Fixed a compiler crash issue relating to a problem in PickMemConfig or Pick Memory Configuration.
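The reduction fixes above mention transparent functors and compile-time known identity values for MIN and MAX. The following host-only C++ sketch illustrates the general idea of an identity value: combining it with any element returns that element unchanged. It uses only the standard library. The helper names are hypothetical, and the values follow the usual mathematical convention rather than anything confirmed about the DPC++ implementation.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <numeric>
#include <vector>

// Identity for a MIN reduction: +infinity for floating-point types, otherwise
// the largest representable value, so min(identity, x) == x for every x.
template <typename T>
constexpr T min_identity() {
  return std::numeric_limits<T>::has_infinity
             ? std::numeric_limits<T>::infinity()
             : std::numeric_limits<T>::max();
}

// Identity for a MAX reduction: -infinity for floating-point types, otherwise
// the lowest representable value.
template <typename T>
constexpr T max_identity() {
  return std::numeric_limits<T>::has_infinity
             ? -std::numeric_limits<T>::infinity()
             : std::numeric_limits<T>::lowest();
}

// Host-side MIN reduction that starts from the identity value.
template <typename T>
T reduce_min(const std::vector<T>& v) {
  return std::accumulate(v.begin(), v.end(), min_identity<T>(),
                         [](T a, T b) { return std::min(a, b); });
}

// Host-side sum using the transparent functor std::plus<>, which deduces its
// operand types instead of naming them.
template <typename T>
T reduce_sum(const std::vector<T>& v) {
  return std::accumulate(v.begin(), v.end(), T{}, std::plus<>{});
}
```

Because the identity is known at compile time, a reduction over an empty range simply returns the identity, and no user-supplied initial value is needed.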

Known Issues

  • Calling sqrt/rsqrt without cl::sycl:: on Windows can lead to SPIR-V failure.
  • Classes publicly inherited from std::tuple are not implicitly converted to std::tuple on Windows, leading to the compilation error: No matching function for call (candidate template ignored: failed template argument deduction).
  • If the cl::intel_reqd_sub_group_size attribute is applied with the same value to both a kernel and a function called from that kernel, a compilation error can occur.
  • When using __builtin_constant_p with a gcc version later than 7, a compilation error "InvalidFunctionCall: Unexpected llvm intrinsic: llvm.is.constant.i64" is expected. The workaround is to downgrade to gcc 7 until the issue is fixed in the next update release.
  • The format of the object files produced by the compiler can change between versions. The workaround is to rebuild the application.
  • The SYCL library doesn't guarantee stable API/ABI, so applications compiled with an older version of the SYCL library may not work with a new one. The workaround is to rebuild the application.
  • Using cl::sycl::program API to refer to a kernel defined in another translation unit leads to undefined behavior.
  • A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, since the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
    • is_endian_little
    • global_mem_size
    • local_mem_size
    • max_constant_buffer_size
    • max_mem_alloc_size
    • vendor
    • name
    • is_available
  • All FPGA hardware compiles on Windows* emit the following warning:
    warning LNK4221: This object file does not define any previously undefined public symbols, so it will not be used by any link operation that consumes this library.
    Since any use of the resulting object file from a Windows* hardware compile is not supported yet, you can safely ignore this warning.
  • When compiling for FPGA, if you declare kernel names locally, the kernel name is displayed as const::kernel_name in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer (beta), Kernel Memory Viewer, and Schedule Viewer (alpha). To work around this issue, declare kernel names globally.
  • When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
  • When compiling for FPGA and using a device static library by either passing the library archive directly or using the -foffload-static-lib=<library archive> option, FPGA optimization reports do not display the source code of your library. To work around this issue, use objects instead of static library archives.
  • Select FPGA tutorial designs may occasionally crash on Windows machines. To work around this issue, rerun the design to complete execution successfully.
  • The following crash can occur when using the Intel FPGA Emulation Platform for OpenCL on a Windows system:
    Stack dump:
    0. Running pass 'CallGraph Pass Manager' on module 'main'
    This crash is caused by the use of a high number of function calls in the design (more than 10,000). To work around this issue, reduce the number of function calls used in the design. Alternatively, you can use the Intel FPGA Emulation Platform for OpenCL on a Linux system.
  • Use of the ivdep loop attribute can cause the compiler to assert in limited circumstances. This assert occurs when the ivdep attribute is applied to loops like those in the cases below.
    As a workaround, remove the ivdep attribute marked in each of the following cases. This removal should be safe to perform since the marked ivdep attributes have no functional effect.
    Case 1:
    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for (;;) { 
     // no array accesses
     ...
    }

    Case 2:

    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for(;;) {
     // no array accesses
     ...
     [[intelfpga::ivdep]]
     for(;;) {
       // some array accesses
       ...
     }
    }
  • The FPGA environment setup script (setvars.sh) uses the lsb_release executable on Linux* operating systems. This executable may not exist on all valid Linux setups, preventing the FPGA environment setup from working correctly. This issue manifests in the following warning when you run the setvars.sh script: lsb_release: command not found
    To work around this issue, install the lsb_release executable. On CentOS* or RHEL*, run the yum install redhat-lsb-core command. On Ubuntu*, run the sudo apt-get install lsb-release command.
    Once this is installed, the setup script should work without issue.
  • When running a design compiled for the FPGA emulator, you may encounter the following error: OpenCL API failed. OpenCL API returns: -5 (CL_OUT_OF_RESOURCES) -5 (CL_OUT_OF_RESOURCES). To work around this issue, increase the amount of memory that the emulator runtime is permitted to allocate beyond the default of 512 KB. In the following commands, <size> is an integer followed by KB for kilobytes (for example, 1024KB) or MB for megabytes (for example, 32MB):

    • On Linux:

      export CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>
    • On Windows:

      set CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>
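For the std::tuple known issue in the list above, the release note names no workaround. One possibility, shown here purely as an assumption and not as a documented fix, is to name the base class explicitly so that template argument deduction does not depend on a derived-to-base conversion. The Sample type and the arity and sample_arity functions are hypothetical.

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical user type that publicly inherits from std::tuple.
struct Sample : public std::tuple<int, float> {
  Sample(int i, float f) : std::tuple<int, float>(i, f) {}
};

// A function template that deduces its parameter pack from a std::tuple argument.
template <typename... Ts>
constexpr std::size_t arity(const std::tuple<Ts...>&) {
  return sizeof...(Ts);
}

// Casting to the base class first means the argument already has the exact
// std::tuple type, so deduction needs no derived-to-base step.
inline std::size_t sample_arity(const Sample& s) {
  return arity(static_cast<const std::tuple<int, float>&>(s));
}
```

In standard C++ a direct call such as arity(s) is also valid. The explicit cast only matters on toolchains affected by the issue.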

       

2021.1-beta07

New Features

  • Reduction extension for sycl::handler::parallel_for accepting a sycl::nd_range object.
  • New compiler flag [no-]device-math-lib=<arg>[,<arg>] to include device math libraries in the build, where <arg> is fp32 or fp64. For both fp32 and fp64, use -device-math-lib=fp32,fp64. The device math libraries are not included by default, so the option must be used to incorporate them into the build.
  • XPTI instrumentation to capture semantic and execution trace information for constructing the task graphs for offline graph and performance analysis.
  • Added support for static libraries in FPGA.
  • Added support for bank_bits FPGA memory attribute.
  • Added support for -Xshyper-optimized-handshaking FPGA optimization flag.
  • Combined the Intel PAC with Intel Arria 10 GX and Intel PAC with Stratix 10 SX FPGA add-on packages.
  • Added support for max_work_group_size, max_global_work_dim, and num_simd_work_items FPGA kernel attributes.
  • Added support for USM explicit on Intel PAC for Intel Stratix 10 SX.
  • Updated the Intel PAC with Intel Arria 10 GX to match the Intel Acceleration Stack for Intel Xeon CPU with FPGAs Version 1.2.1.

Improvements

  • Improved diagnostics for user errors and unsupported features.
  • Reduced the possibility of loading an incorrect version of the OpenCL headers by reordering the default include paths.
  • Improved handling of AOCX based archives on Windows.
  • -g and -O0 options now imply -g and -cl-opt-disable for device compilation.
  • The -std=c++17 option is now enabled by default.
  • Added support for kernel name types templated using C++ enums.
  • Now when -fintelfpga option is passed, the dependency file is created in the temporary files location instead of input source file location.
  • Improved handling of host accessors with the read-only type of access to avoid redundant memory copy operations.
  • The default selector no longer selects devices of accelerator type.
  • Added support for 0-dim sycl::accessor in sycl::handler::copy
  • Added support for more image channel types for half4 data type on the host device
  • libsycl.so library is now versioned.

Bug Fixes

  • Fixed bug in hierarchical parallelism implementation related to using a private address of the parallel_for_work_group lambda object by all work items in the work group.
  • Fixed a crash that happened when a specialization constant is referenced twice.
  • Resolved a conflict with min/max macro that can be defined by windows.h.
  • Fixed sycl::intel::sub_group::broadcast, which was incorrectly mapped to a SPIR-V intrinsic.
  • Fixed an issue where a sub-buffer was accessing incorrect memory.
  • Fixed a crash that could happen when host accessors are created in multiple threads.
  • Fixed an issue with copying to/from a pointer with a const qualifier when using sycl::handler::copy.
  • Fixed an issue in FPGA optimization reports where the source code was not getting displayed correctly.
  • Fixed an issue in FPGA optimization reports where a basic block containing a few non-alphanumeric characters was not getting displayed in the Loop Analysis report.
  • Fixed an issue with the fpga_crossgen command, which used to fail when creating a SYCL target object.
  • Fixed an issue where loop fusion did not work properly when compiling an FPGA design.
  • Fixed an issue in FPGA emulator compiles on Windows system with respect to the /Fo flag.
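The min/max macro conflict with windows.h mentioned in the list above can also affect user code that calls std::min or std::max. As a minimal sketch of a common defensive idiom, parenthesizing the function name prevents a function-like macro from expanding. The clamp_value helper is hypothetical. Defining NOMINMAX before including windows.h is another widely used option.

```cpp
#include <algorithm>

// Parenthesizing the function name keeps a function-like macro named min or
// max (such as the ones windows.h can define) from being expanded here.
template <typename T>
T clamp_value(T value, T low, T high) {
  return (std::max)(low, (std::min)(value, high));
}
```

The idiom works because a function-like macro is expanded only when its name is immediately followed by an opening parenthesis.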

Known Issues

  • There is a known issue on Windows when using clGetPlatformInfo and clGetDeviceInfo with a graphics driver older than 27.20.100.8280. If you run into this issue, upgrade to driver version 27.20.100.8280 or later from the Download Center.
  • A crash can happen in a multithreaded application if two threads call an API which implies waiting for an event. No known workaround.
  • The format of the object files produced by the compiler can change between versions. The workaround is to rebuild the application.
  • The SYCL library doesn't guarantee stable API/ABI, so applications compiled with an older version of the SYCL library may not work with the new one. The workaround is to rebuild the application.
  • Using cl::sycl::program API to refer to a kernel defined in another translation unit leads to undefined behavior.
  • A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, since the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
    • is_endian_little
    • global_mem_size
    • local_mem_size
    • max_constant_buffer_size
    • max_mem_alloc_size
    • vendor
    • name
    • is_available
  • All FPGA hardware compiles on Windows* emit the following warning:
    warning LNK4221: This object file does not define any previously undefined public symbols, so it will not be used by any link operation that consumes this library.
    Since any use of the resulting object file from a Windows* hardware compile is not supported yet, you can safely ignore this warning.
  • When compiling for FPGA, if you declare kernel names locally, the kernel name is displayed as const::kernel_name in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer (beta), Kernel Memory Viewer, and Schedule Viewer (alpha). To work around this issue, declare kernel names globally.
  • When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
  • When compiling for FPGA and using a device static library by either passing the library archive directly or using the -foffload-static-lib=<library archive> option, FPGA optimization reports do not display the source code of your library. To work around this issue, use objects instead of static library archives.
  • When compiling for FPGA, the compiler may, in rare cases, crash with a stack trace that points to a problem in PickMemConfig or Pick Memory Configuration. To work around this issue, rerun the compile.
  • Select FPGA tutorial designs may occasionally crash on Windows machines. To work around this issue, rerun the design to complete execution successfully.
  • The following crash can occur when using the Intel FPGA Emulation Platform for OpenCL on a Windows system:
    Stack dump:
    0. Running pass 'CallGraph Pass Manager' on module 'main'
    This crash is caused by the use of a high number of function calls in the design (more than 10,000). To work around this issue, reduce the number of function calls used in the design. Alternatively, you can use the Intel FPGA Emulation Platform for OpenCL on a Linux system.
  • Use of the ivdep loop attribute can cause the compiler to assert in limited circumstances. This assert occurs when the ivdep attribute is applied to loops like those in the cases below.
    As a workaround, remove the ivdep attribute marked in each of the following cases. This removal should be safe to perform since the marked ivdep attributes have no functional effect.
    Case 1:
    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for (;;) { 
     // no array accesses
     ...
    }

    Case 2:

    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for(;;) {
     // no array accesses
     ...
     [[intelfpga::ivdep]]
     for(;;) {
       // some array accesses
       ...
     }
    }

     

  • Running the sys_check.sh script after upgrading the FPGA Interface Manager (FIM) version for Intel® PAC with Intel® Arria® 10 GX FPGA to the latest version results in the following error message:
    Error: Installed Intel(R) Programmable Acceleration Card has unsupported firmware installed.
    You can safely ignore this error message.

2021.1-beta06

New Features

  • Implementation of GroupAlgorithms extension.
  • Partial implementation of sub group algorithms extension.
  • Support for intel::reqd_work_group_size attribute.
  • Support for Intel Stratix® 10 device family and custom platform for FPGA development.
  • Support for specialization constants feature which is based on SYCL Specialization Constant proposal.
  • DPC++ Compiler now uses Level-Zero (L0) Runtime for GPUs by default on Linux. More information on DPC++ Plugins for Level-Zero can be found in the Intel® oneAPI DPC++ Compiler Developer Guide and Reference.

Improvements

DPC++ Compiler:

  • Added a diagnostic on an attempt to declare or use the non-const static variable inside the device code.
  • Further relaxed the requirements for kernel types. By default, they must now have a trivial copy constructor and a trivial destructor.
  • Changed std::numeric_limits<sycl::half> to constexpr functions.
  • Added a diagnostic on attempt to use zero length arrays inside device code.
  • Added support for math functions fabs and ceil in device code.
  • Added a diagnostic (warning) on attempt to append new device object to an archive which already contains an AOT-compiled device object.
  • Added a diagnostic on attempt to use functions which have no definition in the Translation Unit and are not marked with SYCL_EXTERNAL macro inside device code.
  • Added a diagnostic on attempt to use thread local storage inside device code.
  • Removed arch designator from the default output file name when compiling with -fsycl-link option. Now an output file has just a flat name based on the first input file.
  • The SYCL headers were moved from lib/clang/11.0.0/include to include/sycl to support mixed compilers.
  • Added support for the GCC style inline assembly in the device code.
  • Improved fat static library support: the driver now considers static libraries which are passed on the command line as well as libraries passed as part of the linker options for offloading. This effectively negates the need to use -foffload-static-lib and -foffload-whole-static-lib options which are deprecated now.
  • The SYCL_EXTERNAL macro is now allowed to be used with class member functions.
  • Set aux-target-cpu for the device compilation, which sets AVX and other necessary macros based on the target.
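The relaxed kernel type requirement in the list above (a trivial copy constructor and a trivial destructor) can be expressed with standard type traits. The following sketch assumes that these two traits capture the rule as stated. The trait name meets_relaxed_kernel_rule and the Scale and Owning types are hypothetical and are not DPC++ APIs.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical trait mirroring the rule quoted above: trivial copy
// constructor and trivial destructor.
template <typename T>
struct meets_relaxed_kernel_rule
    : std::integral_constant<bool,
                             std::is_trivially_copy_constructible<T>::value &&
                                 std::is_trivially_destructible<T>::value> {};

// Hypothetical function object holding only plain data: satisfies the rule.
struct Scale {
  float factor;
  void operator()(float& x) const { x *= factor; }
};

// Hypothetical function object owning heap memory through std::vector:
// neither its copy constructor nor its destructor is trivial.
struct Owning {
  std::vector<float> data;
};

static_assert(meets_relaxed_kernel_rule<Scale>::value,
              "Scale has a trivial copy constructor and destructor");
static_assert(!meets_relaxed_kernel_rule<Owning>::value,
              "Owning does not satisfy the rule");
```

A check like this only tests the two properties named in the note. The compiler may apply further restrictions that are not listed here.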

DPC++ Runtime:

  • Changed sycl::context and sycl::queue constructors to be explicit to avoid unintended conversions.
  • Added a diagnostic on setting the SYCL_DEVICE_TYPE environment variable to an incorrect value.
  • Improved error codes which are encoded in the SYCL exceptions.
  • Removed functions that use float type in the fallback library for fp64 complex.
  • Added support for the RESTRICT_WRITE_ACCESS_TO_CONSTANT_PTR macro, which enables a diagnostic on writing to a raw pointer obtained from a sycl::constant_ptr object.
  • Improved handling of host accessors with a read-only type of access. They no longer trigger a redundant memory copy operation.
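The effect of making a constructor explicit, as was done for sycl::context and sycl::queue, can be illustrated without the SYCL headers. This is a sketch under stated assumptions: Device, ImplicitQueue, ExplicitQueue, run_on, and run_on_explicit are hypothetical stand-ins, not the real DPC++ classes or their signatures.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins; these are NOT the real sycl::device / sycl::queue.
struct Device {
  int id;
};

// Before: a converting constructor lets a Device silently become a queue.
struct ImplicitQueue {
  ImplicitQueue(const Device& d) : device_id(d.id) {}
  int device_id;
};

// After: the constructor is explicit, so the caller must name the type.
struct ExplicitQueue {
  explicit ExplicitQueue(const Device& d) : device_id(d.id) {}
  int device_id;
};

int run_on(const ImplicitQueue& q) { return q.device_id; }
int run_on_explicit(const ExplicitQueue& q) { return q.device_id; }

// Implicit conversion exists only for the non-explicit constructor.
static_assert(std::is_convertible<Device, ImplicitQueue>::value,
              "Device converts implicitly to ImplicitQueue");
static_assert(!std::is_convertible<Device, ExplicitQueue>::value,
              "Device does not convert implicitly to ExplicitQueue");
// Direct construction still works for the explicit variant.
static_assert(std::is_constructible<ExplicitQueue, Device>::value,
              "ExplicitQueue can still be built from a Device");
```

With these types, run_on(Device{3}) compiles and creates a temporary queue implicitly, while the equivalent call to run_on_explicit is rejected and has to be written as run_on_explicit(ExplicitQueue(Device{3})).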

Bug Fixes

DPC++ Compiler:

  • Fixed a problem with the compiler not being able to find a dependency file when compiling AOT to an object for FPGA.
  • Fixed a problem with the host object not being added to the partial link step when compiling from source and using the -foffload-static-lib option.
  • Reversed reqd_work_group_size attribute to match SYCL behavior.
  • Fixed dependency output location when /Fo<dir> is given.
  • Fixed a crash that occurred when no kernel name was passed to sycl::handler::parallel_for.
  • Fixed an FPGA report flow error where the -o or /Fo option was disregarded.

DPC++ Runtime:

  • Fixed sycl::queue::wait(), which did not wait for events associated with USM operations.
  • Fixed a problem where a wrong error message was reported on the second attempt to build a program after the first attempt failed.
  • Fixed an issue which could happen when sycl::event::wait is called from multiple threads.
  • Aligned sub_group::store signature between host and device.
  • Fixed sycl::program::get_compile_options and sycl::program::get_build_options to return correct values.
  • Fixed sycl::multi_ptr's methods that were incorrectly enabled/disabled on device/host.
  • Fixed incorrect dependency handling when creating sub-buffers which could lead to data races.
  • Reversed reported max work-group size for a device to align with work-group size reversing before kernels launch.
  • Fixed incorrect handling of kernels that use hierarchical parallelism when the -O0 option is passed to clang.
  • Changed names of SYCL internal variables to avoid conflict with commonly used macros: SUCCESS, BLOCKED and FAILED.
  • Fixed a bug where a host device was always included in the device list returned by sycl::device::get_devices.
  • Fixed a problem with passing a sycl::vec object to sycl::group::async_work_group_copy.
  • Fixed sycl::malloc_shared to return nullptr for an allocation size of zero bytes or less, and sycl::free to ignore a deallocation request for nullptr.
  • Fixed a possible problem with selecting a work-group size that is larger than the maximum allowed work-group size.
  • Fixed an issue which causes errors when using sub-buffers.
  • Changed the implementation of the buffer constructor from a pair of iterators. Now, data is not written back to the host on destruction of the buffer unless the buffer has a valid non-null pointer specified via the member function set_final_data.
  • Fixed a problem with incorrect acceptance of a lambda which takes an argument of the sycl::id type in the sycl::handler::parallel_for version which takes a sycl::ndrange object.
  • Resolved a circular dependency between sycl::event and sycl::queue.
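The allocation contract described above for sycl::malloc_shared and sycl::free (nullptr for a zero-byte request, and a nullptr deallocation request is ignored) can be sketched with host-only stand-ins. The functions shared_alloc_like, shared_free_like, and round_trip below are hypothetical. They use std::malloc and std::free and do not call the DPC++ runtime, so they show only the shape of the contract, not the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical host-only stand-in: a zero-byte request yields nullptr.
void* shared_alloc_like(std::size_t bytes) {
  if (bytes == 0) {
    return nullptr;
  }
  return std::malloc(bytes);
}

// Hypothetical stand-in: a nullptr deallocation request is ignored.
void shared_free_like(void* ptr) {
  if (ptr == nullptr) {
    return;
  }
  std::free(ptr);
}

// Caller-side pattern: allocate, fill if memory was obtained, then free.
// The free call needs no special case for an empty allocation.
bool round_trip(std::size_t count) {
  int* data = static_cast<int*>(shared_alloc_like(count * sizeof(int)));
  const bool got_memory = (data != nullptr);
  for (std::size_t i = 0; got_memory && i < count; ++i) {
    data[i] = static_cast<int>(i);
  }
  shared_free_like(data);  // safe even when data is nullptr
  return got_memory;
}
```

Under this contract a caller can treat a nullptr result as "nothing allocated" and still pass it to the free function unconditionally.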

Known Issues

  • Error “pi_throw: L0 Error” when running programs built with beta06.

    A compatibility issue has been discovered between the Intel oneAPI DPC++ compiler in the oneAPI beta06 release and the latest version of the GPU software that is available at the https://repositories.intel.com/graphics repository.
    This issue is planned to be resolved in the beta07 update.
    In the meantime, if you have already updated your system to the latest version of the GPU software and want to continue to use the oneAPI beta06 version of the DPC++ compiler, there are two possible workarounds.

    • Switch to the OpenCL backend by setting the environment variable SYCL_BE=PI_OPENCL.

    • Revert to an earlier version of the GPU software stack with the following steps on Ubuntu*:

      1. Open the Installation Guide for Intel® oneAPI Toolkits

      2. Go to section 6.1 and follow the instructions to set the repository for apt

      3. Use the following install commands to install the older GPU driver:

          sudo apt install intel-level-zero-gpu=0.8.16259 \
            intel-gmmlib=19.4.1 \
            intel-igc-opencl=1.0.3586 \
            intel-igc-core=1.0.3586
          sudo apt-mark hold intel-level-zero-gpu
  • The DPC++ runtime may crash in the cl::sycl::platform::get_platforms() lookup when the L0 runtime is present on the following systems:
    • Older systems (before Gen9): The workaround is to NOT install L0 drivers on such systems. This issue will be addressed in upcoming GPU driver releases.
    • Red Hat Enterprise Linux* 7 systems: In addition to the above workaround, remove the PI L0 plugin (libpi_level0.so) from <install dir>/compiler/2021.1-beta06/linux/lib/
  • The format of the object files produced by the compiler can change between versions. The workaround is to rebuild the application.
  • The SYCL library doesn't guarantee a stable API/ABI, so applications compiled with an older version of the SYCL library may not work with a newer one. The workaround is to rebuild the application.
  • Using the cl::sycl::program API to refer to a kernel defined in another translation unit leads to undefined behavior.
  • Linkage errors with the following message can happen when a SYCL application is built using a Microsoft Visual Studio* 2019 version below 16.3.0:

    error: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined.

    For Visual Studio versions that show this error, the workaround is to use the -std=c++17 switch.

  • All FPGA hardware compiles on Windows* emit the following warning:

    warning LNK4221: This object file does not define any previously undefined public symbols, so it will not be used by any link operation that consumes this library.

    Since any use of the resulting object file from a Windows* hardware compile is not supported yet, you can safely ignore this warning.

  • A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, since the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
    • is_endian_little
    • global_mem_size
    • local_mem_size
    • max_constant_buffer_size
    • max_mem_alloc_size
    • vendor
    • name
    • is_available
  • When compiling for FPGA, if your source files are not in the current working directory when you compile, then FPGA optimization reports do not display your source code. To work around this issue, run your compile from the directory containing your source code.
  • When compiling for FPGA, if you declare kernel names locally, the kernel name is displayed as const::kernel_name in FPGA optimization reports such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer (beta), Kernel Memory Viewer, and Schedule Viewer (alpha). To work around this issue, declare kernel names globally.
  • When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
  • When compiling for FPGA and using the offload static library flow (-foffload-static-lib=<library archive>), FPGA optimization reports cannot display the source code of your library or the host. To work around this issue, use objects instead of static library archives.
  • Use of the ivdep loop attribute can cause the compiler to assert in limited circumstances. This assert occurs when the ivdep attribute is applied to the following:

    1. A single unnested loop that does not contain array accesses.
    2. An outer loop that does not contain array accesses, and an inner loop in the same loop nest that does contain array accesses.

    As a workaround, remove the marked ivdep attribute in each of the following cases. This removal should be safe because the marked ivdep attributes have no functional effect.

    Case 1:

    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for (;;) {
      // no array accesses
      ...
    }

    Case 2:

    [[intelfpga::ivdep]] // remove this attribute to avoid assert
    for (;;) {
      // no array accesses
      ...
      [[intelfpga::ivdep]]
      for (;;) {
        // some array accesses
        ...
      }
    }
  • In FPGA optimization reports, for any basic block that contains a non-alphanumeric character (except the . and _ characters) in its name, the Scheduled fMAX, Latency, and Max Interleaving Iteration values are not displayed in the Loop Analysis report. As a workaround, use the deprecated FMAX II report.

  • For FPGA emulator compiles on Windows* systems, the /Fo flag does not work unless you provide the complete name of the object file. As a workaround, if you use Visual Studio to compile a design that is not an FPGA sample, you must either clear the Object File Name entry in the DPC++ > Output File tab of the Project Properties or add the object file name after $(InstDir) (this option does not work if a project has multiple input files).

  • The fpga_crossgen command fails when creating a SYCL target object from HLS, OpenCL, or RTL sources. To work around this issue, manually set the following: export PATH="$INTELFPGAOCLSDKROOT/llvm/bin:$PATH"

  • Loop fusion may cause the compiler to crash when one or more loops are contained within conditional statements, and the value of the condition is computed in the previous loop.
    For example:

    bool repeated;
    for (int i = 0; i < N; ++i) {
        if (i+1 < N) {
            repeated = data[i] == data[i+1];
        }
    }
    if (repeated)
    {
        for (int i = 0; i < N; ++i)
        {
            int val = data[i];
            data2[val][i] += value[i];
        }
    }

    To work around this issue, disable automatic loop fusion by including the -Xsopt-arg -Xsdisable-auto-loop-fusion argument when compiling the design.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information

Notice revision #20110804