Intel® oneAPI DPC++ Library (oneDPL) Release Notes

Version: 2021.3   Published: 11/03/2020   Last Updated: 06/28/2021

Where to Find the Release

Please follow the steps to download the toolkit from the Web Configurator, and follow the installation instructions.

Overview

The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C++ Compiler and provides high-productivity APIs aimed to minimize programming efforts of C++ developers creating efficient heterogeneous applications.

2021.4.0

New Features

  • Added the range-based versions of the following algorithms: any_of, adjacent_find, copy_if, none_of, remove_copy_if, remove_copy, replace_copy, replace_copy_if, reverse, reverse_copy, rotate_copy, swap_ranges, unique, unique_copy.
  • Added new asynchronous algorithms: inclusive_scan_async, exclusive_scan_async, transform_inclusive_scan_async, transform_exclusive_scan_async.
  • Added structured binding support for zip_iterator::value_type.

Fixed Issues

  • Fixed an issue with asynchronous algorithms returning future<ptr> with unified shared memory (USM).

Known Issues and Limitations

New in this Release

  • With the Intel® oneAPI DPC++/C++ Compiler, unseq and par_unseq execution policies do not use OpenMP SIMD pragmas due to compilation issues with the -fopenmp-simd option, possibly resulting in suboptimal performance.
  • The oneapi::dpl::experimental::ranges::reverse algorithm does not compile with -fno-sycl-unnamed-lambda option.

Existing Issues

  • exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
  • Some algorithms may hang when a program is built with the -O0 option, is executed on GPU devices, and processes a large number of elements.
  • The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
  • The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace, or create a namespace alias.
  • The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
  • The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
  • std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
  • When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
  • std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
  • std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
  • Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.
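
The std::array::at limitation above can be illustrated with plain host C++. The following is a minimal sketch, not oneDPL code: sum_elements is a hypothetical helper standing in for the body of a kernel, and it reads elements only through operator[], which performs no bounds check and does not throw.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a kernel body: sums the elements of a std::array.
// Elements are read with operator[] only. std::array::at is avoided because it
// throws std::out_of_range for an invalid index.
template <typename T, std::size_t N>
T sum_elements(const std::array<T, N>& values) {
    T total{};
    for (std::size_t i = 0; i < N; ++i) {
        total += values[i];  // not values.at(i)
    }
    return total;
}
```

Because the loop bound is the compile-time size N, every index is valid by construction and the bounds check that at would perform is not needed.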

2021.3.0

New Features

  • Added the range-based versions of the following algorithms: all_of, any_of, count, count_if, equal, move, remove, remove_if, replace, replace_if.
  • Added the following utility ranges (views): generate, fill, rotate.

Changes to Existing Features

  • Improved performance of discard_block_engine (including ranlux24, ranlux48, ranlux24_vec, ranlux48_vec predefined engines) and normal_distribution.
  • Added two constructors to transform_iterator: the default constructor and a constructor from an iterator without a transformation. A transform_iterator constructed in either of these ways uses a transformation functor of the type passed in the template arguments.
  • transform_iterator can now work on top of forward iterators.

Fixed Issues

  • Fixed execution of swap_ranges algorithm with unseq, par execution policies.
  • Fixed an issue causing memory corruption and double freeing in scan-based algorithms compiled with -O0 and -g options and run on CPU devices.
  • Fixed incorrect behavior in the exclusive_scan algorithm that occurred when the input and output iterator ranges overlapped.
  • Fixed error propagation for async runtime exceptions by consistently calling sycl::event::wait_and_throw internally.
  • Fixed the warning: local variable will be copied despite being returned by name [-Wreturn-std-move].
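
For the exclusive_scan fix above, the sketch below shows what overlapping input and output ranges look like in the simplest case, where the output begins at the same position as the input. It assumes C++17 and uses the serial std::exclusive_scan from <numeric> as a reference, not the oneDPL implementation; exclusive_prefix_sum_in_place is a hypothetical helper name.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: computes an exclusive prefix sum in place.
// The output range starts at values.begin(), the same position as the input
// range, so the two ranges fully overlap.
// Assumption: serial std::exclusive_scan (C++17) serves as the reference here.
inline std::vector<int> exclusive_prefix_sum_in_place(std::vector<int> values) {
    std::exclusive_scan(values.begin(), values.end(), values.begin(), 0);
    return values;
}
```

With the input {1, 2, 3, 4} and an initial value of 0, the expected result is {0, 1, 3, 6}: each output element is the sum of all input elements before it.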

Known Issues and Limitations

  • No new issues in this release.

Existing Issues

  • exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
  • Some algorithms may hang when a program is built with the -O0 option, is executed on GPU devices, and processes a large number of elements.
  • The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
  • The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace, or create a namespace alias.
  • The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
  • The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
  • std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
  • When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
  • std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
  • std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
  • Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.

2021.2.0

New Features

  • Added support of parallel, vector and DPC++ execution policies for the following algorithms: shift_left, shift_right.
  • Added the range-based versions of the following algorithms: sort, stable_sort, merge.
  • Added experimental asynchronous algorithms: copy_async, fill_async, for_each_async, reduce_async, sort_async, transform_async, transform_reduce_async. These algorithms are declared in the oneapi::dpl::experimental namespace and implemented only for DPC++ policies. To make these algorithms available, include the <oneapi/dpl/async> header. Use of the asynchronous API requires C++11.
  • Utility function wait_for_all enables waiting for completion of an arbitrary number of events.
  • Added the ONEDPL_USE_PREDEFINED_POLICIES macro, which enables predefined policy objects and make_device_policy, make_fpga_policy functions without arguments. It is turned on by default.

Changes to Existing Features

  • Improved performance of the following algorithms: count, count_if, is_partitioned, lexicographical_compare, max_element, min_element, minmax_element, reduce, transform_reduce, and sort, stable_sort when using Radix sort.
    Note: The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with std::less or std::greater, otherwise Merge sort.
  • Improved performance of the linear_congruential_engine RNG engine (including minstd_rand, minstd_rand0, minstd_rand_vec, minstd_rand0_vec predefined engines).
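
The note above on Radix sort versus Merge sort can be expressed as a compile-time check. The sketch below assumes C++17; would_use_radix_sort is a hypothetical trait written only to restate the rule in the note and is not part of oneDPL. How the library treats other comparators, such as the transparent std::less<>, is not covered by the note and is not modeled here.

```cpp
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical trait (not a oneDPL API): true when the rule in the note would
// select Radix sort, that is, an arithmetic key type compared with
// std::less<Key> or std::greater<Key>. Any other combination means Merge sort.
template <typename Key, typename Compare>
struct would_use_radix_sort
    : std::bool_constant<std::is_arithmetic_v<Key> &&
                         (std::is_same_v<Compare, std::less<Key>> ||
                          std::is_same_v<Compare, std::greater<Key>>)> {};

template <typename Key, typename Compare>
inline constexpr bool would_use_radix_sort_v =
    would_use_radix_sort<Key, Compare>::value;

// A custom comparator: orders integers by absolute value.
struct ByAbsoluteValue {
    bool operator()(int a, int b) const {
        return (a < 0 ? -a : a) < (b < 0 ? -b : b);
    }
};

static_assert(would_use_radix_sort_v<int, std::less<int>>);
static_assert(would_use_radix_sort_v<float, std::greater<float>>);
static_assert(!would_use_radix_sort_v<std::string, std::less<std::string>>);
static_assert(!would_use_radix_sort_v<int, ByAbsoluteValue>);
```

Under this reading of the note, sorting int keys with a custom comparator such as ByAbsoluteValue falls back to Merge sort even though the key type is arithmetic.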

Fixed Issues

  • Fixed runtime errors occurring with find_end, search, search_n algorithms when a program is built with -O0 option and executed on CPU devices.
  • Fixed the majority of unused parameter warnings.

Known Issues and Limitations

  • exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
  • Some algorithms may hang when a program is built with the -O0 option, is executed on GPU devices, and processes a large number of elements.
  • The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
  • The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace, or create a namespace alias.
  • The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
  • The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
  • std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
  • When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
  • std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
  • std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
  • Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.

2021.1.1

Key Features

  • oneDPL implements the oneDPL Specification v1.0, including parallel algorithms, DPC++ execution policies, special iterators, and other utilities.
  • oneDPL algorithms can work with data in DPC++ buffers as well as in unified shared memory (USM).
  • For several algorithms, experimental API that accepts ranges (similar to C++20) is additionally provided.
  • A subset of the standard C++ libraries for Microsoft* Visual C++, GCC, and Clang is supported in DPC++ kernels, including <array>, <complex>, <functional>, <tuple>, <type_traits>, <utility> and other standard library API. For the detailed list, please refer to the oneDPL User Guide.
  • Standard C++ random number generators and distributions are provided for use in DPC++ kernels.

Known Issues and Limitations

  • The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
  • The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace, or create a namespace alias.
  • The partial_sort_copy, sort and stable_sort algorithms are prone to CL_BUILD_PROGRAM_FAILURE when using Radix sort in debug mode on CPU devices. 
    Note: The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with std::less or std::greater, otherwise Merge sort.
  • The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
  • The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
  • std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
  • When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
  • std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
  • std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
  • Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.