Intel® oneAPI Base Toolkit Release Notes

By Jennifer L Jiang,

Published: 07/10/2019   Last Updated: 12/07/2020

Intel® oneAPI Base Toolkit delivers a unified language and libraries with full native code support across a range of hardware, including Intel® and compatible processors, Intel® Processor Graphics Gen9 and Gen11, Intel® Iris® Xe MAX graphics, and Intel® Arria® 10 and Intel® Stratix® 10 SX FPGAs. It offers both a direct programming model and an API-based programming model, and it includes analysis and debug tools for development and performance tuning.

Major Features Supported

New in 2021.1 Product Release

Key features at toolkit level

  • Supports three platforms: Linux*, Windows*, and macOS*; the set of available products differs by platform.
  • Follow the oneAPI Toolkit Installation Guide to install the latest GPU driver for your operating system.
  • The default installation paths are listed below. We recommend uninstalling any previous beta releases before installing this product release.
    • Linux or macOS: /opt/intel/oneapi
    • Windows: C:\Program Files (x86)\Intel\oneAPI
  • If you have been using the oneAPI Base Toolkit beta release, Intel® Parallel Studio, or Intel® System Studio for your application, rebuild your whole application when you move to the oneAPI Base Toolkit product release.
  • On Linux and Windows, the Intel® oneAPI DPC++/C++ Compiler is included; it provides a C/C++ compiler (icx) and a DPC++ compiler (dpcpp).
  • For device offload code, the Level Zero runtime is the default backend. Follow the instructions here to change the backend to OpenCL* if needed. Not all library APIs or products support both the Level Zero and OpenCL backends. See the product-level Release Notes and documentation for details.
  • The Intel® oneAPI Base Toolkit supports co-existence (side-by-side installation) with Intel® Parallel Studio XE or Intel® System Studio on Linux*, Windows*, and macOS*.
  • The toolkits are distributed via YUM and APT; the performance libraries are additionally available through the Conda, PIP, and NuGet channels.

Intel® oneAPI DPC++/C++ Compiler

  • Intel® oneAPI DPC++ Compiler
  • Intel® C++ Compiler: 
    • Clang and LLVM based compiler with driver name "icx"
    • OpenMP 4.5 and Subset of OpenMP 5.0 with offloading support
    • Vectorization and Loop Optimizations

Intel® oneAPI DPC++ Library (oneDPL)

  • Supports the oneDPL Specification v1.0, including parallel algorithms, DPC++ execution policies, special iterators, and other utilities.
  • oneDPL algorithms can work with data in DPC++ buffers as well as in unified shared memory (USM).
  • A subset of the standard C++ libraries is supported in DPC++ kernels, including "<array>", "<complex>", "<functional>", "<tuple>", "<utility>", and other standard library APIs.
  • Standard C++ random number generators and distributions for use in DPC++ kernels.

Intel® DPC++ Compatibility Tool 

  • Supports migration of CUDA* kernels, host and device API calls (for example, memory management, events, and math), and library calls (cuBLAS, cuSPARSE, cuSOLVER, cuRAND, Nvidia* Thrust*). The tool typically migrates 80%–90% of CUDA code to DPC++ code automatically.
  • Warning messages are emitted to the command-line output and inlined into the generated code where manual work is required, to help you finish the application.
  • Integration with Visual Studio* 2017 and 2019 on Windows and Eclipse* on Linux provides enhanced migration usability.

Intel® oneAPI Math Kernel Library (oneMKL)

  • With this release, the product previously known as the Intel® Math Kernel Library becomes the Intel® oneAPI Math Kernel Library (oneMKL).
  • Added support for the following programming models: Data Parallel C++ (DPC++) APIs support programming for both CPUs and Intel GPUs, and C/Fortran OpenMP offload interfaces support programming Intel GPUs.
  • Introduced Unified Shared Memory (USM) support for Intel Processor Graphics and Xe architecture-based graphics.

Intel® oneAPI Threading Building Blocks (oneTBB)

  • Changes affecting backward compatibility
  • New features:
    • Concurrent ordered containers, the task_arena interface extension for NUMA, the flow graph API supporting relative priorities for functional nodes, and resumable tasks are now fully supported.
    • Implemented task_arena interface extension to specify priority of the arena.

Intel® Distribution for GDB*

  • Supports debugging kernels offloaded to the CPU, GPU and FPGA-emulation devices.
  • Automatically attaches to the GPU device to listen to debug events.
  • Automatically detects JIT-compiled, or dynamically loaded, kernel code for debugging.
  • Supports debugging of DPC++, C++ with OpenMP offload, and OpenCL programs.
  • Provides the ability to list active SIMD lanes and to switch the current SIMD lane context per thread.

Intel® Integrated Performance Primitives (Intel IPP)

  • CPU-only support.
  • Extended optimizations for Intel® IPP Cryptography: AES ciphers and RSA support on the 10th Generation Intel® Core™ processor family.
  • Added a new universal CRC function to compute CRC8, CRC16, CRC24, and CRC32 checksums.

Intel® oneAPI Collective Communications Library (oneCCL)

  • Enables efficient implementations of collectives used for deep learning training (allgatherv, allreduce, alltoall(v), broadcast, reduce, reduce_scatter)
  • Provides C++ API and interoperability with DPC++
  • Deep Learning Optimizations include:
    • Asynchronous progress for compute communication overlap
    • Dedication of cores to ensure optimal network use
    • Message prioritization, persistence, and out-of-order execution
    • Collectives in low-precision data types (int[8,16,32,64], fp[32,64], bf16)
  • Linux* OS support only

Intel® oneAPI Data Analytics Library (oneDAL)

  • Renamed the library from Intel® Data Analytics Acceleration Library to oneAPI Data Analytics Library and changed the package names to reflect this.
  • Deprecated 32-bit version of the library.
  • Introduced Intel GPU support for both OpenCL and Level Zero backends.
  • Aligned the library with the oneDAL Specification 1.0 for the following algorithms on both CPU and GPU:
    • K-means, PCA, Random Forest Classification and Regression, kNN, and SVM
  • Introduced new Intel® DAAL and daal4py functionality on GPU:
    • Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
    • Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
    • Added Data Management functionality to support DPC++ APIs: a new table type for representation of SYCL-based numeric tables (SyclNumericTable) and an optimized CSV data source
    • Added Technical Preview Features in Graph Analytics on CPU - Jaccard Similarity Coefficients

Intel® oneAPI Deep Neural Networks Library (oneDNN)

  • Introduced SYCL* API extensions compliant with oneAPI specification v1.0.
  • Introduced support for the Intel® DPC++ Compiler and the Level Zero runtime.
  • Introduced Unified Shared Memory (USM) support for Intel Processor Graphics and Xe architecture-based graphics.

Intel® oneAPI Video Processing Library (oneVPL)

  • AVC/H.264, HEVC/H.265, MJPEG, and AV1 software decode and encode
  • Video processing (resize, color conversion, and crop)
  • Frame memory management with user interface and internally allocated buffers
  • DPC++ kernel integration

Intel® Distribution for Python*

  • Machine Learning: XGBoost 1.2 with new CPU optimizations, and new Scikit-learn and daal4py optimizations including Random Forest Classification/Regression, kNN, sparse K-means, DBSCAN, SVM, SVC, Random Forest, Logistic Regression, and more.
  • Initial GPU support: GPU-enabled Data Parallel NumPy* (dpnp); DPCTL, a new Python package for device, queue, and USM data management with initial support in dpnp, scikit-learn, daal4py, and numba; daal4py optimizations for GPU; and GPU support in scikit-learn for DBSCAN, K-Means, Linear Regression and Logistic Regression.
  • Intel® Scalable Dataframe Compiler (Intel® SDC) Beta – Numba extension for accelerating Pandas*

Intel® Advisor

  • Offload Advisor: Get your code ready for efficient GPU offload even before you have the hardware. Identify offload opportunities, quantify potential speedup, locate bottlenecks, estimate data transfer costs, and get guidance on how to optimize.  
  • Automated Roofline Analysis for GPUs: Visualize actual performance of GPU kernels against hardware-imposed performance limitations and get recommendations for effective memory vs. compute optimization.
  • Memory-level Roofline Analysis: Pinpoint exact memory hierarchy bottlenecks (L1, L2, L3 or DRAM). 
  • Flow Graph Analyzer support for DPC++: Visualize asynchronous task graphs, diagnose performance issues, and get recommendations to fix them. 
  • Intuitive User Interface: New interface workflows and toolbars incorporate Roofline Analysis for GPUs and Offload Advisor.
  • Intel® Iris® Xe MAX graphics support: Roofline analysis and Offload Advisor now support Intel® Iris® Xe MAX graphics.

Intel® VTune™ Profiler

  • Find performance degrading memory transfers with offload cost profiling for both DPC++ and OpenMP.
  • Debug throttling issues and tune flops/watt using power analysis.
  • Find the module causing performance killing I/O writes using improved I/O analysis that identifies where slow MMIO writes are made.
  • Less guessing is needed when optimizing FPGA software performance: developers can now get stall and data-transfer data for each compute unit in the FPGA.
  • A new Performance Snapshot is the first profiling step. It suggests the detailed analyses (memory, threading, etc.) that offer the most optimization opportunities.

Intel® FPGA Add-On for oneAPI Base Toolkit (Optional)

  • Support for installing the Intel® FPGA Add-On for oneAPI Base Toolkit via Linux package managers (YUM, APT, and Zypper).
  • Supports three FPGA boards (Intel® PAC with Intel® Arria® 10 GX, Intel® FPGA PAC D5005, and custom platforms) via four add-on installers.

System Requirements

Please see Intel oneAPI Base Toolkit System Requirements

Installation Instructions

Please visit Installation Guide for Intel oneAPI Toolkits

How to Start Using the Tools

Please reference:

Known Issues, Limitations and Workarounds

  1. Please read the whitepaper on Challenges, tips, and known issues when debugging heterogeneous programs using DPC++ or OpenMP offload 
  2. Limitations
    1. Running any GPU code on a Virtual Machine is not supported at this time.
    2. If you have chosen to download the Get Started Guide to use offline, viewing it in Chrome may cause the text to disappear when the browser window is resized. To fix this problem, resize your browser window again, or use a different browser.
    3. Eclipse* 4.12: code sample projects created from a Makefile by the IDE plugin will not build. This is a known issue with Eclipse 4.12; use Eclipse 4.9, 4.10, or 4.11 instead.
  3. Known issue -  regarding namespace "oneapi" conflicting with older compilers - error: reference to 'tbb' is ambiguous

    • This issue is only found with the following compilers:

      1. GNU* gcc 7.x or older
      2. LLVM* Clang 3.7 or older
      3. Intel® C++ Compiler 19.0 or older
      4. Visual Studio 2017 version 15.6 or older 
    • If your code uses the namespace in the following manner and is built with one of the compilers above, you may get compilation errors such as "error: reference to 'tbb' is ambiguous".

      The "using namespace oneapi;" directive in a oneDPL|oneDNN|oneTBB program code may result in compilation errors with the compilers listed above.

      test_tbb.cpp:

      namespace tbb { int bar(); }
      namespace oneapi { namespace tbb = ::tbb; }
      
      using namespace oneapi;
      int zoo0() { return tbb::bar(); }

      Compiling test_tbb.cpp produces:

      test_tbb.cpp: In function 'int zoo0()':
      test_tbb.cpp:5:21: error: reference to 'tbb' is ambiguous
      int zoo0() { return tbb::bar(); }

      Workarounds: 

      Instead of the directive "using namespace oneapi;", use fully qualified names or namespace aliases.

      test_tbb_workaround.cpp: 

      namespace tbb { int bar(); }
      namespace oneapi { namespace tbb = ::tbb; }
      
      // using namespace oneapi;
      int zoo0() { return tbb::bar(); }

      Additional Notes: 

      The "using namespace oneapi;" directive is not recommended right now, as it may result in compilation errors when oneMKL| oneDAL| oneCCL is used with other oneAPI libraries. There're two workarounds:

      • Use the full qualified namespace like above
      • Use namespace alias for oneMKL| oneDAL| oneCCL, e.g.
        • namespace one[mkl|dal|ccl] = ::oneapi::[mkl|dal|ccl];
          onemkl::blas::dgemm( … ); | onedal::train(); | onccl::allgathersv();

           


  4. Known issue - installation error on Windows "LoadLibrary failed with error 126: the specified module could not be found" in certain environments only

    • Impacted environment: Windows with AMD graphics card

    • Details: 

      When a Windows system has AMD* graphics cards or AMD Radeon Vega* graphics units, the installer of oneAPI Toolkits may report the error "LoadLibrary failed with error 126: the specified module could not be found". This has been reported and is being investigated. Please use the workaround for this release.
    • Workaround:

      Temporarily disable the Intel® HD Graphics during the installation of the oneAPI Toolkits with the steps below:

      Open Device Manager > Display Adapters, right-click the listed display (commonly the Intel integrated graphics adapter), and select Disable.

  5. Known issue - "Debug Error!" from Microsoft Visual C++ Runtime Library
    • Impacted environment: Windows, "Debug" build only, mixed use of DPC++ & oneAPI libraries (except oneTBB)
    • Details: This error may occur only when the DPC++ program is built in "Debug" configuration and it uses one of the oneAPI libraries that do not have a dynamic debug libraries, e.g. oneVPL; The oneTBB is not impacted by this issue.
    • The error is similar to the following:

      Unable to start program

    • Workaround:
      • Use "Release" configuration to build the program for now.
  6. More limitations on Windows 
    • For users who have Visual Studio* 2017 or 2019 installed, installation of the oneAPI Base Toolkit IDE integrations is very slow; it may sometimes take more than 30 minutes. Please be patient; the integrations will eventually be installed.
    • If you encounter a runtime error such as "... ... sycl.dll was not found. ... ..." when running your program within Visual Studio, follow the instructions below to update the project property "Debugging > Environment" so the program can run:
      • Open the "Debugging > Environment" project property, click the drop-down on the right, and select Edit.

      • Copy the default PATH environment variable value from the lower section and paste it into the upper section.
        This step is important because of how Visual Studio 2017 and newer handle additional directories in the "PATH" environment variable.

      • Add any additional directories containing the DLL files needed by the program to the PATH value.

    • Error when running a code sample program within Visual Studio: unable to start program 'xxx.exe'

      Follow the instructions below for the workaround:
      • Open the Tools > Options dialog, select the Debugging tab, and select the "Automatically close the console when debugging stops" check box.

Release Notes for All Tools included in Intel® oneAPI Base Toolkit

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.