Intel® oneAPI Math Kernel Library (oneMKL) Release Notes

By Khang T Nguyen, Abhinav Singh

Published: 07/03/2019   Last Updated: 12/04/2020

 

Where to Find the Release

Intel® oneAPI Math Kernel Library

New in This Release

 

2021.1 Initial Release

System Requirements   Bug Fix Log

Features

With this release, the product previously known as the Intel® Math Kernel Library (Intel® MKL) becomes the Intel® oneAPI Math Kernel Library (oneMKL).

Existing Intel® MKL users can migrate to oneMKL with confidence: Intel continues to support the same C and Fortran APIs for CPUs that it has provided for years.

oneMKL extends beyond the traditional C and Fortran APIs with support for two new programming models for Intel GPUs: Data Parallel C++ (DPC++) APIs, which support programming for both CPUs and Intel GPUs, and C/Fortran OpenMP* Offload interfaces, which target Intel GPUs.
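To illustrate the flavor of the new DPC++ APIs, here is a minimal sketch of a single-precision matrix multiply submitted through a SYCL queue. This is an illustrative sketch only: it assumes the oneMKL 2021.1 DPC++ toolchain is installed, and the matrix sizes and data are placeholders.

```cpp
#include <CL/sycl.hpp>
#include "oneapi/mkl.hpp"
#include <vector>

int main() {
    // The default selector picks a CPU or Intel GPU device at run time.
    sycl::queue q{sycl::default_selector{}};

    std::int64_t m = 64, n = 64, k = 64;
    std::vector<float> A(m * k, 1.0f), B(k * n, 1.0f), C(m * n, 0.0f);
    sycl::buffer<float, 1> a_buf(A.data(), A.size());
    sycl::buffer<float, 1> b_buf(B.data(), B.size());
    sycl::buffer<float, 1> c_buf(C.data(), C.size());

    // C = 1.0 * A * B + 0.0 * C, column-major layout.
    oneapi::mkl::blas::column_major::gemm(
        q, oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::nontrans,
        m, n, k, 1.0f, a_buf, m, b_buf, k, 0.0f, c_buf, m);
    q.wait();
    return 0;
}
```

The existing C and Fortran entry points (for example, cblas_sgemm) remain available on the CPU, so code written against Intel® MKL continues to work unchanged.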

We have changed the versioning model for shared libraries; please refer to the developer guide for more details. 

The following table lists the domains included in oneMKL and indicates which of them are available under the new DPC++ and OpenMP* Offload programming models:

Domain                                  CPU APIs                     Intel GPU APIs
                                        DPC++   C     Fortran       DPC++    C OpenMP*   Fortran OpenMP*
                                                                             Offload     Offload
BLAS and BLAS-like Extensions           Yes     Yes   Yes           Yes      Yes         Yes
LAPACK and LAPACK-like Extensions       Yes(1)  Yes   Yes           Yes(1)   Yes(2)      Yes(2)
ScaLAPACK                               No      Yes   Yes           No       No          No
Vector Math                             Yes     Yes   Yes           Yes(5)   Yes(5)      Yes(3,5)
Vector Statistics (Random Number
  Generators)                           Yes(1)  Yes   Yes           Yes(1)   Yes(2)      Yes(2,3)
Vector Statistics (Summary Statistics)  Yes(1)  Yes   Yes           Yes(1)   No          No
Data Fitting                            No      Yes   Yes           No       No          No
FFT/DFT                                 Yes     Yes   Yes           Yes      Yes(4)      Yes(4)
Sparse BLAS                             Yes(1)  Yes   Yes           Yes(1)   Yes(2)      No
Sparse Solvers                          No      Yes   Yes           No       No          No

(1) Subset of the full functionality is available. Refer to the DPC++ developer reference for the full list of supported DPC++ functionality.

(2) Subset of the full functionality is available. For the list of supported functionality, refer to the developer reference (C and Fortran).

(3) Supported on Linux* only.

(4) DFTI interfaces are supported; FFTW interfaces are not supported.

(5) Subset of the full functionality is available. Refer to the DPC++ developer reference for the full list of supported DPC++ functionality, or to the developer reference for C and Fortran. Functions that are not implemented for GPU can still be used; they execute transparently on the host CPU.

Performance Recommendation:

  • For DPC++ and OpenMP* offload on Windows*, use the OpenCL* runtime for the best performance of BLAS and LAPACK functionality. To enable the OpenCL* runtime, set the following environment variables:
    • SYCL_BE=PI_OPENCL
    • LIBOMPTARGET_PLUGIN=opencl
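For example, the two variables can be set before launching the application as follows (a sketch shown in POSIX-shell syntax for brevity; on Windows* cmd use "set NAME=value" instead, and "./my_app" is a placeholder for your binary):

```shell
# Select the OpenCL* runtime for DPC++.
export SYCL_BE=PI_OPENCL
# Select the OpenCL* plugin for OpenMP* offload.
export LIBOMPTARGET_PLUGIN=opencl

# Verify the settings, then launch the application.
echo "SYCL_BE=$SYCL_BE LIBOMPTARGET_PLUGIN=$LIBOMPTARGET_PLUGIN"
# ./my_app
```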

DPC++ Known Issues and Limitations

  • Dynamic linking on Windows* is supported only for the BLAS and LAPACK domains.
  • The custom dynamic builder tool does not support building custom dynamic libraries for the DPC++ interfaces. 
  • Device RNG routines should be used with the "-fno-sycl-early-optimizations" compilation flag on a CPU device.
  • Discrete Fourier Transform (DFT) computations on Intel GPU with the Intel® oneAPI Level Zero backend may produce incorrect results for large batch sizes. To run DFT on Intel GPU, set the environment variable SYCL_BE=PI_OPENCL.
  • Real backward out-of-place DFT can produce incorrect results on Intel GPU. As a workaround, use the in-place transform. 
  • LU factorization (getrf) on Intel GPU may fail with an invalid argument error when used with an OpenCL* backend and in double precision. As a workaround, use the oneAPI Level Zero backend. 
  • Static linking on Windows* can take significant time (up to 10 minutes). Linking static libraries can lead to a large application size due to GPU kernels.  
  • USM APIs of Sparse BLAS work only with input arrays allocated by malloc_shared, so that the data is always accessible from the host.
  • On Windows*, the DPC++ library is available only as a Release version; it cannot be used to build Debug versions of DPC++ applications.
  • For DFT on GPU, user-defined strides with padding are not supported.

C/Fortran Known Issues and Limitations

  • OpenMP* offload is only supported for static libraries on Windows*. 
  • On Windows*, for the LAPACK (C only) and DFT domains, OpenMP* offload does not support the Intel® oneAPI Level Zero backend and works only with the OpenCL* backend. To run OpenMP* offload, set the environment variable LIBOMPTARGET_PLUGIN=opencl.
  • On Linux*, for the DFT domain, OpenMP* offload does not support the oneAPI Level Zero backend and works only with the OpenCL* backend. To run OpenMP* offload, set the environment variable LIBOMPTARGET_PLUGIN=opencl.
  • The custom dynamic builder tool does not support building custom dynamic libraries for OpenMP* offload C and Fortran interfaces. 
  • The Intel® Fortran Compiler (ifx) does not support the SYCL device-code-split linking option (-fsycl-device-code-split), which may result in long execution times for first calls of SYCL-based functions. Functionality on DG1 may also be affected; see the note about enabling double precision emulation below.
  • LU factorization (dgetrf) for OpenMP* offload may fail with an invalid argument error when used with an OpenCL* backend and in double precision. As a workaround, use the oneAPI Level Zero backend. 
  • Note that the Intel® Fortran Compiler 2021.1 (ifx) remains in beta release and does not yet support the full language. As such, some oneMKL Fortran examples may not compile with it.
  • On Windows* 10 version 2004, the fmod function leaves the floating-point stack in an incorrect state when called with the x parameter equal to zero. When certain oneMKL functions such as zgetrs are then called, the results returned may be incorrect (in particular, they may be NaNs). If possible, avoid using this version of Windows* 10 until a fix is provided by Microsoft*. Alternatively, the floating-point stack can be cleared by executing the emms instruction after calling fmod.
  • Vector Math and service domain headers for Fortran (mkl_vml.f90, mkl_service.fi) may produce compile errors when compiled with GNU Fortran 10.10. As a short-term solution, add -fallow-invalid-boz to the compilation line.
  • The behavior of the iterative sparse solver (ISS RCI) <solver>_init and <solver>_check routines has changed: calls to <solver>_check are now optional, and, if called, it corrects parameter inconsistencies.
  • Sparse BLAS has a three-stage usage model: the create/inspect stage, the execution stage, and the destruction stage. For Sparse BLAS with C OpenMP* offload, only the execution stage can be performed asynchronously, provided any data dependencies are already respected. Known limitation: to make the Sparse BLAS C OpenMP* offload async examples safe, remove the "nowait" clause from the mkl_sparse_?_create_csr and mkl_sparse_destroy calls and add a "#pragma omp taskwait" before the call to mkl_sparse_destroy.
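The safe pattern described above can be sketched as follows. This is an illustrative fragment rather than a complete program: it assumes mkl.h, mkl_spblas.h, and omp.h are included, that the CSR arrays rows_start, rows_end, col_indx, values and the vectors x, y already exist and are mapped to the device, and that the dispatch pragmas follow the style of the shipped oneMKL offload examples.

```c
sparse_matrix_t A;
struct matrix_descr descr;
descr.type = SPARSE_MATRIX_TYPE_GENERAL;

/* Stage 1: create/inspect -- keep synchronous (no "nowait" clause). */
#pragma omp target variant dispatch use_device_ptr(rows_start, rows_end, col_indx, values)
mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, m, n,
                        rows_start, rows_end, col_indx, values);

/* Stage 2: execute -- the only stage that may run asynchronously,
   once its data dependencies are respected. */
#pragma omp target variant dispatch use_device_ptr(x, y) nowait
mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr, x, 0.0, y);

/* Wait for the asynchronous execution stage to finish. */
#pragma omp taskwait

/* Stage 3: destroy -- keep synchronous (no "nowait" clause). */
mkl_sparse_destroy(A);
```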
  • For DFT Fortran OpenMP* offload, only rank-1 input arrays are supported. Multidimensional input data must be represented using a rank-1 array.
  • For DFT OpenMP* offload to GPU, user-defined strides with padding are not supported.

Intel® Iris® Xe MAX Graphics Known Issues and Limitations

Unsupported Functionality:
  • Double precision functionality is not supported on this platform. 
  • In addition, the following single precision Vector Math functions are less accurate than their CPU counterparts for very large or very small arguments, including denormals: atan2pi, atanpi, cdfnorm, cdfnorminv, cosd, cosh, erfc, erfcinv, erfinv, expm1, frac, hypot, invcbrt, invsqrt, ln, powx, sind, sinh, and tand. 
  • Several Vector Math functions do not have high accuracy (HA) versions and offer only low accuracy (LA) and extended precision (EP) versions: atan2pi, atanpi, cos, cosd, cospi, log2, log10, pow, powx, sin, sincos, sind, sinpi, tan.

Other Issues:

  • On Windows*, use of the OpenCL* backend is required for some BLAS and LAPACK functionality.

 

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.