Overview
This publication, the Intel® oneAPI Math Kernel Library Developer Reference, was previously known as the Intel® oneAPI Math Kernel Library Reference Manual.
Intel® oneAPI Math Kernel Library (oneMKL) is optimized for performance on Intel processors. oneMKL also runs on non-Intel x86-compatible processors.
oneMKL provides limited input validation to minimize performance overhead. When using oneMKL, it is your responsibility to ensure that input data has the required format and does not contain invalid characters. These can cause unexpected behavior of the library. Examples of inputs that may result in unexpected behavior:
- Not-a-number (NaN) and other special floating point values
- Large inputs that may lead to accumulator overflow
As the oneMKL API accepts raw pointers, it is your application's responsibility to validate the buffer sizes before passing them to the library. The library requires subroutine and function parameters to be valid before being passed. While some oneMKL routines do limited checking of parameter errors, your application should check for NULL pointers, for example.
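The checks described above can be sketched in a few lines. This is a minimal illustration in plain Python, not oneMKL code: the helper names `validate_vector` and `is_rejected` are hypothetical, and the exact set of checks an application needs is an assumption that depends on the routine being called.

```python
import math

def validate_vector(x, n):
    """Check a buffer before handing it to a numerical routine.

    Hypothetical helper: rejects a missing buffer, a buffer shorter than
    the n elements the routine would read, and NaN or infinite values.
    """
    if x is None:
        raise ValueError("input buffer is missing (a NULL pointer in C terms)")
    if n < 0 or len(x) < n:
        raise ValueError("buffer holds fewer than n elements")
    values = [float(v) for v in x[:n]]
    for i, v in enumerate(values):
        if not math.isfinite(v):
            raise ValueError(f"element {i} is not finite: {v!r}")
    return values

def is_rejected(x, n):
    """Return True if validate_vector refuses the input."""
    try:
        validate_vector(x, n)
    except ValueError:
        return True
    return False

checked = validate_vector([1.0, 2.5, -3.0], 3)
```

Running such checks once at the boundary of the application keeps the cost out of inner loops while still catching the inputs listed above.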
The Intel® oneAPI Math Kernel Library includes Fortran routines and functions optimized for Intel® processor-based computers running operating systems that support multiprocessing. In addition to the Fortran interface, Intel® oneAPI Math Kernel Library includes a C-language interface for the Discrete Fourier transform functions, as well as for the Vector Mathematics and Vector Statistics functions. For hardware and software requirements to use Intel® oneAPI Math Kernel Library, see Release Notes.
Function calls at runtime for Intel® oneAPI Math Kernel Library libraries on the Microsoft Windows* operating system can utilize the LoadLibrary() function and related loading functions in static, dynamic, and single dynamic library linking models. These functions attempt to access the loader lock, which, when used within or at the same time as another DllMain function call, can lead to a deadlock. If possible, avoid making your calls to Intel® oneAPI Math Kernel Library in a DllMain function or at the same time as other calls to DllMain, even on separate threads. Refer to Microsoft documentation about DllMain and Dynamic-Link Library Best Practices for more details.
BLAS Routines
The BLAS routines and functions are divided into the following groups according to the operations they perform:
- BLAS Level 1 Routines perform operations of both addition and reduction on vectors of data. Typical operations include scaling and dot products.
- BLAS Level 2 Routines perform matrix-vector operations, such as matrix-vector multiplication, rank-1 and rank-2 matrix updates, and solution of triangular systems.
- BLAS Level 3 Routines perform matrix-matrix operations, such as matrix-matrix multiplication, rank-k update, and solution of triangular systems.
Starting from release 8.0, Intel® oneAPI Math Kernel Library also supports the Fortran 95 interface to the BLAS routines.
Starting from release 10.1, a number of BLAS-like Extensions are added to enable the user to perform certain data manipulation, including matrix in-place and out-of-place transposition operations combined with simple matrix arithmetic operations.
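To make the three BLAS levels concrete, the following plain-Python sketch shows one representative operation per level. It only illustrates what each class of routine computes. The function names echo the conventional BLAS routine families (dot, scal, gemv, gemm), but the simplified signatures here are an assumption for illustration and are not the library interface.

```python
def dot(x, y):
    """Level 1: reduce two vectors to a scalar."""
    return sum(a * b for a, b in zip(x, y))

def scal(alpha, x):
    """Level 1: scale a vector by a scalar."""
    return [alpha * a for a in x]

def gemv(A, x):
    """Level 2: matrix-vector product, one dot product per row of A."""
    return [dot(row, x) for row in A]

def gemm(A, B):
    """Level 3: matrix-matrix product, one dot product per output entry."""
    columns = list(zip(*B))
    return [[dot(row, col) for col in columns] for row in A]

A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]

level1 = dot(x, x)    # 2.0
level2 = gemv(A, x)   # [-1.0, -1.0]
level3 = gemm(A, A)   # [[7.0, 10.0], [15.0, 22.0]]
```

For n-element vectors and n-by-n matrices the three levels perform on the order of n, n², and n³ arithmetic operations respectively, which is why Level 3 routines typically benefit most from an optimized implementation.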
Sparse BLAS Routines
The Sparse BLAS Level 1 Routines and Functions and Sparse BLAS Level 2 and Level 3 Routines operate on sparse vectors and matrices. These routines perform operations similar to the BLAS Level 1, 2, and 3 routines. The Sparse BLAS routines take advantage of vector and matrix sparsity: they allow you to store only non-zero elements of vectors and matrices.
Intel® oneAPI Math Kernel Library also supports the Fortran 95 interface to Sparse BLAS routines.
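Storing only the non-zero elements can be sketched as follows. This is an illustrative plain-Python example, not library code: the helper names `compress` and `sparse_dot` are hypothetical, and zero-based indices are used here for convenience.

```python
def compress(dense):
    """Keep only the non-zero entries of a vector, with their positions."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return values, indices

def sparse_dot(values, indices, y):
    """Dot product of a compressed sparse vector with a dense vector y.

    Only the stored entries are visited, so the cost depends on the
    number of non-zeros rather than on the full vector length.
    """
    return sum(v * y[i] for v, i in zip(values, indices))

dense = [0.0, 3.0, 0.0, 0.0, -2.0, 0.0]
values, indices = compress(dense)   # ([3.0, -2.0], [1, 4])
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = sparse_dot(values, indices, y)   # 3*2 + (-2)*5 = -4.0
```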
Sparse QR
Sparse QR in Intel® oneAPI Math Kernel Library is a set of routines used to solve sparse matrices with real coefficients and general structure. All Sparse QR routines can be divided into three steps: reordering, factorization, and solving. Currently, only CSR format is supported for the input matrix, and Sparse QR operates on the matrix handle used in all SpBLAS IE routines. (For details on how to create a matrix handle, refer to mkl-sparse-create-csr.)
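For readers unfamiliar with CSR (compressed sparse row) storage, the sketch below builds the three CSR arrays from a small dense matrix and uses them for a matrix-vector product. It is plain Python for illustration only. The helper names are hypothetical, and zero-based indexing is an assumption made here for brevity.

```python
def dense_to_csr(A):
    """Convert a dense row-major matrix to CSR arrays.

    values    : non-zero entries, row by row
    col_index : column of each stored entry
    row_ptr   : row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    """
    values, col_index, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_index.append(j)
        row_ptr.append(len(values))
    return values, col_index, row_ptr

def csr_matvec(values, col_index, row_ptr, x):
    """Compute y = A x using only the stored non-zero entries."""
    y = []
    for r in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            total += values[k] * x[col_index[k]]
        y.append(total)
    return y

A = [[4.0, 0.0, 0.0],
     [0.0, 0.0, 2.0],
     [1.0, 0.0, 3.0]]
values, col_index, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_index, row_ptr, [1.0, 1.0, 1.0])
```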
LAPACK Routines
The Intel® oneAPI Math Kernel Library fully supports the LAPACK 3.7 set of computational, driver, auxiliary and utility routines.
The original versions of LAPACK from which that part of Intel® oneAPI Math Kernel Library was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen.
The LAPACK routines can be divided into the following groups according to the operations they perform:
- Routines for solving systems of linear equations, factoring and inverting matrices, and estimating condition numbers (see LAPACK Routines: Linear Equations).
- Routines for solving least squares problems, eigenvalue and singular value problems, and Sylvester's equations (see LAPACK Routines: Least Squares and Eigenvalue Problems).
Starting from release 8.0, Intel® oneAPI Math Kernel Library also supports the Fortran 95 interface to LAPACK computational and driver routines. This interface provides an opportunity for simplified calls of LAPACK routines with fewer required arguments.
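The first group of routines, factoring a matrix and then solving a linear system with the factors, follows a two-step pattern that can be sketched in plain Python. This is a textbook LU factorization with partial pivoting written for illustration. The function names `lu_factor` and `lu_solve` are hypothetical, and nothing here reflects how the library implements or names its routines.

```python
def lu_factor(A):
    """Factor a square matrix as P A = L U with partial pivoting.

    Returns the packed factors (unit-lower L below the diagonal, U on and
    above it) and piv, where piv[i] is the original row now in position i.
    """
    n = len(A)
    LU = [row[:] for row in A]
    piv = list(range(n))
    for k in range(n):
        # Choose the largest remaining entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            LU[k], LU[p] = LU[p], LU[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, piv

def lu_solve(LU, piv, b):
    """Solve A x = b using the factors from lu_factor."""
    n = len(LU)
    x = [b[p] for p in piv]              # apply the row permutation
    for i in range(n):                   # forward substitution with L
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in reversed(range(n)):         # back substitution with U
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x

A = [[2.0, 1.0], [4.0, 3.0]]
LU, piv = lu_factor(A)
x = lu_solve(LU, piv, [3.0, 7.0])
```

Separating factorization from the solve lets one factorization be reused for several right-hand sides.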
Sparse Solver Routines
Direct sparse solver routines in Intel® oneAPI Math Kernel Library (see Sparse Solver Routines) solve symmetric and symmetrically-structured sparse matrices with real or complex coefficients. For symmetric matrices, these Intel® oneAPI Math Kernel Library subroutines can solve both positive-definite and indefinite systems. Intel® oneAPI Math Kernel Library includes a solver based on the PARDISO* sparse solver, referred to as Intel® oneAPI Math Kernel Library PARDISO, as well as an alternative set of user callable direct sparse solver routines.
If you use the Intel® oneAPI Math Kernel Library PARDISO sparse solver, please cite:
O. Schenk and K. Gartner. Solving unsymmetric sparse systems of linear equations with PARDISO. J. of Future Generation Computer Systems, 20(3):475-487, 2004.
Intel® oneAPI Math Kernel Library also provides an iterative sparse solver (see Sparse Solver Routines) that uses Sparse BLAS level 2 and 3 routines and works with different sparse data formats.
Extended Eigensolver Routines
The Extended Eigensolver RCI Routines are a set of high-performance numerical routines for solving standard (Ax = λx) and generalized (Ax = λBx) eigenvalue problems, where A and B are symmetric or Hermitian. They yield all the eigenvalues and eigenvectors within a given search interval. They are based on the Feast algorithm, an innovative fast and stable numerical algorithm presented in [Polizzi09], which deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms [Bai00]) or other Davidson-Jacobi techniques [Sleijpen96]. The Feast algorithm is inspired by the density-matrix representation and contour integration technique in quantum mechanics.
It is free from orthogonalization procedures. Its main computational tasks consist of solving very few inner independent linear systems with multiple right-hand sides and one reduced eigenvalue problem orders of magnitude smaller than the original one. The Feast algorithm combines simplicity and efficiency and offers many important capabilities for achieving high performance, robustness, accuracy, and scalability on parallel architectures. This algorithm is expected to significantly augment numerical performance in large-scale modern applications.
Some of the characteristics of the Feast algorithm [Polizzi09] are:
- Converges quickly in 2-3 iterations with very high accuracy
- Naturally captures all eigenvalue multiplicities
- No explicit orthogonalization procedure
- Can reuse the basis of a pre-computed subspace as a suitable initial guess for performing outer-refinement iterations. This capability can also be used for solving a series of eigenvalue problems that are close to one another.
- The number of internal iterations is independent of the size of the system and the number of eigenpairs in the search interval
- The inner linear systems can be solved either iteratively (even with modest relative residual error) or directly
VM Functions
The Vector Mathematics functions (see Vector Mathematical Functions) include a set of highly optimized implementations of certain computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, etc.) that operate on vectors of real and complex numbers.
Application programs that might significantly improve performance with VM include nonlinear programming software, integrals computation, and many others. VM provides interfaces both for Fortran and C languages.
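The idea of a vector mathematical function, one call that evaluates an expensive function over a whole array, can be illustrated with a short plain-Python sketch. The names `vector_exp` and `vector_pow` are hypothetical and chosen only for this example. A vectorized library routine would compute the same element-wise results with optimized code rather than an interpreted loop.

```python
import math

def vector_exp(a):
    """Element-wise exponential: r[i] = exp(a[i])."""
    return [math.exp(v) for v in a]

def vector_pow(a, b):
    """Element-wise power: r[i] = a[i] ** b[i]."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [math.pow(x, y) for x, y in zip(a, b)]

exponentials = vector_exp([0.0, 1.0, 2.0])
powers = vector_pow([2.0, 3.0], [3.0, 2.0])   # [8.0, 9.0]
```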
Statistical Functions
Vector Statistics (VS) contains three sets of functions (see Statistical Functions) providing:
- Pseudorandom, quasi-random, and non-deterministic random number generator subroutines implementing basic continuous and discrete distributions. To provide best performance, the VS subroutines use calls to highly optimized Basic Random Number Generators (BRNGs) and a set of vector mathematical functions.
- A wide variety of convolution and correlation operations.
- Initial statistical analysis of raw single and double precision multi-dimensional datasets.
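Two of the operations listed above, convolution and initial statistical analysis, can be sketched in plain Python for small inputs. This is only an illustration of the mathematics. The function names are hypothetical, the convolution is the direct (non-FFT) form, and the choice of the unbiased n - 1 denominator for the variance is an assumption of this sketch.

```python
def convolve(x, h):
    """Direct linear convolution: out[k] = sum over i of x[i] * h[k - i]."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return out

def mean_and_variance(data):
    """Sample mean and unbiased sample variance of a one-dimensional dataset."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    variance = sum((v - mean) ** 2 for v in data) / (n - 1)
    return mean, variance

smoothed = convolve([1.0, 2.0, 3.0], [1.0, 1.0])   # [1.0, 3.0, 5.0, 3.0]
mean, variance = mean_and_variance([1.0, 2.0, 3.0, 4.0])
```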
Fourier Transform Functions
The Intel® oneAPI Math Kernel Library multidimensional Fast Fourier Transform (FFT) functions with mixed radix support (see Fourier Transform Functions) provide uniformity of discrete Fourier transform computation and combine functionality with ease of use. Both Fortran and C interface specifications are given. There is also a cluster version of FFT functions, which runs on distributed-memory architectures and is provided only for Intel® 64 and Intel® Many Integrated Core architectures.
The FFT functions provide fast computation via the FFT algorithms for arbitrary lengths. See the Intel® oneAPI Math Kernel Library Developer Guide for the specific radices supported.
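As a reference point for what an FFT of arbitrary length computes, the sketch below evaluates the discrete Fourier transform directly from its definition in plain Python. It is deliberately the slow O(n²) form, not an FFT. The sign convention (negative exponent for the forward transform) and the 1/n scaling applied in the inverse are assumptions of this sketch, so check the library documentation for its defaults.

```python
import cmath

def dft(x):
    """Forward DFT: X[k] = sum over j of x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def idft(X):
    """Inverse DFT with 1/n scaling, so that idft(dft(x)) recovers x."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
        for j in range(n)
    ]

# A length-5 input: the definition works for any length, not only powers of two.
signal = [1.0, 2.0, 0.0, -1.0, 3.0]
spectrum = dft(signal)
recovered = idft(spectrum)
```

An FFT produces the same values as `dft` up to rounding, in roughly n log n operations instead of n².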
Partial Differential Equations Support
Intel® oneAPI Math Kernel Library provides tools for solving partial differential equations (see Partial Differential Equations Support). These tools are Trigonometric Transform interface routines and Poisson Solver.
The Trigonometric Transform routines may be helpful to users who implement their own solvers similar to the Intel® oneAPI Math Kernel Library Poisson Solver. The users can improve performance of their solvers by using fast sine, cosine, and staggered cosine transforms implemented in the Trigonometric Transform interface.
The Poisson Solver is designed for fast solving of simple Helmholtz, Poisson, and Laplace problems. The Trigonometric Transform interface, which underlies the solver, is based on the Intel® oneAPI Math Kernel Library FFT interface (refer to Fourier Transform Functions), optimized for Intel® processors.
Support Functions
The Intel® oneAPI Math Kernel Library support functions (see Support Functions) are used to support the operation of the Intel® oneAPI Math Kernel Library software and provide basic information on the library and library operation, such as the current library version, timing, setting and measuring of CPU frequency, error handling, and memory allocation.
Starting from release 10.0, the Intel® oneAPI Math Kernel Library support functions provide additional threading control.
Starting from release 10.1, Intel® oneAPI Math Kernel Library selectively supports a Progress Routine feature to track progress of a lengthy computation and/or interrupt the computation using a callback function mechanism. The user application can define a function called mkl_progress that is regularly called from the Intel® oneAPI Math Kernel Library routine supporting the progress routine feature. See Progress Routine in Support Functions for reference. Refer to a specific LAPACK or DSS/PARDISO function description to see whether the function supports this feature or not.
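The callback mechanism described above can be sketched generically in plain Python. This is not the mkl_progress interface: the names `long_computation` and `report_progress`, the callback arguments, and the convention that a non-zero return value requests interruption are all assumptions made for this illustration.

```python
def long_computation(steps, progress=None):
    """Run a multi-step computation, reporting to an optional callback.

    After each step the callback receives the step index and the total
    step count. A non-zero return value asks the computation to stop.
    Returns the number of completed steps and whether it was interrupted.
    """
    completed = 0
    for step in range(steps):
        completed += 1                      # stand-in for real work
        if progress is not None and progress(step, steps) != 0:
            return completed, True
    return completed, False

calls = []

def report_progress(step, total):
    """Hypothetical user callback: record the step, stop after the third."""
    calls.append(step)
    return 1 if step >= 2 else 0

completed, interrupted = long_computation(10, report_progress)
```

Keeping the callback cheap matters in this pattern, because it runs inside the computation it is monitoring.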
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE3, and SSSE3.