Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content
Contents

oneMKL
PARDISO - Parallel Direct Sparse Solver Interface

This section describes the interface to the shared-memory multiprocessing parallel direct sparse solver known as the
Intel® oneAPI Math Kernel Library
PARDISO solver.
The
Intel® oneAPI Math Kernel Library
PARDISO package is a high-performance, robust, memory efficient, and easy to use software package for solving large sparse linear systems of equations on shared memory multiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernode techniques [Schenk00-2]. To improve sequential and parallel sparse numerical factorization performance, the algorithms are based on a Level-3 BLAS update and pipelining parallelism is used with a combination of left- and right-looking supernode techniques [Schenk00, Schenk01, Schenk02, Schenk03]. The parallel pivoting methods allow complete supernode pivoting to compromise numerical stability and scalability during the factorization process. For sufficiently large problem sizes, numerical experiments demonstrate that the scalability of the parallel algorithm is nearly independent of the shared-memory multiprocessing architecture.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE4.2, AVX2, AVX-512.
The following table lists the names of the
Intel® oneAPI Math Kernel Library
PARDISO routines and describes their general use.
oneMKL
PARDISO Routines
Routine
Description
Initializes
Intel® oneAPI Math Kernel Library
PARDISO with default parameters depending on the matrix type.
Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides.
Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides, 64-bit integer version.
Replaces routine which handles
Intel® oneAPI Math Kernel Library
PARDISO pivots with user-defined routine.
Returns diagonal elements of initial and factorized matrix.
Places pointers dedicated for sparse representation of requested matrix into MKL PARDISO.
Store internal structures from
pardiso
to a file.
Restore
pardiso
internal structures from a file.
Delete files with
pardiso
internal structure data.
Store internal structures from
pardiso_64
to a file.
Restore
pardiso_64
internal structures from a file.
Delete files with
pardiso_64
internal structure data.
The
Intel® oneAPI Math Kernel Library
PARDISO solver supports a wide range of real and complex sparse matrix types (seethe figure below).
Sparse Matrices That Can Be Solved with the Intel MKL PARDISO Solver
Sparse Matrices That Can be Solved With PARDISO
The
Intel® oneAPI Math Kernel Library
PARDISO solver performs four tasks:
  • analysis and symbolic factorization
  • numerical factorization
  • forward and backward substitution including iterative refinement
  • termination to release all internal solver memory.
To find code examples that use
Intel® oneAPI Math Kernel Library
PARDISO routines to solve systems of linear equations, unzip the
C
archive file in the
examples
folder of the
Intel® oneAPI Math Kernel Library
installation directory. Code examples will be in the
examples/solverc/source
folder.

Supported Matrix Types

The analysis steps performed by
Intel® oneAPI Math Kernel Library
PARDISO depend on the structure of the input matrix
A
.
Symmetric Matrices
The solver first computes a symmetric fill-in reducing permutation
P
based on either the minimum degree algorithm [Liu85] or the nested dissection algorithm from the METIS package [Karypis98] (both included with
Intel® oneAPI Math Kernel Library
), followed by the parallel left-right looking numerical Cholesky factorization [Schenk00-2] of
PAP
T
=
LL
T
for symmetric positive-definite matrices, or
PAP
T
=
LDL
T
for symmetric indefinite matrices. The solver uses diagonal pivoting, or 1x1 and 2x2 Bunch-Kaufman pivoting for symmetric indefinite matrices. An approximation of
X
is found by forward and backward substitution and optional iterative refinement.
Whenever numerically acceptable 1x1 and 2x2 pivots cannot be found within the diagonal supernode block, the coefficient matrix is perturbed. One or two passes of iterative refinement may be required to correct the effect of the perturbations. This restricting notion of pivoting with iterative refinement is effective for highly indefinite symmetric systems. Furthermore, for a large set of matrices from different applications areas, this method is as accurate as a direct factorization method that uses complete sparse pivoting techniques [Schenk04].
Another method of improving the pivoting accuracy is to use symmetric weighted matching algorithms. These algorithms identify large entries in the coefficient matrix
A
that, if permuted close to the diagonal, permit the factorization process to identify more acceptable pivots and proceed with fewer pivot perturbations. These algorithms are based on maximum weighted matchings and improve the quality of the factor in a complementary way to the alternative of using more complete pivoting techniques.
The inertia is also computed for real symmetric indefinite matrices.
Structurally Symmetric Matrices
The solver first computes a symmetric fill-in reducing permutation
P
followed by the parallel numerical factorization of
PAP
T
=
QLU
T
. The solver uses partial pivoting in the supernodes and an approximation of
X
is found by forward and backward substitution and optional iterative refinement.
Nonsymmetric Matrices
The solver first computes a nonsymmetric permutation
P
MPS
and scaling matrices
D
r
and
D
c
with the aim of placing large entries on the diagonal to enhance reliability of the numerical factorization process [Duff99]. In the next step the solver computes a fill-in reducing permutation
P
based on the matrix
P
MPS
A
+ (
P
MPS
A
)
T
followed by the parallel numerical factorization
Q
L
U
R
=
P
P
MPS
D
r
A
D
c
P
with supernode pivoting matrices
Q
and
R
. When the factorization algorithm reaches a point where it cannot factor the supernodes with this pivoting strategy, it uses a pivoting perturbation strategy similar to [Li99]. The magnitude of the potential pivot is tested against a constant threshold of
alpha = eps*||
A2
||
inf
,
where
eps
is the machine precision,
A2
=
P*P
MPS
*
D
r
*
A
*
D
c
*
P
, and
||
A2
||
inf
is the infinity norm of
A
. Any tiny pivots encountered during elimination are set to the sign
(
l
I
I
)*eps*||
A2
||
inf
, which trades off some numerical stability for the ability to keep pivots from getting too small. Although many failures could render the factorization well-defined but essentially useless, in practice the diagonal elements are rarely modified for a large class of matrices. The result of this pivoting approach is that the factorization is, in general, not exact and iterative refinement may be needed.

Sparse Data Storage

Intel® oneAPI Math Kernel Library
PARDISO stores sparse data in several formats:
  • CSR3: The 3-array variation of the compressed sparse row format described in Three Array Variation of CSR Format.
  • BSR3: The three-array variation of the block compressed sparse row format described in Three Array Variation of BSR Format. Use
    iparm
    [36]
    to specify the block size.
  • VBSR: Variable BSR format.
    Intel® oneAPI Math Kernel Library
    PARDISO analyzes the matrix provided in CSR3 format and converts it into an internal structure which can improve performance for matrices with a block structure. Use
    iparm
    [36]
    = -
    t
    (0 <
    t
    100) to specify use of internal VBSR format and to set the degree of similarity required to combine elements of the matrix. For example, if you set
    iparm
    [36]
    = -80
    , two rows of the input matrix are combined when their non-zero patterns are 80% or more similar.
    Intel® oneAPI Math Kernel Library
    supports only the VBSR format for real and symmetric positive definite or indefinite matrices (
    mtype
    = 2 or
    mtype
    = -2).
    Intel® oneAPI Math Kernel Library
    supports these features for all matrix types as long as
    iparm
    [23]
    =1:
    • iparm
      [30]
      > 0
      : Partial solution
    • iparm
      [35]
      > 0
      : Schur complement
    • iparm
      [59]
      > 0
      : OOC
      Intel® oneAPI Math Kernel Library
      PARDISO
For all storage formats, the
Intel® oneAPI Math Kernel Library
PARDISO parameter
ja
is used for the
columns
array,
ia
is used for
rowIndex
, and
a
is used for
values
. The algorithms in
Intel® oneAPI Math Kernel Library
PARDISO require column indices
ja
to be in increasing order per row and that the diagonal element in each row be present for any structurally symmetric matrix. For symmetric or nonsymmetric matrices the diagonal elements which are equal to zero are not necessary.
Intel® oneAPI Math Kernel Library
PARDISO column indices
ja
must be in increasing order per row. You can validate the sparse matrix structure with the matrix checker (
iparm
[26]
)
While the presence of zero diagonal elements for symmetric matrices is not required, you should explicitly set zero diagonal elements for symmetric matrices. Otherwise,
Intel® oneAPI Math Kernel Library
PARDISO creates internal copies of arrays
ia
,
ja
, and
a
full of diagonal elements, which require additional memory and computational time. However, the memory and time required the diagonal elements in internal arrays is usually not significant compared to the memory and the time required to factor and solve the matrix.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE4.2, AVX2, AVX-512.

Storage of Matrices

By default,
Intel® oneAPI Math Kernel Library
PARDISO stores data in RAM. This is referred to as In-Core (IC) mode. However, you can specify that
Intel® oneAPI Math Kernel Library
PARDISO store matrices on disk by setting
iparm
[59]
. This mode is called the Out-of-Core (OOC) mode.
You can set the following parameters for the OOC mode.
Parameter/Environment Variable Name
Description
MKL_PARDISO_OOC_PATH
Directory for storing data created in the OOC mode.
MKL_PARDISO_OOC_FILE_NAME
Full file name (incl. path) which will be used for the OOC files
MKL_PARDISO_OOC_MAX_CORE_SIZE
Maximum size of RAM (in megabytes) available for
Intel® oneAPI Math Kernel Library
PARDISO
MKL_PARDISO_OOC_MAX_SWAP_SIZE
Maximum swap size (in megabytes) available for
Intel® oneAPI Math Kernel Library
PARDISO
MKL_PARDISO_OOC_KEEP_FILE
A flag which determines whether temporary data files will be deleted or stored
By default, the current working directory is used in the OOC mode as a directory path for storing data. All work arrays will be stored in files named
ooc_temp
with different extensions. When
MKL_PARDISO_OOC_FILE_NAME
is not set and
MKL_PARDISO_OOC_PATH
is set, the names for the created files will contain
<path>/mkl_pardiso
or
<path>\mkl_pardiso
depending on the OS. Setting
MKL_PARDISO_OOC_FILE_NAME=<filename>
will override the path which could have been set in
MKL_PARDISO_OOC_PATH
. In this case
<filename>
will be used for naming the OOC files.
By default,
MKL_PARDISO_OOC_MAX_CORE_SIZE
is 2000 (MB) and
MKL_PARDISO_OOC_MAX_SWAP_SIZE
is 0.
Do not set the sum of
MKL_PARDISO_OOC_MAX_CORE_SIZE
and
MKL_PARDISO_OOC_MAX_SWAP_SIZE
greater than the size of the RAM plus the size of the swap memory. Be sure to allow enough free memory for the operating system and any other processes which need to be running.
By default, all temporary data files will be deleted. For keeping them it is required to set
MKL_PARDISO_OOC_KEEP_FILE
to 0.
OOC parameters can be set in a configuration file. You can set the path to this file and its name using environmental variables
MKL_PARDISO_OOC_CFG_PATH
and
MKL_PARDISO_OOC_CFG_FILE_NAME
.
For setting parameters of OOC mode either environment variables or a configuration file can be used. When the last option is chosen, by default the name of the file is
pardiso_ooc.cfg
and it should be placed in the working directory. If needed, the user can set the path to the configuration file using environmental variables
MKL_PARDISO_OOC_CFG_PATH
and
MKL_PARDISO_OOC_CFG_FILE_NAME
. These variables specify the path and filename as follows:
  • Linux* OS and OS X*:
    <MKL_PARDISO_OOC_CFG_PATH>/ <MKL_PARDISO_OOC_CFG_FILE_NAME>
  • Windows* OS:
    <MKL_PARDISO_OOC_CFG_PATH>\<MKL_PARDISO_OOC_CFG_FILE_NAME>
An example of the configuration file:
MKL_PARDISO_OOC_PATH = <path> MKL_PARDISO_OOC_MAX_CORE_SIZE = N MKL_PARDISO_OOC_MAX_SWAP_SIZE = K MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)
The maximum length of the path lines in the configuration files is 1000 characters.
Alternatively, the OOC parameters can be set as environment variables via command line.
For Linux* OS and OS X*:
export MKL_PARDISO_OOC_PATH = <path> export MKL_PARDISO_OOC_MAX_CORE_SIZE = N export MKL_PARDISO_OOC_MAX_SWAP_SIZE = K export MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)
For Windows* OS:
set MKL_PARDISO_OOC_PATH = <path> set MKL_PARDISO_OOC_MAX_CORE_SIZE = N set MKL_PARDISO_OOC_MAX_SWAP_SIZE = K set MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)
where
<path>
should follow the OS naming convention.

Direct-Iterative Preconditioning for Nonsymmetric Linear Systems

The solver uses a combination of direct and iterative methods [Sonn89] to accelerate the linear solution process for transient simulation. Most applications of sparse solvers require solutions of systems with gradually changing values of the nonzero coefficient matrix, but with an identical sparsity pattern. In these applications, the analysis phase of the solvers has to be performed only once and the numerical factorizations are the important time-consuming steps during the simulation.
Intel® oneAPI Math Kernel Library
PARDISO uses a numerical factorization and applies the factors in a preconditioned Krylov Subspace iteration. If the iteration does not converge, the solver automatically switches back to the numerical factorization. This method can be applied to nonsymmetric matrices in
Intel® oneAPI Math Kernel Library
PARDISO. You can select the method using the
iparm
[3]
input parameter. The
[19]
parameter returns the error status after running
Intel® oneAPI Math Kernel Library
PARDISO.

Single and Double Precision Computations

Intel® oneAPI Math Kernel Library
PARDISO solves tasks using single or double precision. Each precision has its benefits and drawbacks. Double precision variables have m