This section describes the interface to the shared-memory multiprocessing parallel direct sparse solver known as the Intel MKL PARDISO solver. The interface is Fortran, but it can be called from C or C++ programs by observing Fortran parameter passing and naming conventions used by the supported compilers and operating systems.
The Intel MKL PARDISO package is a high-performance, robust, memory efficient, and easy to use software package for solving large sparse linear systems of equations on shared memory multiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernode techniques [Schenk00-2]. To improve sequential and parallel sparse numerical factorization performance, the algorithms are based on a Level-3 BLAS update and pipelining parallelism is used with a combination of left- and right-looking supernode techniques [Schenk00, Schenk01, Schenk02, Schenk03]. The parallel pivoting methods allow complete supernode pivoting to compromise numerical stability and scalability during the factorization process. For sufficiently large problem sizes, numerical experiments demonstrate that the scalability of the parallel algorithm is nearly independent of the shared-memory multiprocessing architecture.
The following table lists the names of the Intel MKL PARDISO routines and describes their general use.
Initializes Intel MKL PARDISO with default parameters depending on the matrix type.
Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides.
Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides, 64-bit integer version.
Retrieves additional values from the Intel MKL PARDISO handle.
Sets additional values in the Intel MKL PARDISO handle.
Replaces routine which handles Intel MKL PARDISO pivots with user-defined routine.
Returns diagonal elements of initial and factorized matrix.
Store internal structures from pardiso to a file.
Restore pardiso internal structures from a file.
Delete files with pardiso internal structure data.
Store internal structures from pardiso_64 to a file.
Restore pardiso_64 internal structures from a file.
Delete files with pardiso_64 internal structure data.
The Intel MKL PARDISO solver supports a wide range of real and complex sparse matrix types (see the figure below).
The Intel MKL PARDISO solver performs four tasks:
- analysis and symbolic factorization
- numerical factorization
- forward and backward substitution including iterative refinement
- termination to release all internal solver memory.
You can find code examples that use Intel MKL PARDISO routines to solve systems of linear equations in the examples folder of the Intel MKL installation directory:
Fortran examples: examples/solverf/source
C examples: examples/solverc/source
Supported Matrix Types
The analysis steps performed by Intel MKL PARDISO depend on the structure of the input matrix A.
- Symmetric Matrices
The solver first computes a symmetric fill-in reducing permutation P based on either the minimum degree algorithm [Liu85] or the nested dissection algorithm from the METIS package [Karypis98] (both included with Intel MKL), followed by the parallel left-right looking numerical Cholesky factorization [Schenk00-2] of
PAPT = LLTfor symmetric positive-definite matrices, or PAPT = LDLT for symmetric indefinite matrices. The solver uses diagonal pivoting, or 1x1 and 2x2 Bunch-Kaufman pivoting for symmetric indefinite matrices. An approximation of X is found by forward and backward substitution and optional iterative refinement.
Whenever numerically acceptable 1x1 and 2x2 pivots cannot be found within the diagonal supernode block, the coefficient matrix is perturbed. One or two passes of iterative refinement may be required to correct the effect of the perturbations. This restricting notion of pivoting with iterative refinement is effective for highly indefinite symmetric systems. Furthermore, for a large set of matrices from different applications areas, this method is as accurate as a direct factorization method that uses complete sparse pivoting techniques [Schenk04].
Another method of improving the pivoting accuracy is to use symmetric weighted matching algorithms. These algorithms identify large entries in the coefficient matrix A that, if permuted close to the diagonal, permit the factorization process to identify more acceptable pivots and proceed with fewer pivot perturbations. These algorithms are based on maximum weighted matchings and improve the quality of the factor in a complementary way to the alternative of using more complete pivoting techniques.
The inertia is also computed for real symmetric indefinite matrices.
- Structurally Symmetric Matrices
The solver first computes a symmetric fill-in reducing permutation P followed by the parallel numerical factorization of PAPT = QLUT. The solver uses partial pivoting in the supernodes and an approximation of X is found by forward and backward substitution and optional iterative refinement.
- Nonsymmetric Matrices
The solver first computes a nonsymmetric permutation PMPS and scaling matrices Dr and Dc with the aim of placing large entries on the diagonal to enhance reliability of the numerical factorization process [Duff99]. In the next step the solver computes a fill-in reducing permutation P based on the matrix PMPSA + (PMPSA)T followed by the parallel numerical factorization
QLUR = PPMPSDrADcP
with supernode pivoting matrices Q and R. When the factorization algorithm reaches a point where it cannot factor the supernodes with this pivoting strategy, it uses a pivoting perturbation strategy similar to [Li99]. The magnitude of the potential pivot is tested against a constant threshold of
alpha = eps*||A2||inf,
epsis the machine precision,
A2 = P*PMPS*Dr*A*Dc*P, and
||A2||infis the infinity norm of
A. Any tiny pivots encountered during elimination are set to the sign
(lII)*eps*||A2||inf, which trades off some numerical stability for the ability to keep pivots from getting too small. Although many failures could render the factorization well-defined but essentially useless, in practice the diagonal elements are rarely modified for a large class of matrices. The result of this pivoting approach is that the factorization is, in general, not exact and iterative refinement may be needed.
Sparse Data Storage
Sparse data storage in Intel MKL PARDISO follows the 3-array variation of the compressed sparse row (CSR3) format described in Sparse Matrix Storage Format with the Intel MKL PARDISO parameters ja standing for columns, ia for rowIndex, and a for values. The algorithms in Intel MKL PARDISO require column indices ja to be in increasing order per row and that the diagonal element in each row be present for any structurally symmetric matrix. For symmetric or nonsymmetric matrices the diagonal elements which are equal to zero are not necessary.
Intel MKL PARDISO column indices ja must be in increasing order per row. You can validate the sparse matrix structure with the matrix checker (
While the presence of zero diagonal elements for symmetric matrices is not required, you should explicitly set zero diagonal elements for symmetric matrices. Otherwise, Intel MKL PARDISO creates internal copies of arrays ia, ja, and a full of diagonal elements, which require additional memory and computational time. However, the memory and time required the diagonal elements in internal arrays is usually not significant compared to the memory and the time required to factor and solve the matrix.
Storage of Matrices
By default, Intel MKL PARDISO stores matrices in RAM. However, you can specify that Intel MKL PARDISO store matrices on disk by setting
iparm(60). This is referred to as in-core (IC) and out-of-core (OOC), respectively.
OOC parameters can be set in a configuration file. You can set the path to this file and its name using environmental variables MKL_PARDISO_OOC_CFG_PATH and MKL_PARDISO_OOC_CFG_FILE_NAME.
These variables specify the path and filename as follows:
<MKL_PARDISO_OOC_CFG_PATH>/<MKL_PARDISO_OOC_CFG_FILE_NAME> for Linux* OS and OS X*, and
<MKL_PARDISO_OOC_CFG_PATH>\<MKL_PARDISO_OOC_CFG_FILE_NAME> for Windows* OS.
By default, the name of the file is pardiso_ooc.cfg and it is placed in the current directory.
All temporary data files can be deleted or stored when the calculations are completed in accordance with the value of the environmental variable MKL_PARDISO_OOC_KEEP_FILE. If it is not set or if it is set to 1, then all files are deleted. If it is set to 0, then all files are stored.
By default, the OOC version of Intel MKL PARDISO uses the current directory for storing data, and all work arrays associated with the matrix factors are stored in files named ooc_temp with different extensions. These default values can be changed by using the environmental variable MKL_PARDISO_OOC_PATH.
To set the environmental variables MKL_PARDISO_OOC_MAX_CORE_SIZE, MKL_PARDISO_OOC_MAX_SWAP_SIZE, MKL_PARDISO_OOC_KEEP_FILE, and MKL_PARDISO_OOC_PATH, create the configuration file with the following lines:
MKL_PARDISO_OOC_PATH = <path>\ooc_file MKL_PARDISO_OOC_MAX_CORE_SIZE = N MKL_PARDISO_OOC_MAX_SWAP_SIZE = K MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)
<path> is the directory for storing data,
ooc_file is the file name without any extension, N is the maximum size of RAM in megabytes available for Intel MKL PARDISO (default value is 2000 MB), K is the maximum swap size in megabytes available for Intel MKL PARDISO (default value is 0 MB). Do not set N+K greater than the size of the RAM plus the size of the swap. Be sure to allow enough free memory for the operating system and any other processes which are necessary.
The maximum length of the path lines in the configuration files is 1000 characters.
Alternatively the environment variables can be set via command line.
For Linux* OS and OS X*:
export MKL_PARDISO_OOC_PATH = <path>/ooc_file export MKL_PARDISO_OOC_MAX_CORE_SIZE = N export MKL_PARDISO_OOC_MAX_SWAP_SIZE = K export MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)
For Windows* OS:
set MKL_PARDISO_OOC_PATH = <path>\ooc_file set MKL_PARDISO_OOC_MAX_CORE_SIZE = N set MKL_PARDISO_OOC_MAX_SWAP_SIZE = K set MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)
The values specified in a command line have higher priorities: if a variable is changed in the configuration file and in the command line, OOC version of Intel MKL PARDISO uses only the value defined in the command line.
Direct-Iterative Preconditioning for Nonsymmetric Linear Systems
The solver uses a combination of direct and iterative methods [Sonn89] to accelerate the linear solution process for transient simulation. Most applications of sparse solvers require solutions of systems with gradually changing values of the nonzero coefficient matrix, but with an identical sparsity pattern. In these applications, the analysis phase of the solvers has to be performed only once and the numerical factorizations are the important time-consuming steps during the simulation. Intel MKL PARDISO uses a numerical factorization and applies the factors in a preconditioned Krylow-Subspace iteration. If the iteration does not converge, the solver automatically switches back to the numerical factorization. This method can be applied to nonsymmetric matrices in Intel MKL PARDISO. You can select the method using the iparm(4) input parameter. The iparm(20) parameter returns the error status after running Intel MKL PARDISO.
Single and Double Precision Computations
Intel MKL PARDISO solves tasks using single or double precision. Each precision has its benefits and drawbacks. Double precision variables have more digits to store value, so the solver uses more memory for keeping data. But this mode solves matrices with better accuracy, which is especially important for input matrices with large condition numbers.
Single precision variables have fewer digits to store values, so the solver uses less memory than in the double precision mode. Additionally this mode usually takes less time. But as computations are made less precisely, only some systems of equations can be solved accurately enough using single precision.
Separate Forward and Backward Substitution
The solver execution step (see parameter
phase = 33 below) can be divided into two or three separate substitutions: forward, backward, and possible diagonal. This separation can be explained by the examples of solving systems with different matrix types.
A real symmetric positive definite matrix A (
mtype = 2) is factored by Intel MKL PARDISO as
A = L*LT . In this case the solution of the system
A*x=b can be found as sequence of substitutions:
L*y=b (forward substitution,
phase =331) and
LT*x=y (backward substitution,
A real nonsymmetric matrix A (
mtype = 11) is factored by Intel MKL PARDISO as
A = L*U . In this case the solution of the system
A*x=b can be found by the following sequence:
L*y=b (forward substitution,
phase =331) and
U*x=y (backward substitution,
Solving a system with a real symmetric indefinite matrix A (
mtype = -2) is slightly different from the cases above. Intel MKL PARDISO factors this matrix as
A=LDLT, and the solution of the system
A*x=b can be calculated as the following sequence of substitutions:
L*y=b (forward substitution,
D*v=y (diagonal substitution,
phase =332), and finally
LT*x=v (backward substitution,
phase =333). Diagonal substitution makes sense only for symmetric indefinite matrices (
mtype = -2, -4, 6). For matrices of other types a solution can be found as described in the first two examples.
The number of refinement steps (
iparm(8)) must be set to zero if a solution is calculated with separate substitutions (
phase = 331, 332, 333), otherwise Intel MKL PARDISO produces the wrong result.
Different pivoting (
iparm(21)) produces different
LDLT factorization. Therefore results of forward, diagonal and backward substitutions with diagonal pivoting can differ from results of the same steps with Bunch-Kaufman pivoting. Of course, the final results of sequential execution of forward, diagonal and backward substitution are equal to the results of the full solving step (
phase=33) regardless of the pivoting used.